Security Assertion Markup Language
Why write another article on SAML? There are many articles you can find on the internet regarding SAML but they all seem to suffer from some significant problem.
- They are written within the context of some application, and so, don't show the actual workings, but rather focus on how to configure that application.
- They tend to stop describing the protocol when the author uses some phrase that is assumed to be understood. The author might say 'the Principal does a Redirect or an HTTP POST'.
In any event I've found the articles online to be inadequate and had to resort to reading the specification documents, which can be brutal. They assume a great deal of foundational knowledge and they are not pedagogical in any way. So here is my attempt at a decent pedagogical introduction to SAML for a software developer.
The Point of SAML
Single Sign On (SSO) is the aim.
Security is summarized under the three As: Authentication, Authorization and Accounting. The first A is Authentication, where the Principal, the user, uses his credentials, his password, to identify himself. The second A is Authorization. This deals with what rights the Principal has. What is he allowed to do? The Principal is typically assigned a number of Roles, which are groups of Privileges. So an Administrator Role is allowed to do a great deal and anyone with that Role is an Administrator. A User Role on the other hand has fewer rights or privileges. The final A is Accounting and deals with auditing and keeping a record of a user's actions. This gives us Non-repudiation, where users cannot deny what actions they have performed.
In many articles on the subject, I've seen the third A given as Access Control rather than Accounting. Access Control is the actual enforcement of the Authorization rules. I don't like this definition, as it's implied under Authorization. In other articles, Accounting is the third, which is a distinctly different aspect of security and is worth including.
SAML is primarily concerned with Authentication of the Single-Sign-On variety. So why have Single-Sign-On? Well, the first reason most often given is convenience. A user, once authenticated, does not need to authenticate again for as long as his session is active. But there is a more significant reason.
SSO implies a single Identity Provider that manages the user's identity, and the user authenticates once against one system in order to access many different systems. This is nice because when each application has to implement its own identity management system, each ends up having to ask its users to create new passwords. No-one remembers all these passwords. So either the user keeps forgetting his or her password and then has to execute password recovery procedures, which usually make use of emails that aren't very secure, or he or she uses the same password everywhere. Since the same identity management feature needs to be implemented many times, the probability of an incorrect and vulnerable implementation necessarily goes up. This is basic probability. Eventually one of those systems is compromised and now the hacker has the user's password; the same password the user is using everywhere. Now with a little bit of guessing to establish the user's username, the hacker has access to the user's account on other systems that were previously secure. With SSO, only one application needs to get this right. Each Service Provider needs to enforce Authorization, but if there is some problem, only that application is affected. So SSO is about more than just convenience.
A Tale of Two Services
SSO requires a protocol to be established between two services and the Principal. The first service is the Service Provider, which is offering the access to some resource that belongs to you, the Principal. This could be your email, your photos, your online banking, your social media website, or anything really. The second is the Identity Provider. The Identity Provider manages your registration and your credentials. In recent times, Facebook has started fulfilling this role. When you want access to something and the Facebook page loads and asks you to sign in, this is Facebook acting as the Identity Provider.
So what do I mean by protocol? A protocol is a series of interactions whose form has been previously agreed. We have the two services, the Identity Provider and the Service Provider and we have the Principal, which is your web browser acting as you.
There are two general ways of implementing the SSO protocol, which are common to SAML, OpenID and other implementations. Here is the first, which I will refer to as Protocol One.
- The Principal makes a request for a resource.
- The SP identifies that the Principal is not authenticated.
- The SP directs the Principal to the IP. (If you are like me, you are now crying out for the actual details on how this takes place. I will get to that in the next chapter.)
- The IP authenticates the user and then returns a Certificate[1]. This process of authentication might involve a series of communications with the Principal, but they don't form a part of the SSO protocol. The IP is free to define this.
- The Principal hands the Certificate over to the SP.
- The SP validates the Certificate using a shared secret, or the PKI[2] system.
The most significant feature of this protocol is that the SP and IP don't communicate directly, but rather there is some system to verify the Certificate; either a shared secret between the IP and SP or the PKI system. The Principal has at some point taken custody of the artefact that declares his authenticity.
[1] A Certificate is a document that has been digitally signed. Its veracity can be verified by some cryptographic process. Consequently what it states must be true, and what it will state in this instance is the identity of the Principal.
[2] The PKI system is the Public Key Infrastructure system, that is ubiquitous on the internet. It's a system of public-private key-pairs that websites on the internet use to identify themselves, to encrypt stuff, and to sign stuff. The basic principle is that something encrypted with the public key can only be decrypted with the private key, and vice-versa. Something encrypted with the private key can only be decrypted with the public key. Using this system, it's possible to send someone a private message without establishing a shared secret beforehand. And it's possible for someone to sign something and publish that to the world, and anyone can verify the signature, using the appropriate public key. Your browser uses this system every time it uses HTTPS.
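The validation step of Protocol One can be sketched in a few lines. This is only an illustration of the shared-secret option, not SAML's actual message format: the secret, the JSON statement, and the function names are all assumptions made up for the example.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, agreed between the IP and the SP out of band.
SHARED_SECRET = b"example-secret-known-to-ip-and-sp"


def issue_certificate(username: str) -> dict:
    """IP side: state the Principal's identity and sign the statement."""
    statement = json.dumps({"subject": username}, sort_keys=True)
    signature = hmac.new(SHARED_SECRET, statement.encode(), hashlib.sha256).hexdigest()
    return {"statement": statement, "signature": signature}


def validate_certificate(certificate: dict) -> bool:
    """SP side: recompute the signature and compare in constant time."""
    expected = hmac.new(
        SHARED_SECRET, certificate["statement"].encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, certificate["signature"])


certificate = issue_certificate("alice")
print(validate_certificate(certificate))  # True

# The Principal holds the certificate, so it could try to alter it.
tampered = dict(certificate, statement=json.dumps({"subject": "admin"}))
print(validate_certificate(tampered))  # False
```

Note that the SP never contacts the IP here. Everything it needs to trust the statement travels with the Principal.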
The second protocol, Protocol Two, is thus:
- The Principal makes a request for a resource.
- The SP identifies that the Principal is not authenticated.
- The SP directs the Principal to the IP.
- The IP authenticates the Principal and then returns a token[3]. Again, the IP is free to define exactly how it authenticates the Principal.
- The Principal hands the token over to the SP.
- The SP calls the IP and hands over the token.
- The IP responds with the results.
Using this system, the Principal is never entrusted with custody of anything aside from a token. Instead, the SP communicates directly with the IP and establishes the Principal's bona fides by redeeming the token.
[3] A Token is a very big random (hard to guess) number.
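Protocol Two can be sketched in the same spirit. The in-memory dictionary and the function names below are hypothetical, and making the token single-use is an assumption of this sketch rather than something stated above.

```python
import secrets
from typing import Dict, Optional

# IP-side store mapping each issued token to the authenticated user.
# An in-memory dict is an assumption made purely for illustration.
_issued_tokens: Dict[str, str] = {}


def ip_issue_token(username: str) -> str:
    """IP side: after authenticating the Principal, hand back a random token."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, hard to guess
    _issued_tokens[token] = username
    return token


def ip_redeem_token(token: str) -> Optional[str]:
    """IP side, called directly by the SP: return the user, or None if unknown.

    The token is removed on redemption, so it can only be redeemed once.
    """
    return _issued_tokens.pop(token, None)


token = ip_issue_token("alice")           # the Principal carries this to the SP
print(ip_redeem_token(token))             # alice
print(ip_redeem_token(token))             # None (already redeemed)
print(ip_redeem_token("a-guessed-token")) # None
```

The token itself says nothing about the user. A modified or guessed token simply fails to match anything in the IP's store.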
In either system, the token or the certificate represents the user's session. This concept of a session implies that the user has access for a limited time period. There are two reasons for this. The first reason is that any cryptographic process can be hacked with sufficient computing power. But this isn't really a practical concern. For example if you look at the expiry dates on CA root certificates you will find that they tend to expire in 20 years' time. This is a measure of the confidence in how long it would take to hack one. The more practical reason is that the user's profile may change, their rights may be revoked, or their account might become locked. In general it's only when the user is authenticated that these details are checked.
This leads to a significant feature of Protocol One. The session cannot be renewed without directing the user back to the IP and this is done on a regular basis. Depending on the capabilities of the user's browser, this can be a disruptive experience. Protocol Two on the other hand can seamlessly renew the user's session, but it depends on the implementation.
But I'm getting off the subject of SAML here in a big way. The primary point to take away is that if someone else gets the token, or the certificate, they have usurped your session and so they can do whatever you can do in your name. This is why SAML requires a secure communications channel. In practice, HTTPS is used.
So each protocol involves a series of calls which I will name Legs. Leg A occurs when the SP directs the Principal to the IP. Leg B occurs when the IP directs the Principal back to the SP. Leg C occurs when the IP talks to the SP directly or vice versa.
A Tale of Two Messages
Another way of looking at SAML is that it's really just composed of two messages. The first is an AuthnRequest, which the SP provides to the IP and which I will refer to as the Request.
The second is a Response, which the IP returns to the SP and which identifies the user and makes some assertions about him.
And when the SP has received the response, it has the intrinsic property that it is trusted.
The trust in the response is where all the complication arises.
To be clear, I'm going to re-state the legs in the diagrams above:
- Leg A - the SP sends a message to the IP as a result of the Principal not being authenticated.
- Leg B - the IP sends a message to the SP after Leg A.
- Leg C - the IP and SP communicate directly either after Leg A or Leg B.
Now the difference between Protocol One and Protocol Two is really the nature of the response or payload in Leg B, and whether that is followed by a Leg C.
In Protocol One, the Response in Leg B is signed, so that the SP can check the validity of the response. This is required because the Principal has taken custody of the Response and could have altered it. In Protocol Two, the Principal does not take custody of the Response, but rather a Token. The Token is really just a very large random number. It cannot be guessed at easily. It identifies the Principal's login session on the IP, so if it is modified in any way, it becomes a number that identifies no session on the IP with a very high degree of probability. If this worries you, it really shouldn't. All server side session systems work by storing the session id in a cookie so that the requests from the browser can be tied to the data in the user's session on the server. This is the same system. In Protocol Two, the Token is then submitted directly to the IP by the SP, so the SP can trust the response. The same end goal is achieved, a Response is delivered to the SP that can be trusted.
So I have outlined how SAML can deliver the Response in two possible ways. It's actually true of SAML that you can do the same with the Request. Instead of the SP sending a Request to the IP via the Principal, the SP can instead send a Token, that the IP redeems against the SP for the Request. It's the same system or pattern, but the parties have been reversed. Other protocols don't have this facility, mostly because it's not really needed, but I will get into that in a bit.
So how do the interactions actually occur?
SAML defines a number of Bindings. These Bindings are basically different ways of completing the different Legs above. Most documents on SAML seem to jump right in and describe the bindings, but end up giving the executive summary, and don't go into any great detail. So as promised, let's get into the details.
The first thing to understand is that in most cases, the Principal is a web browser. So the context is intrinsically HTTP based and web based. The reason I bring this up is because some understanding of web technologies is necessary. Feel free to skip the HTTP section if you are already familiar with this.
Some HTTP basics
The internet is built upon TCP/IP. This protocol establishes a connection between two parties. And either party can use that connection to send data to the other. Conceptually you can think of it as a pipe that you can throw text down and read text from. Either party can just send or read data in any order. There is no structure until you introduce an application level protocol like HTTP. This is necessary as two parties can't communicate until you've answered some basic questions like who gets to speak first and how does each party know the other party has finished saying something.
HTTP establishes the basic protocol that the caller sends a Request and the callee responds with a Response. This is the Request-Response paradigm. (Not the same Request/Response I've outlined for SAML. These are completely different.) Something else to bear in mind is that the original HTTP protocol was all about serving static documents. Things like Logins and real time data or actions weren't a part of the design way back then, so the language talks in terms of static documents. The basic structure of HTTP is like so:
There is an initial line which specifies the general character of the Request. If we are asking for a document from the web-server, this will be a GET line. We also specify which resource is of interest in this line. Then the following lines are called the Request Headers and there are many of them and they are basically just name value pairs. The header section is ended with a blank line, that is, two consecutive line breaks (each a carriage return and line feed).
The callee responds with the Response, which includes a body. The Content-Length header tells how long the body is. The body follows the header section after the blank line.
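To make that structure concrete, here is a small sketch that builds a GET request by hand and picks a response apart. The host name, path, and sample response are invented for the illustration, and the parser handles only the simple case described here.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build the raw bytes of a simple GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # the initial line: method, resource, version
        f"Host: {host}",          # request headers are name: value pairs
        "Accept: text/html",
        "Connection: close",
    ]
    # The header section ends with a blank line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into status line, headers, and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of body follow the blank line.
    body = rest[: int(headers.get("content-length", len(rest)))]
    return status_line, headers, body


print(build_get_request("sp.example.com", "/chickens").decode("ascii"))

sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
status_line, headers, body = parse_response(sample_response)
print(status_line)  # HTTP/1.1 200 OK
print(body)         # b'<p>Hello</p>\n'
```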
So that's a GET request. This is the most common request made on the internet and your browser does this all the time when you enter a website address in the address line and a web page loads. The browser makes the GET request and receives a response which has the web page in the response body.
Now HTTP doesn't always behave in this way. The designers of HTTP incorporated into the protocol a way to tell the caller that they have asked for a resource that has moved and a way to indicate where the resource has moved to. This is what is called an HTTP redirect. SAML makes use of this part of the protocol to direct the Principal to the IP. In this case, the SP returns with this response after the GET request above.
When the browser receives this 302 code back from the server it knows that it should make a GET request to the address in the Location header.
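A redirect response has the same shape as any other response, just with a 302 status and a Location header. The sketch below shows both sides under simplified assumptions: the URL is made up, and a real browser honours several other redirect codes as well.

```python
from typing import Optional


def build_redirect(location: str) -> bytes:
    """Server side: answer a GET with a 302 that points the browser elsewhere."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")


def next_request_target(raw_response: bytes) -> Optional[str]:
    """Browser side: return the URL to GET next, or None if this is not a 302."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("ascii")
    status_line, *header_lines = head.split("\r\n")
    status_code = status_line.split(" ")[1]
    if status_code != "302":
        return None
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None


response = build_redirect("https://idp.example.com/sso")
print(next_request_target(response))  # https://idp.example.com/sso
```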
So far I've outlined how a GET works and how a Redirect works. This system allows for the server to send back large amounts of data to the caller in the body of the response. There is another call that the caller can make that allows for the caller to send large amounts of data to the server. This is a POST call. This is typically what happens when a website submits a form.
So why am I going into HTTP at this low a level? Because it explains why SAML provides different bindings. It's all about the limitations of the Principal (the web browser most often) and the limits to data size in various parts of the HTTP protocol. An HTTP header is most often restricted in size to 4096 bytes (sometimes smaller). The first line of the HTTP request can be a good bit larger. The body of the request or response can be theoretically unlimited in size.
SAML Redirect Binding
So SAML can use the Redirect Binding for either Leg A, where the SP directs the Principal to the IP, or Leg B, where the IP directs the Principal back to the SP.
This is the simplest and most elegant binding, but it suffers one major drawback and that is the size of the payload. The payload in this case is entirely included in the redirect url which is included in the Location header during the redirect. This Location header is limited to 4096 bytes. Here follows an example.
As you can see from the Location header, there are a number of variables that are passed; in this case SAMLRequest, RelayState and Signature. This is therefore a SAML request on Leg A. An AuthnRequest document in XML has to be compressed with DEFLATE, base64 encoded and URL-encoded, and then included as the value of the parameter. Base64 encoding automatically increases the size of the contents by a third. So this value gets big quickly. It's possible to do this with the Request, but the Response is a much larger document.
The browser receives this Redirect response and then immediately follows with a GET request on the url specified in the Location header.
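Here is a rough sketch of how the redirect URL might be assembled and read back. The SAML bindings specification has the XML compressed with raw DEFLATE before it is base64 encoded and URL-encoded, which is what the code does. The IP address, the RelayState value, and the deliberately tiny AuthnRequest are hypothetical, and signing is left out entirely.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_redirect_url(idp_sso_url: str, authn_request_xml: str, relay_state: str) -> str:
    """SP side: pack the Request into the query string of the redirect URL."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 = raw DEFLATE, no header
    deflated = compressor.compress(authn_request_xml.encode("utf-8")) + compressor.flush()
    saml_request = base64.b64encode(deflated).decode("ascii")
    # urlencode percent-encodes the '+', '/' and '=' that base64 can produce.
    query = urlencode({"SAMLRequest": saml_request, "RelayState": relay_state})
    return f"{idp_sso_url}?{query}"


def decode_redirect_url(url: str) -> str:
    """IP side: recover the Request XML from the URL the browser requested."""
    params = parse_qs(urlsplit(url).query)
    deflated = base64.b64decode(params["SAMLRequest"][0])
    return zlib.decompress(deflated, -15).decode("utf-8")


# A deliberately minimal, hypothetical AuthnRequest.
authn_request = (
    '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
    'ID="_example" Version="2.0" IssueInstant="2024-01-01T00:00:00Z"/>'
)
url = encode_redirect_url("https://idp.example.com/sso", authn_request, "session-123")
print(len(url), url)
print(decode_redirect_url(url) == authn_request)  # True
```

The length printed for even this tiny document gives a feel for how quickly a real message approaches the header limit.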
The user's experience here is the best. He opens a web page on the SP, the funky flying chicken. His browser automagically redirects to the IP which serves up a login page. Great. He logs in, and is immediately forwarded to the flying chickens on the SP (redirecting again).
The same binding can be used for the SAMLResponse from the IP to the SP. (https://docs.oasis-open.org/security/saml/v2.0/saml-bindings-2.0-os.pdf)
In this case the name of the query parameter has changed to SAMLResponse. The redirect binding response is very rarely used as the response document is very commonly much larger than the request document.
You might be wondering about the other parameters.
RelayState is merely a useful value that the SP can ask to be carried between Request and Response and is most often used to tie the user's session together between the two legs.
SAML Post Binding
This works for both Leg A and Leg B, and for the SAMLRequest and SAMLResponse respectively.
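With the Post Binding the message travels in the body of a form submission rather than in a URL, so the size limit goes away. One common approach, sketched below, is for the sender to return a page containing a hidden form that the browser submits automatically. The message is base64 encoded but not compressed. The destination URL and the placeholder XML are assumptions for the illustration.

```python
import base64
import html


def build_post_form(destination_url: str, field_name: str, saml_xml: str, relay_state: str) -> str:
    """Return an HTML page that auto-submits the SAML message as a form POST.

    field_name is "SAMLRequest" on Leg A or "SAMLResponse" on Leg B.
    """
    encoded = base64.b64encode(saml_xml.encode("utf-8")).decode("ascii")
    return (
        '<html><body onload="document.forms[0].submit()">\n'
        f'<form method="post" action="{html.escape(destination_url)}">\n'
        f'<input type="hidden" name="{field_name}" value="{encoded}"/>\n'
        f'<input type="hidden" name="RelayState" value="{html.escape(relay_state)}"/>\n'
        # Fallback button for browsers with scripting disabled.
        '<noscript><input type="submit" value="Continue"/></noscript>\n'
        "</form></body></html>"
    )


page = build_post_form(
    "https://sp.example.com/acs",   # hypothetical SP endpoint
    "SAMLResponse",
    "<Response>placeholder</Response>",
    "session-123",
)
print(page)
```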
There is a system in SAML where the IP and SP communicate directly. In this case signatures are not required, because the SPs and IPs communicate independently of the Principal. This is the bit where Leg C is used. Let's say that the IP responds with an artefact id, a token, after the Principal authenticates. The Principal is redirected back to the SP; this is Leg B. This is how that would look.
The SP receives the SAMLart value as a request parameter in a GET call made by the browser. It then makes a direct call on the IP using SOAP. This is Leg C. (SOAP is a server-to-server protocol based on XML messages. There is no browser and no human being involved so the messages are structurally rigorous.)
The SP makes a direct call on the IP.
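As a rough idea of what that direct call carries, the sketch below wraps the artefact in a SOAP envelope using the ArtifactResolve element from the SAML protocol schema. The entity id and artefact value are invented, and the sketch leaves out the HTTP transport and any signing or mutual authentication between the two servers.

```python
import secrets
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from xml.sax.saxutils import escape

SAMLP_NS = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_artifact_resolve(sp_entity_id: str, artifact: str) -> str:
    """SP side: build the SOAP message that asks the IP to redeem an artefact."""
    request_id = "_" + secrets.token_hex(16)
    issue_instant = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_NS}">'
        "<SOAP-ENV:Body>"
        f'<samlp:ArtifactResolve xmlns:samlp="{SAMLP_NS}" xmlns:saml="{SAML_NS}"'
        f' ID="{request_id}" Version="2.0" IssueInstant="{issue_instant}">'
        f"<saml:Issuer>{escape(sp_entity_id)}</saml:Issuer>"
        f"<samlp:Artifact>{escape(artifact)}</samlp:Artifact>"
        "</samlp:ArtifactResolve>"
        "</SOAP-ENV:Body>"
        "</SOAP-ENV:Envelope>"
    )


def read_artifact(envelope_xml: str) -> str:
    """IP side: pull the artefact back out of the received envelope."""
    root = ET.fromstring(envelope_xml)
    return root.find(f".//{{{SAMLP_NS}}}Artifact").text


# Hypothetical values: the SAMLart parameter the browser delivered on Leg B.
envelope = build_artifact_resolve("https://sp.example.com", "AAQAAExampleArtifactValue")
print(read_artifact(envelope))  # AAQAAExampleArtifactValue
```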
The same general form can be used for Leg A as well.
So all of this so far is concerned with a user using a web browser. It's also possible that a server process would play the part of the Principal and would do something. This is now a situation where there is no web browser so HTML Form Post and the like cannot be used.
I'm not going to elaborate on this, but SAML provides a machine to machine binding using SOAP for this purpose.
Notice the use of signatures everywhere when the Redirect and Post Bindings are used. Signatures provide a guarantee that the Principal hasn't modified the packet or payload. In the case of both bindings, the Principal or web browser actually takes possession of a 'Certificate' from the Identity Provider. Possession of this Certificate is sufficient proof to any Service Provider that the Principal is who he says he is.
There needs to be some system that guarantees that the Principal hasn't modified the Certificate.
This is where the Signature comes in. The signing process takes advantage of the Public Key Infrastructure system. This system allows for each party who wants to sign something, to register a public-private key-pair. The public key can then be made freely available. The signer can then use the private key to sign anything, and anyone can then assert that the signer did in fact sign by using the public key to verify.
As a part of this process of verification, the validation process will identify the party that made the signature.
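The sign-with-private, verify-with-public idea can be demonstrated with textbook RSA and absurdly small numbers. This is a toy for illustration only: the primes, the exponent, and the function names are all chosen for the example, and real systems use keys of 2048 bits or more together with a padding scheme.

```python
import hashlib

# Textbook RSA with tiny primes. Never use numbers this small for real security.
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: only the holder of the private exponent D can compute this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone: check the signature using only the public values E and N."""
    return pow(signature, E, N) == digest(message)


message = b"<Response>alice is authenticated</Response>"
signature = sign(message)
print(verify(message, signature))            # True
print(verify(message, (signature + 1) % N))  # False
```

With a modulus this small, two different messages can easily share a digest, which is one more reason the sketch is only a demonstration of the mechanics.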
In the examples, Leg A sometimes includes a Signature as well. It is not there for precisely the same reason as on Leg B, and here it is optional. The presence of a Request does not mean that the Principal has achieved some authenticated state. Because the signer of the Request is identified as a part of the verification process, the IP can assert that the Principal has been redirected from a known and valid Service Provider. The SP is the party with the sensitive resource however, so a fake SP doesn't really make sense. So the need to assert that the SP is genuine isn't really that pressing from the IP's perspective. From the Principal's perspective it is important however. This scenario is known as Spoofing. However the connection the Principal has with the SP is over HTTPS, and HTTPS includes its own system for verifying that the other party in any HTTPS connection is who they say they are.
So if Protocol One is used, then Leg B absolutely requires a Signature. And HTTPS must be used for Leg A and B.
One of the biggest things that many people don't like about SAML is the increased complexity. Much can be achieved with simpler protocols like OpenID. And it's particularly when you get into the nitty gritty of signatures that this becomes apparent.
Any signature system operates at the byte level. If you sign a byte stream, then the signature can be used to verify that the byte stream is valid. The problem with XML is that there are many ways to structurally specify the same document and that means there are many different ways to map that same information in XML to different byte streams. Consider these two ways of defining XML that are semantically equivalent.
There are other problems as well, such as whitespace, the ordering of elements and namespaces.
In order to solve this problem, SAML makes use of the concept of Exclusive Canonicalization. This defines a canonical form for XML that is unambiguous. This form is then mapped to a byte stream and signed.
Other frameworks involving JSON (OpenID) don't bother with namespaces, so they don't run into those issues, and they simply sign the current representation of the JSON as-is.
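The effect is easy to see with Python's standard library. Note the assumption: `xml.etree.ElementTree.canonicalize` implements C14N 2.0 rather than the Exclusive Canonicalization used by SAML, so it serves here only as a stand-in to show the principle. The two sample documents are made up.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+

# Two documents carrying the same information in different bytes:
# attribute order, quote style, and the empty-element form all differ.
doc_a = '<user name="alice" role="admin"></user>'
doc_b = "<user role='admin' name='alice'/>"

# Hashing the raw bytes gives different digests, so a signature over one
# document would not verify against the other.
raw_a = hashlib.sha256(doc_a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha256(doc_b.encode("utf-8")).hexdigest()
print(raw_a == raw_b)  # False

# After canonicalization both documents map to the same byte stream.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
print(canon_a == canon_b)  # True
```

Signing the canonical form means the signature survives any harmless re-serialization of the XML along the way.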
- Oasis SAML v2 - https://www.oasis-open.org/standards#samlv2.0
- Oasis SAML documentation - https://docs.oasis-open.org/security/saml/v2.0/