
Identity Management

Let’s start with Identity Management (IDM). This term is sometimes used to summarize the identity service holistically. However, professionals in the industry mean only one thing when they use the term IDM: managing how systems are kept in sync when information about a person changes. Among the most important IDM use cases are “provisioning” and “de-provisioning.” Identity management events happen when a person’s record is created, updated, or deleted. For example, when a person is hired by an organization, this may trigger a workflow where accounts are created in several systems, approvals may be needed, and equipment may be issued. Conversely, when a person leaves the organization, accounts need to be removed or inactivated. When a person’s information is updated, this may also trigger a workflow. For example, if a person’s role in the organization changes, access to systems may be granted or revoked. Changing your password is also an example of an update. For this reason, many IDM systems include a self-service password reset website as part of their solution.

IAM is a “consumer” of information managed by the IDM system—meaning the IAM system expects data about a person to already be present and accurate. It’s garbage in, garbage out: access management is only as good as the quality of the underlying data. If you fire an employee but never remove him from the database, it doesn’t matter what kind of fancy authentication technology you use! The lines between IAM and IDM can get blurred. You can have IAM features in your IDM. For example, two-factor authentication for account recovery—you need to be strongly authenticated before you can reset a strong credential. You can also have IDM features in your IAM. For example, social login, where users are added on the fly the first time they authenticate. Another example is forcing people to reset their passwords during a login flow.

Organizations have two options for IDM: buy or build. Many websites implement simple registration and account management—adding, editing, and removing records about people is handled by custom code. In larger organizations, where there are more systems and the business rules are more complex, an IDM platform may be more productive. There is some excellent FOSS IDM software out there. Although this book is primarily about IAM, IDM is covered at a very high level.

Identity and Access Management

What is Identity and Access Management? A little history will help give you some perspective. The original Internet IAM infrastructure was based on the RADIUS protocol. If you are old enough, think back to the days of modems (or if you’re not, think of a VPN or WiFi connection, which still use RADIUS). These systems have three parts: (1) a person requesting access to the network, (2) a network device that has modem ports or some other network resource, and (3) the RADIUS Server that provides the AAA—authentication, authorization, and accounting. RADIUS was developed by Livingston Enterprises Inc. (now part of Alcatel-Lucent) to control access to “terminal servers”—network devices that had a high concentration of modems. It later became an IETF standard.

Today, the last A in “Triple-A” (“accounting”) has dropped off from most modern IAM systems. In the old days, you might only have a certain number of hours of dial-up, and the RADIUS Servers would interface with the billing system of an Internet Service Provider (“ISP”). After authenticating the person, the RADIUS Server would authorize, for example, either one or two channels, depending on which type of account the person had purchased. This is a simple example of the authorization capabilities of RADIUS.

Fast-forward a few years. The next phase of Internet IAM took place when the World Wide Web achieved critical scale. Believe it or not, a ubiquitous web was not a foregone conclusion. In 1998, a company called Netegrity (later purchased by Computer Associates) launched a product called SiteMinder. This was a new kind of AA server designed to control access to websites instead of network devices. The design was similar to RADIUS. There were still three parts: (1) a person using a web browser (the client), (2) a web server with the SiteMinder Agent installed, and (3) the central SiteMinder Policy Server. In the Policy Server you could make policies about which people could access which web servers. A new advantage of web AA platforms like SiteMinder was that you could achieve Single Sign-On (SSO). In other words, the person could authenticate once and access multiple websites in the same domain. More generically, this pattern is commonly known as the “PDP-PEP Pattern,” and there are a few other standard parts.


Here is a brief summary of the components (a minimal sketch of the pattern in code follows the list):

  • PDP “Policy Decision Point”—Knows the policies: which people, using which clients, are allowed to access which resources. In a way, it’s the brain of the system.

  • PEP “Policy Enforcement Point”—Is responsible for querying the PDP to find out if access should be granted. There are usually many PEPs. For example, there could be hundreds of web servers in an organization, each relying on the PDP to grant access.

  • PAP “Policy Administration Point”—This is some kind of interface that enables an administrator to define the policies in the PDP. It could be a website or a command-line interface. Without a PAP, administrators are forced to manage the policy data directly in a database or in configuration files.

  • PIP “Policy Information Point”—Policy and user information is persisted somewhere, normally in a database like an LDAP Server.
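To make the pattern concrete, here is a minimal, purely illustrative Python sketch of the PDP-PEP interaction. All names, groups, and policies are hypothetical; a real deployment would use a policy server and a directory rather than in-memory dictionaries:

# PIP: persisted policy and user information (in real life, an LDAP
# server or database).
POLICIES = {
    "https://intranet.acme.com": {"employees"},
    "https://hr.acme.com": {"hr-staff"},
}
USER_GROUPS = {"foo": {"employees"}, "bar": {"employees", "hr-staff"}}

def pdp_decide(user: str, resource: str) -> bool:
    """PDP: the 'brain'. Evaluates policy and returns a decision."""
    allowed = POLICIES.get(resource, set())
    return bool(USER_GROUPS.get(user, set()) & allowed)

def pep_handle_request(user: str, resource: str) -> str:
    """PEP: enforces the PDP's decision at the resource (e.g., a web server)."""
    return "200 OK" if pdp_decide(user, resource) else "403 Forbidden"

print(pep_handle_request("foo", "https://hr.acme.com"))  # 403 Forbidden
print(pep_handle_request("bar", "https://hr.acme.com"))  # 200 OK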

Identity and Access Governance

Identity and Access Governance (IAG) is the process of decision making, and the process by which decisions are implemented (or not implemented). Identity governance is not entirely a technical challenge. It is a combination of systems, rules, and procedures, defined between an individual and an organization, regarding the entitlement, use, and protection of personal information in order to authenticate individual identities and provide authorizations and privileges within or across system and enterprise boundaries.

A governance-based approach answers three important questions: (1) Who does? (2) Who should? and (3) Who did? “Who does?” addresses reality: you need an inventory of the security processes and compliance practices that are in place in your organization. “Who should?” is the process of mapping roles to resources, setting policies, and implementing automation to efficiently effect those decisions. “Who did?” requires monitoring and audit, and involves activity collection, review, and alerting.


IAG encompasses the totality of the relationship between the organization and all of its digital resources. You can think of the IAG system as the brain, while the IDM and IAM systems are the body. Simply put, in IAM, we assume you already know what policies you want to implement! In IDM, we assume you already know which users you want to push to which systems.

Governance happens whether or not you have an IAG platform. Governance tools frequently provide convenient graphical user interfaces to increase productivity and to reduce incident response times. There are few open source governance tools; Evolveum midPoint has some governance features. It’s a new area of development, with commercial software solutions arising within the last eight to ten years and few open standards to implement.

Directory Services

The substrate of IAM is data; all that data about people and their privileges has to be persisted and retrieved. Technically, any database will do. And in fact, different solutions use different databases to solve specific requirements. However, many access management solutions use the Lightweight Directory Access Protocol (LDAP) to frontend identity systems.

Historically, LDAP databases were faster at retrieving simple data than relational database management systems (RDBMS). This may or may not be true anymore—properly indexed, many databases could be made fast enough. LDAP also has strong data replication features, which is important for large-scale identity systems. And finally, many LDAP implementations are able to enforce fine-grained access to the data—defining policies in a tree structure is easier because you can make rules about all data that resides “below” a certain node in the tree. Many people think of LDAP as being fast to read and slow to write. This is not always true anymore. Many LDAP Servers have write performance on par with other database technologies.

No matter what persistence strategy you are using for the directory service, it’s a critical part of the identity platform. Configured incorrectly, it will inhibit all three components we have discussed so far. Also, if your directory service is really big—millions, tens of millions, hundreds of millions of entries—you really need to think about the persistence layer. You should always benchmark performance with requirements similar to what you expect in production. One of the black arts of LDAP is optimization for a certain data or transaction profile.

Identity Standards

Identity services have to play nice with a diverse IT infrastructure. For this reason, open standards for identity have become increasingly important. There are several standards bodies in the identity ecosystem. The ones this book will focus on are the IETF, OASIS, Kantara, and the OpenID Foundation. However, many identity standards are built on standards governed by other standards bodies. For example, X.509, a standard for digital certificates, was developed by the ITU. There are two types of standards: “build it and they will come” standards, and “let’s work together so we don’t all do something different” standards. The most successful standards typically fall into the latter category, but in the identity space, without some of the former, some of the latter would not exist. This book will cover old and new identity standards, in order of appearance: LDAP, SAML, OAuth, OpenID Connect, and UMA.

LDAP is the oldest identity standard. Completed in the ’90s, it has been a core competency of identity experts across the globe since that time. The standard includes a communication protocol between clients (which want information) and servers (which hold information). It also includes standards about the format of data, how data can be serialized in a text format that is easy for humans to read and write (called LDIF), and other conventions that improve interoperability between the clients and servers of various implementations.

SAML is one of the most important web-based federated identity standards. It’s the standard most widely supported by SaaS providers who want to accept credentials from large enterprise customers. It uses XML as the data format, which has become somewhat problematic, as parsing XML documents is fraught with risk (there are a lot of places you can go wrong). Like most other federated identity standards, it is based on redirecting a person’s browser to a website maintained by their home organization. Assuming the website is trusted (and how that trust is established was quite innovative), the home organization then returns information about the person to the original website. It’s quite a big standard, and this book will cover only its most widely used features.

OAuth 2.0 is still under active development. It uses JSON as the data format and RESTful APIs to enable a person (or organization) to authorize access to resources. Loosely based on a previous protocol from Facebook and the experiences of Microsoft and Google, it was initially hashed out at the Internet Identity Workshop in Mountain View, California. OAuth is a delegated authorization protocol, not an authentication protocol. You’ve used OAuth if you’ve used Google login at a third-party site and approved the release of information.

OpenID Connect is the most prevalent profile of OAuth. Using this protocol, you can authorize the release of information about yourself to a website or mobile application. The previously mentioned Google login example is actually OpenID Connect. Google has no idea if it should release information about you to this website. Only you know if you want that, so why not just ask you? OpenID Connect is a collaboration of Google, Microsoft, other large companies, and a few smaller contributors. Google authentication and Microsoft Azure authentication are based on OpenID Connect, and many organizations are adopting the standard. Although similar in purpose to SAML, it offers a more modern API design and better support for mobile device authentication.

User-Managed Access (UMA) is another profile of OAuth. It offers a flexible protocol to enable three parties to collaborate on security: the Resource Server (which publishes the APIs), the Authorization Server (which issues tokens that grant access to APIs), and the Client (the website or mobile application calling the API, sometimes on behalf of a person). UMA also defines a protocol to enable the Resource Server to register its protected URLs with the Authorization Server. Using UMA, organizations can implement a PEP/PDP access management infrastructure.

Gluu Server

At the center of our IAM narrative is the Gluu Server, which includes free open source identity components integrated together in several easy-to-install-and-configure distributions. Gluu’s founder is Mike Schwartz, one of the authors of this book. The Gluu Server includes a SAML IDP, an OAuth Authorization Server (which also supports OpenID Connect and UMA), a component to handle social login, an LDAP Server, and an administrative web interface. Gluu is committed to keeping the Gluu Server free. That means the code is available on GitHub, the binary packages are published for Linux, Docker, and Kubernetes, the documentation is available, and your questions will be answered on the community support forums.

The goal of the Gluu Server is to be the best free open source IAM platform and to have the lowest total cost of operation (TCO). This has been done by incorporating good existing open source components where they exist, and by writing software to fill in the gaps. By not writing 100% of the platform, Gluu has been able to deliver one of the most innovative platforms on the market.

Why Free Open Source?

Why base your organization’s IAM infrastructure on free open source software? The cost of commercial IAM software is prohibitive for many organizations. Many of you reading this book are looking for lower-cost alternatives. There is a saying that FOSS is only free if you don’t value your time, since it sometimes requires more time and effort to implement than commercial alternatives. But even if nothing is truly “free,” FOSS is less expensive. Saving money is always good, right? But why should you use FOSS if cost is not an issue? IAM systems are mission-critical, not only to the security of an organization, but also to the availability of its digital services. Most organizations are happy to pay money for the best technology if it gives them a competitive advantage or mitigates risk.

And interestingly, here’s where the reasons for FOSS get even more compelling. Jim Whitehurst, CEO of Red Hat, has asserted that FOSS is the best development methodology—that it results in the best available software. Research in 2014 showed that open source software had 0.59 defects per 1,000 lines of code, while commercial code had 0.72![1] FOSS software has also proven to be very innovative, with fast release cycles. FOSS has been particularly successful at implementing Internet standard protocols. As of July 2018, more than 62% of the top million busiest sites ran the Apache or Nginx web server.[2] The services we enjoy from Google, Apple, Dropbox, and many software as a service (SaaS) companies could not exist without FOSS. This is even more true when you consider that most of these services run on the Linux operating system.

Another reason to use FOSS is that there are more people who can use the software. It is easier for beginners to get hands-on experience with FOSS, which translates to more people getting trained. This means organizations can find more candidates, whether recruiting an initial team or replacing members of an existing team.

Publicly searchable support is another reason many prefer FOSS. Would you rather Google a question or open a support ticket with a vendor? FOSS communities offer an alternative to vendor support. And as a last resort, you can always look at the code. Developers are used to this process and are frustrated when commercial support is the only option, which frequently results in less publicly searchable content.

If you pay a lot for one of the many expensive commercial offerings, won’t that save your job if something goes wrong? You can say you advised the purchase of the best software the market had to offer. Your company can sue the commercial vendor if there is a problem. Open source licenses, by contrast, disclaim liability, so suing is off the table. You get what you pay for, so FOSS IAM platforms must be worse—why else would people buy these expensive commercial platforms? In practice, suing a software vendor is a joyless, unproductive, and unpredictable way to recover wasted time and money. IT failures occur for complex reasons, and assigning blame to the vendor is usually difficult.

Another factor to consider is reusability. People move from one organization to the next with surprising frequency these days. Will you be able to bring your tools with you to the next gig? If you master FOSS tools, the chances are good. If your tools depend on a large financial commitment and a long, drawn-out legal process, probably not. Some sage advice: being great at your job is a much better plan than trying to “not get fired” when something goes wrong, and it’s a lot more fun! The best reason to use FOSS is that it’s the best software available. Don’t make decisions based on fear. “You get what you pay for” is not always true anymore. Every one of your organization’s digital services hangs off the identity system. The ability of your organization to meet the demands of the market is intertwined with its IT infrastructure. You have a critical contribution to make. Be a champion of open source software at your organization because it gives you the best chance to succeed in the long term!

LDAP

Directory services are a critical part of your identity infrastructure. Many components in the identity stack need to either read or write data. While any database could work, a popular choice for many identity projects is LDAP. This chapter is not a comprehensive guide to LDAP. If you are deploying LDAP in your environment, study the documentation for your LDAP Server of choice. Like other chapters in this book, the goal here is to give an overview of the technology and brief descriptions of some open source software tools.

History

You can retrieve data stored in a tree format quickly by ignoring data that is not relevant to your search. For example, think about your family tree. If you want to know the descendants of your parents, you can ignore all the branches beneath aunts or uncles. Reducing the scope of the search saves a lot of time. In the late ’80s, the X.500 set of standards, developed by the International Telecommunication Union (ITU), standardized the storage and retrieval of data using a tree structure. X.500 infrastructure worked well for early messaging systems. However, the popularity of Internet Protocol (IP) led to the need for a new directory access protocol. Collaborators from the industry released several related standards from 1993–1997 via the Internet Engineering Task Force (IETF), independent of any X.500 dependencies.

The LDAP IETF RFCs do not define a persistence mechanism. Nothing prohibits an in-memory LDAP implementation, or even a persistence mechanism based on homing pigeons (although you’d have to build many lofts, and the performance would be terrible). Today, each LDAP vendor has its own persistence strategy, and there is a diverse range of technologies. When you choose an LDAP platform, you always need to consider the underlying database with its respective tradeoffs. One of the most popular databases for LDAP is Oracle Berkeley DB. OpenLDAP uses LMDB. IBM’s LDAP Server uses the DB2 database. Radiant Logic offers a commercial LDAP Server that uses Hadoop as the backend.

Where did the LDAP Servers of today come from? It may help to understand that some LDAP implementations are related. In 1993, at the University of Michigan, Tim Howes wrote the first LDAP Server. In 1996, Netscape forked the project and launched the Netscape Directory Server, which became one of the leading commercial servers. The OpenLDAP project also started by forking the 1996 version of the University of Michigan LDAP Server code—today, the University of Michigan directory project redirects to OpenLDAP. In 1999, Sun Microsystems and Netscape formed an alliance called iPlanet, re-branding the Netscape Directory Server. In an interesting twist, AOL became part of this partnership when it acquired Netscape. In 2002, the iPlanet alliance ended, but the parties retained the right to use the LDAP Server code. Sun rebranded its LDAP Server as the Sun Directory Server Enterprise Edition. In 2004, AOL sold the LDAP Server code to Red Hat, which open sourced it as the Fedora Directory Server (FDS).

Sun continued to innovate on the original Netscape Directory Server. But in 2005, some of the developers felt that they had gone as far as they could with the old code, and proposed to rewrite the LDAP Server from the ground up in Java. Refactoring and innovations in Java and persistence libraries would improve performance and make the server easier to manage. The launch of OpenDS, a new Java LDAP Server platform, aligned with a short-lived open source movement at Sun. OpenDS used the same schema and access control mechanism as previous versions of Sun Directory Server, so the servers were relatively compatible.

And then Oracle bought Sun. There was overlap in the identity products. Regulators fussed, and Oracle agreed to divest some of the business. Some of the former Sun identity team raised money and acquired some of the technology—including OpenDS, which by this time was an essential part of the identity platform. ForgeRock rebranded this new Java LDAP Server as “OpenDJ” and continues to release source code from time to time under the Common Development and Distribution License (CDDL).

One of the benefits of LDAP is that because your application uses a standard protocol to access data, you are not locked into one vendor’s implementation. However, this does not mean there are no switching costs with regard to managing the server. One of the interesting results of this long, intertwined history is that some LDAP Servers share important management conventions—particularly how schema and access controls are managed. For example, the same schema can be used for OpenDJ, Fedora Directory Server, and even commercial LDAP Servers from Oracle and Ping Identity.

No discussion of LDAP would be complete without mentioning Microsoft Active Directory (AD), one of the most widely deployed servers with an LDAP interface. During the development of Windows 2000, Microsoft recognized that the flat user management strategy from Windows NT 4.0 was not sufficient to serve large enterprise customers. Developers forked the directory component from the Exchange 4.0 email server and added many features. Since its first release in 2000 as an official part of the Windows Server platform, AD has become one of the most common directory servers for organizations. For many identity and access management deployments, AD is an important source of information about people.

Why Use LDAP Today?

Today, some people wonder about the relevance of LDAP. It’s not a protocol you want to use over the Internet to retrieve your cloud email or to access your cloud files. Even Microsoft’s new cloud identity service, Azure Active Directory, does not support LDAP connections from the Internet. Some would even argue that the hierarchical representation of identity should give way to a linked data graph model, where people are interconnected, not subordinate within an organization. The right persistence strategy depends on many factors. How big is your data—thousands or millions of entries? What is the concurrency requirement? Are read or write operations more common? Is multi-data center replication required? Does the concurrency warrant database shards? Can you use the database interface for in-memory cache? There is no “correct” answer—if data gets stored on the disk, any database can work for identity services.

But for many organizations, LDAP has proven to be a nice choice for the database in the IAM stack. Here are my top 10 reasons:

  1. LDAP helps you avoid lock-in to one implementation. LDAP has a text-based format called LDIF (LDAP Data Interchange Format), so you can always export data from one LDAP Server and import it into another.

  2. There are many free open source libraries and tools to manage data using LDAP.

  3. Replication technology is mature for several LDAP Servers. For identity data, business continuity is critical. Many organizations want to know that a full set of data is available in two locations. While several other database technologies include replication, some implementations are “best effort,” which means you may need to compare data sets periodically to make sure they are still in sync. In other implementations, replication is not available in the free open source packages.

  4. Many LDAP Servers support numerous algorithms to hash passwords and provide an easy interface for password verification.

  5. Tools exist to generate large LDAP sample data sets and benchmark performance—ensuring that the database performs as expected.

  6. Search performance is excellent in LDAP. By reducing the scope of searches and properly indexing, lookups are fast!

  7. LDAP has excellent UNIX command-line tools that enable you to perform most of your day-to-day administrative work over a simple SSH connection.

  8. There are strategies to scale LDAP horizontally, adding more disks and reducing replication traffic.

  9. Binary and text backups ensure you never lose your data.

  10. Enterprise customers have successfully deployed and operated LDAP infrastructure for many years, proving its reliability.

Basics

LDAP is a client/server message-oriented protocol. The primary operations defined in LDAP enable the client to read or write data, “bind” (authenticate a requestor), and “abandon” (signal the server to cancel an operation). The figure shows a typical sequence diagram of the LDAP protocol.

[Figure: Typical LDAP protocol sequence diagram]

LDAP is not a simple text-based protocol like HTTP. You won’t be able to compose messages on the fly—you’ll need the help of client software. On the wire, LDAP uses a set of rules for encoding data structures called the Basic Encoding Rules (BER), a binary format for ASN.1. Binary encoding significantly improves performance, and high throughput has always been an important design consideration for LDAP. If there is a need for a new operation, LDAP defines a standard extension mechanism called “extended operations.” For example, the StartTLS operation enables a client to indicate that it wants to initiate an encrypted transport layer, or perhaps to use cryptographic signatures so the parties can validate that they trust each other. In addition to operations, clients can include LDAP “controls,” which enable servers to implement extended behavior not specified in the core LDAP protocol. For example, the SimplePagedResultsControl, described in RFC 2696, enables a client to control the rate at which an LDAP Server returns the results of a search operation.
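To give a feel for these operations in practice, here is a minimal sketch using the Python ldap3 library; the host, credentials, and tree layout are hypothetical:

from ldap3 import ALL, SUBTREE, Connection, Server

# Hypothetical server and service account for illustration.
server = Server("ldap.acme.com", use_ssl=True, get_info=ALL)

# "Bind" authenticates the requestor.
conn = Connection(server, user="cn=admin,o=acme", password="secret",
                  auto_bind=True)

# Search: restrict the scope to the people branch and filter by uid.
conn.search(search_base="ou=people,o=acme",
            search_filter="(uid=foo)",
            search_scope=SUBTREE,
            attributes=["cn", "mail"])

for entry in conn.entries:
    print(entry.entry_dn, entry.cn, entry.mail)

# "Unbind" ends the session and closes the connection.
conn.unbind()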

Entries, DNs, and RDNs

A unit of information in an LDAP tree is called an “entry”—think of it like a record in a relational database, or an object with properties (and no methods), like a Java bean. An LDAP directory is composed of many such entries, connected together to form a tree. DN stands for “Distinguished Name” and RDN stands for “Relative Distinguished Name.” The DN is the full address of a node in the LDAP tree. The RDN is the partial path of an entry relative to another entry. For example, we might have an entry with a DN of uid=foo,ou=people,o=acme. It is composed of three RDNs. The DN of an entry must be unique in the tree, and it is how we refer to an entry. If you try to add another entry with the same DN, the LDAP Server will throw an error. Although you might have learned to leave a space after a comma in your typing class, don’t do this when you reference DNs. For example, uid=foo, ou=people, o=acme is an invalid DN! Also, you should avoid several special characters when you choose a DN: space, hash, comma, plus, double-quote, backslash, less-than, greater-than, and semicolon. Technically you could use these characters if you escape them, but do yourself a favor and just avoid them when naming entries.
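If you must handle such values anyway, escape them. Below is a minimal sketch of RFC 4514-style escaping; escape_rdn_value is a hypothetical helper written for illustration, and production code should rely on its LDAP library's own DN utilities:

def escape_rdn_value(value: str) -> str:
    # Escape characters that are special in RDN values per RFC 4514.
    # ('#' strictly needs escaping only at the start; escaping it
    # everywhere is harmless.)
    specials = {'\\', ',', '+', '"', '<', '>', ';', '#', '='}
    out = []
    for i, ch in enumerate(value):
        if ch in specials:
            out.append('\\' + ch)
        elif ch == ' ' and i in (0, len(value) - 1):
            out.append('\\ ')  # leading/trailing spaces must be escaped
        else:
            out.append(ch)
    return ''.join(out)

print(escape_rdn_value('foo, jr.'))  # foo\, jr.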

Namespace

LDAP is based on the idea of a tree data structure. The namespace, or directory information tree (DIT) is defined based on how we name each entry in the tree. Over the years, some common practices have arisen in LDAP namespaces used for enterprise identity and access management. Let’s just dive into an example.

[Figure: Example directory information tree for dc=acme,dc=com]

The first level is called the root node. It consists of one entry: dc=acme,dc=com. dc stands for “domain component.” It may seem confusing that this root node has two components. Shouldn’t dc=com be the root? Perhaps, but a root node can start from a sub-tree. Another convention you might see is to use an organization entry as the root, for example o=acme.com. You might like this convention because it’s less typing. The second level consists of two entries: ou=people and ou=groups. ou stands for “organizational unit.” It’s a common container used to group entries, similar conceptually to a file system folder. There normally isn’t much data in the ou entry, except for its name. The third level contains the leaf entries with the actual data. There are entries for two people and two groups. The DN of an entry can be derived by starting from the entry in question and traversing up the tree until you hit the root, for example uid=foo,ou=people,dc=acme,dc=com.
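Expressed in LDIF, the text serialization mentioned earlier, the tree just described might look like the following sketch (the attribute values are illustrative):

# Root node
dn: dc=acme,dc=com
objectClass: domain
dc: acme

# Second level: containers
dn: ou=people,dc=acme,dc=com
objectClass: organizationalUnit
ou: people

dn: ou=groups,dc=acme,dc=com
objectClass: organizationalUnit
ou: groups

# Third level: a leaf entry holding actual data
dn: uid=foo,ou=people,dc=acme,dc=com
objectClass: inetOrgPerson
uid: foo
cn: Foo Bar
sn: Bar
mail: foo@acme.com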

SAML

By the late 1990s, people were starting to get tired of entering the same username and password on different websites. LDAP helped organizations implement a “single password,” but didn’t enable web “single sign-on” (SSO). While some vendors were offering solutions for web SSO, SAML—the Security Assertion Markup Language—emerged as one of the first standards to enable a person to authenticate once and access websites both inside and outside their organization. The use case of a person accessing websites outside their home domain came to be known as identity federation, and the protocols that enable this are known as federation protocols. Not surprisingly for technology from 2005, SAML is an XML standard. SAML was developed by a diverse group of interested parties—29 organizations and several individuals contributed to the SAML 2.0 core specification. The standard represents the confluence of several previous efforts to standardize a protocol for SSO, including SAML 1.1, Liberty Alliance ID-FF 1.2, and Shibboleth 1.3. All of these previous standards should now be avoided.

Like LDAP, SAML is not defined in one document, but a number of related documents. SAML 2.0 was developed at OASIS, a nonprofit consortium that provides support for the development, convergence and adoption of open standards. At the time of this writing, OASIS has published 146 standards and 145 committee specifications. OASIS was a good home for SAML 2.0 because many organizations were already members and had agreed to its intellectual property guidelines. For more information about OASIS, you can visit their website at https://www.oasis-open.org. The terms defined by SAML have become an important part of the IAM lexicon. For example, a “SAML assertion” is a statement written in XML and issued by an “identity provider” about a “subject” (person) for a “relying party” (the recipient of the assertion) who is normally a “service provider” (website). Identity provider is abbreviated simply as “IDP” and service provider as “SP”. Assertions contain contextual information about the authentication procedure, as well as “attributes”—similar to LDAP attributes, these are little pieces of information about the person, such as first name or last name.

SAML is a mature standard, and it’s been successfully deployed to solve many business challenges. Its stability is one of its advantages—it has not been significantly updated since its 2.0 release. Don’t feel bad if you find SAML somewhat hard to understand at first. SAML was finalized before the age of developer-friendly APIs—ease of use was not a design goal. When you first start learning about SAML, it’s common to get the terms IDP and SP confused. If you’re new to SAML, think of the IDP as the server that holds the identity information for the person and verifies the credentials (i.e., username and password). In most cases, you can substitute SP with “website.” If SAML were LDAP, the IDP would be the LDAP server, and the SP would be the LDAP client.

Like other federation protocols, SAML uses public key cryptography to sign or encrypt messages and documents. The use of such keys enables the parties to protect and verify the integrity of information. By convention, most SAML servers use self-signed X.509 certificates, whereas browsers use certificates issued by Certificate Authorities (CAs). For browsers, using a CA makes sense—it enables validation of a certificate by trusting the root certificate that was used to issue it, enabling vendors to ship browsers with pre-trusted keys that save most people from having to know much about certificate trust. In SAML, however, the use of self-signed certificates has the security benefit of making trust management explicit—when you trust a certain self-signed certificate, you are trusting a specific entity. Self-signed certificates are not shared between services. For example, if you have several SAML services, each would use its own certificate. And, of course, any SAML certificates would be different from the TLS certificate used by a web server (which is generally not self-signed anyway).

If you’re reading this chapter, you need to learn at least the basics about SAML, so the goal is to make this as painless as possible and to discuss some of the tools at our disposal to manage SSO using SAML. We will stick to the most common SAML use cases and ignore the more esoteric SAML capabilities.

For a test IDP, you can use the Shibboleth IDP deployed in the Gluu Server. There are many other excellent free open source SAML tools—we will cover only some of the more common ones. But, hopefully, the concepts and methodologies will be transferable to other software solutions and libraries. So, without further ado, let’s start with a slightly deeper dive into the standard itself, and then move on to the software!

Assertions

Assertions contain the goods—the information that a web application needs from the Identity Provider about the person accessing the site. A SAML assertion can be composed of four different sections (a skeletal example follows this list):

  • Subject is an identifier for the person. This can be a one-time identifier that will change each time the person visits the site, or it can be a consistent identifier that will enable continuity with a person’s previous activity.
  • Authentication statements contain information about when and how the person was authenticated.
  • Attribute statements contain information about the subject, like first name, last name, email address, role or group memberships.
  • Authorization statements contain information about whether the subject should be granted access to a requested resource. This is a somewhat esoteric part of SAML, which you will probably not encounter for SSO use cases.

[Figure: The sections of a SAML assertion]
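To make this concrete, here is a heavily abridged sketch of what an assertion’s XML might look like. The identifiers and attribute values are invented, and the XML signature element is omitted:

<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_3b0e5f" Version="2.0" IssueInstant="2018-07-01T12:00:00Z">
  <saml:Issuer>https://idp.acme.com/idp</saml:Issuer>
  <!-- Subject: here, a transient (one-time) identifier -->
  <saml:Subject>
    <saml:NameID
        Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient">
      _9d4f7c
    </saml:NameID>
  </saml:Subject>
  <!-- Authentication statement: when and how the person authenticated -->
  <saml:AuthnStatement AuthnInstant="2018-07-01T11:59:58Z">
    <saml:AuthnContext>
      <saml:AuthnContextClassRef>
        urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport
      </saml:AuthnContextClassRef>
    </saml:AuthnContext>
  </saml:AuthnStatement>
  <!-- Attribute statement: information about the subject -->
  <saml:AttributeStatement>
    <saml:Attribute Name="mail">
      <saml:AttributeValue>foo@acme.com</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>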

Protocols

While SAML is commonly referred to as a protocol, it’s actually more than that. In fact, the SAML specification also defines the word “protocol”: protocols specify how to understand the different messages that are exchanged between the sender and receiver. The core specification defines the requests sent (usually by the SP) and the responses returned (usually by the IDP). It specifies certain information that must be present in every message, and then describes the details of messages that are sent for specific use cases.


SAML defines six protocols:

  • Assertion Query and Request Protocol defines messages and processing rules for requesting existing assertions if the requester knows the unique identifier of an assertion, or if the requester can identify the subject and statement type.

  • Authentication Request Protocol is one of the most important protocols. It defines how an SP can find out who the subject is, as well as details about the authentication, such as when and how the authentication occurred. If a server can respond to this protocol, it is an identity provider!

  • Artifact Resolution Protocol is used for direct communication between the IDP and SP—sometimes referred to as a “back-channel”. The browser may pass along a reference identifier (artifact) that is used to obtain a protocol message. (Spoiler alert: it’s analogous to the “code” in the OpenID Connect authorization code flow.) The IDP or SP can use that reference identifier to pick up the full payload directly from the sender without the browser’s further involvement. There are some security advantages to this, as well as the opportunity to transfer larger data files more efficiently. We will not touch on this protocol in this chapter, as it’s not widely used.

  • Name Identifier Management Protocol is used by an SP to request an IDP to provide a name identifier for a subject in a particular format or context.

  • Single Logout Protocol is a protocol that can be initiated by either the IDP or SP to effect logout. There are a lot of issues with session management due to the different standards in use, so this protocol isn’t very reliable.

  • Name Identifier Mapping Protocol enables an SP to ask an IDP to map a subject’s name identifier into a different format or namespace—for example, one used by another SP.

IDP-Initiated vs SP-Initiated Authentication

What comes first, the chicken or the egg? The equivalent question in SAML is “What comes first, logging into the IDP or the SP?” Today, we know the answer to this question: log into the SP first—keep it simple and avoid all the complexity and corner cases that entertaining IDP-initiated authentication will cause by trying to figure out how to enable authentication without an authentication request! This is the reason OpenID Connect, a more modern federation protocol, does not support IDP-initiated authentication. Consider Figures 3-4 and 3-5, which represent typical (and greatly oversimplified) SAML flows, for the purpose of highlighting the differences between the two.

[Figure 3-4: Very simplified SP-initiated authentication flow]

[Figure 3-5: Very simplified IDP-initiated authentication flow]

OAuth

OAuth 2.0 (or simply “OAuth,” because OAuth 1.0 is now irrelevant) defines a mechanism for using bearer tokens to make authorized HTTP requests. Simple possession of a bearer token enables access. For example, a long time ago in New York City, if you had a “subway token,” you inserted it into the turnstile and entered the subway station. No questions asked—you have the token, you get in. Bearer tokens are also called “access tokens.”

Although OAuth is known primarily as a technology for consumer applications, its popularity is expanding in enterprise IAM. A common misperception is that OAuth is a protocol. More accurately, it is a framework for authorization—a set of foundational patterns and vocabulary. A protocol would need to provide specific details, like how messages are sent, the exact data structure of those messages, and how message integrity is assured. OAuth is also not an authentication protocol. Vittorio Bertocci made an apt analogy in one of his blog posts for Microsoft:

OAuth 2.0 as a building block for implementing a sign-in flow is not only perfectly possible, but quite handy too: a LOT of web applications take advantage of that, and it works great. But that does NOT mean that OAuth is an authentication protocol, with all the affordances you’ve come to expect from one, as much as using chocolate to make fudge does not make (chocolate == fudge) true.

Scopes

Scopes are used to specify the extent of access for a token. Think of an airplane boarding pass. Airlines scan your boarding pass before they let you enter the plane. If the agent sees a green light, they let you pass—real-world token validation! On some airlines, not all boarding passes are the same. You might prefer a boarding pass that entitles you to sit in first class. Or your boarding pass may qualify you for a specific seat assignment. You can think of scopes as these additional constraints. For example, a resource server may offer many APIs. The authorization server may use scopes to differentiate which APIs you can access. Even within an API, certain features may not be available unless you have a token with the right scope. Scopes may be any string value, but it’s common practice to use a URI for scopes. The advantage of this practice is that scopes are less likely to collide. For example, many developers in an organization may want to use a scope called “write”. If each developer uses a different URI namespace for their “write” scope, the scopes will be unique. Google publishes a list of all their APIs and which scopes are required to call them.
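For example, a client requesting a token for the hypothetical URI-style scope https://api.acme.com/auth/write would include it, URL-encoded, in the authorization request. This is an illustrative sketch with invented values, line-wrapped for readability:

GET /authorize?response_type=code
    &client_id=s6BhdRkqt3
    &redirect_uri=https%3A%2F%2Fclient.acme.com%2Fcb
    &scope=https%3A%2F%2Fapi.acme.com%2Fauth%2Fwrite
    &state=af0ifjsldkj HTTP/1.1
Host: as.acme.com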

[Figure: Google API scopes and their descriptions]

According to the figure, if you want to call a Google API that allows you to “view and manage your data in Google BigQuery,” you’ll need a token with the scope https://www.googleapis.com/auth/bigquery. Using this approach, Google uses scopes to manage which clients can access which features of its service. Google follows the convention of using URIs for scope values. You may also notice that the APIs are versioned. If an API is updated, new features may be introduced, which may require different scopes. Client developers rely on these scopes and need to know about them before they write their code.

OAuth Roles

You’ll sometimes hear OAuth referred to as “three-legged.” Those legs are the “client,” the “resource server,” and the “authorization server.” The client is the software (website or mobile application) that is either requesting a protected resource or connected to a person requesting a protected web resource. The resource server is the software that has web content that needs protecting—API endpoints, for example. The authorization server is the software that issues tokens to clients. Another way to think about OAuth is that the resource server is the policy enforcement point (PEP), and the authorization server is the policy decision point (PDP). The figure shows the three legs, as well as the people who interact with them.

[Figure: The three legs of OAuth and the people who interact with them]

Authorization Server

The authorization server is the most complex component of an OAuth infrastructure. For some deployments, the authorization server and the resource server may be one software process. For larger deployments, a centralized, single-purpose authorization server may issue tokens to control access to a distributed network of resource servers.

The authorization server holds client credentials—for example, an API key and secret for each client. Client credentials are important because they enable an organization to provision specific permissions for a client. Don’t confuse client credentials with a person’s credentials (i.e., username and password). Person authentication is a very different requirement from client authentication. People are messy analog carbon-based things. Clients are software. There are fewer options to authenticate clients than people. In some flows, the authorization server may also need to authenticate a person, who then directly authorizes a client. For example, if you’ve used Google login from a third-party website, after authenticating (unless you are already logged into Google) you are presented with a dialog asking you to authorize this client to call an API that will release information about you, as shown in the figure.

[Figure: Google consent dialog authorizing a client to access information about you]

Resource Server

The resource server, acting as the policy enforcement point, plays a critical role in the security of the OAuth infrastructure. Its job is to make sure a valid token is present—that the token is not expired and that it has been authorized for the correct scopes. The resource server also needs to understand how to use the various types of tokens. In some cases, the resource server must itself call APIs of the authorization server—to validate a token, for example. The resource server will want to minimize external calls and the processing time to decrypt tokens. For this reason, many resource server implementations cache tokens to expedite subsequent authorization for the same token value. The developer who writes a resource server will have to coordinate with the administrators of the authorization server regarding what policies are in place to issue tokens for certain scopes. Sometimes the same group controls both the authorization server and the resource servers. In other cases, the authorization server might be under the control of a central access management team. This is the reason that Google publishes the scopes that are required to call its APIs, as shown in the earlier figure.

The resource server will have to decide which policies to delegate to the authorization server. “The user has approved access” is the most common policy for consumer OAuth applications. But using UMA, a protocol that builds on OAuth, you can map scopes to enterprise access policies.

Centralized policy management is a useful approach, but it’s not a silver bullet. In general, centralized policy management works well for coarse-grained authorization, but not for fine-grained authorization. In a web application, what to display on a certain page is normally controlled by “fine-grained” permissions. Coarse-grained policies are shared across more than one application: for example, what type of authentication is required, which internal roles can access which types of applications, and which software clients to trust. Implementing these policies in every application would lead to code duplication.

Client

An OAuth client is the software that calls the protected resource. It is frequently connected to a person—the requesting party. The client obtains a token from the authorization server and presents it to the resource server. In many cases, it’s the job of the client developer to obtain client credentials at the authorization server and to know what scopes are needed to access the resource server. The client may need to process a redirect to enable the requesting party to interact with the authorization server. The client may also need to handle errors that are returned by either the authorization server or the resource server. A client may be a website or a native application. In some cases, the client may even be a JavaScript application that exists entirely in the person’s browser. It’s important to remember that the client is not the same as the browser. The browser is the software that the requesting party uses to access the Internet—the SAML jargon “user agent” applies here too. The client is software that sits between the browser and the protected resource. In machine-to-machine transactions, where there is no requesting party, the policies must apply only to information about the client and the context of the transaction. For example, is this client (authenticated with a client_id and secret), calling from this network, during this time of day, authorized to obtain a token for certain scopes? A certain group of clients may be associated with a certain partner or category of applications—such information about the client is called a “client claim” and is different from user claims about the requesting party.
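As an illustration of the machine-to-machine case just described, a client might authenticate with its client_id and secret and obtain a token directly using the client credentials grant. The following request is a sketch in the style of the RFC 6749 examples, with invented values:

POST /token HTTP/1.1
Host: as.acme.com
Authorization: Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials&scope=read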

Tokens

A token is an abstraction that represents permission by the authorization server to do something. An access token is a short-lived token obtained by the client. A refresh token is a long-lived token that the client presents to the authorization server in exchange for a new access token. “Short-lived” means one hour or less, though actual times vary depending on the policy of the authorization server; one to five minutes is a common lifetime for an access token. The authorization server decides what type of token to return. Each access token type definition specifies the additional attributes (if any) sent to the client along with the access token. The client should not use a token if it does not understand the token type. Each token type has a different security profile and is useful for different use cases. Long random strings are frequently used as bearer tokens. OAuth provides a mechanism whereby additional token types can be registered as extensions. JSON Web Tokens (JWTs)—pronounced “jots”—are popular and are sometimes used as the bearer token string. There is also an extension for “MAC” tokens, which enable the client to protect the access token value and are useful over non-secure communication channels. However, due to the widespread use of TLS for OAuth, MAC tokens aren’t used much in enterprise deployments, so we’ll skip them.

Bearer Tokens

RFC 6750, “The OAuth 2.0 Authorization Framework: Bearer Token Usage,” describes bearer token usage in OAuth. A bearer token is any data structure that gives the possessor rights to do something—without requiring the owner of the token to verify control of a cryptographic key. A bearer token can be a string with enough entropy to make guessing unlikely. It can also be an XML or JSON document encoded appropriately. OAuth relies heavily on bearer tokens, but SAML IDPs also commonly use them—i.e., signed SAML assertions. It is imperative to the security of any access management infrastructure based on bearer tokens to prevent an attacker from gaining possession of the token during transmission, in memory, or on disk. If this happens, game over! The resource server has no way to distinguish the attacker from the authorized token owner! The following listing is a simple example of an OAuth bearer token response returned from an authorization server.

OAuth Token Endpoint Response

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Cache-Control: no-store
Pragma: no-cache
{
  "access_token": "41902768-ae84-4a1c-8e62-566a8605b90f",
  "token_type": "Bearer",
  "expires_in": 3600,
  "refresh_token": "d1d50489-98ac-4d21-9ddb-2358caf835c3"
}

The most common way for a client to send a bearer token to the resource server is to include the token in the Authorization header field, although RFC 6750 also defines mechanisms to send the access token as an HTML form-encoded body parameter or in the URI (which is a bad idea). Using the header, the bearer token would look like this:

Authorization: Bearer 41902768-ae84-4a1c-8e62-566a8605b90f

In simple OAuth implementations, where the resource server and authorization server are the same, the resource server might query the local database to retrieve information about a bearer token—for example, when does it expire, or for what scopes was it authorized? If the authorization server is remote, RFC 7662 defines an API for “token introspection.” This provides a mechanism for the resource server to retrieve a JSON object from the authorization server that describes the token. The following listing is an example from RFC 7662 of a response from the introspection API.
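For completeness, the corresponding introspection request might look like the following sketch, adapted from RFC 7662. The resource server authenticates itself (here with its own bearer token) and posts the token to be examined:

POST /introspect HTTP/1.1
Host: server.example.com
Accept: application/json
Content-Type: application/x-www-form-urlencoded
Authorization: Bearer 23410913-abewfq.123483

token=41902768-ae84-4a1c-8e62-566a8605b90f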

Sample OAuth Token Introspection Response

HTTP/1.1 200 OK
Content-Type: application/json
{
  "active": true,
  "client_id": "l238j323ds-23ij4",
  "username": "jdoe",
  "scope": "read write dolphin",
  "sub": "Z5O3upPC88QrAjx00dis",
  "aud": "https://protected.example.net/resource",
  "iss": "https://server.example.com/",
  "exp": 1419356238,
  "iat": 1419350238,
  "extension_field": "twenty-seven"
}

JSON Web Token (JWT)

Defined in RFC 7519, the JWT token type is essentially a compact syntax for sending an optionally signed and/or encrypted JSON object. It’s surprisingly compact on the wire—you may even be able to send it as a query parameter. The token can contain user claims and can eliminate the need for token introspection. JWTs are particularly advantageous in stateless web architectures. Another application for JWT is where cookies can’t be used, for example, due to restrictions on writing third-party cookies. A JSON Web Token consists of three sections, separated by two periods:

Header.Payload.Signature

The header describes the cryptographic algorithms used for signing and encryption, for example: {"alg": "RS256"}. If you want to know the meaning of “RS256,” check RFC 7518, which describes the JSON Web Algorithms. If you don’t want to use any encryption or signing in your JWT, you can use {"alg":"none"}. In this case, there would be no text after the second period—you’d just have a header and payload. The header may also be used to send unencrypted claims. Note that the value of any substantive unencrypted claims should be verified against the signed JSON payload.

The JSON payload portion of the token may contain three types of claims: reserved, public, and private. The reserved claims are the ones registered in the JWT specification (RFC 7519), such as iss (issuer), exp (expiration time), sub (subject), and aud (audience). Public claims are registered at the IANA JSON Web Token Registry or are collision-resistant URIs. Private claims are ad hoc claims agreed upon by the organizations using them.

Validation of the signature is too complex for treatment here. The token can be signed, encrypted, signed then encrypted, or encrypted then signed. If encrypted, the client would have to previously register its public key with the authorization server. For signing, the public keys of the authorization server are frequently provided at a URL for download. OAuth supports many different signing and encryption algorithms—as mentioned, check RFC 7518 for a full list. Then check to make sure they are supported on the authorization server.
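As a quick illustration of the three-section structure, the following Python sketch splits and base64url-decodes an unsigned ({"alg":"none"}) sample token. It does not verify anything; a real RS256 token must be validated with a proper JOSE library before any claims are trusted:

import base64
import json

def b64url_decode(segment: str) -> bytes:
    # Restore the base64url padding that JWTs strip off.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

# Sample unsigned JWT for illustration only ({"alg":"none"}).
token = ("eyJhbGciOiJub25lIn0."
         "eyJpc3MiOiJodHRwczovL2FzLmV4YW1wbGUuY29tIiwic3ViIjoiamRvZSJ9.")

header, payload, signature = token.split(".")
print(json.loads(b64url_decode(header)))   # {'alg': 'none'}
print(json.loads(b64url_decode(payload)))  # {'iss': ..., 'sub': 'jdoe'}
# signature is empty because alg is "none"; with RS256 it would be the
# base64url-encoded RSA signature over header.payload.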

Proof-of-Possession Tokens

A proof-of-possession token, also called a holder-of-key (HoK) token, requires control of a cryptographic key to provide additional evidence that the presenter of the token is the party to whom the token was issued. This approach mitigates the risk of stolen tokens. RFC 7800 introduces how to declare in a JWT that the presenter of the JWT possesses a particular proof-of-possession private key and how the recipient can cryptographically confirm this. Thus, a JWT can be either a bearer token or a proof-of-possession token.

Token Binding

Token binding is an advanced topic that is still under development, and whose future is not 100% certain, as adoption by websites and browsers has been slow. The Google team has even discussed the possibility of dropping support for it, although hopefully they’ll keep the feature. The idea is for the browser to generate a public/private key pair and an identifier (the Token Binding ID) for TLS connections to a web server. These keys and identifiers are long-lived. When a connection is made between the browser and website, the identifier can be remembered at the website. When issuing a security token (e.g., an HTTP cookie or an OAuth token) to a client, the server can include the Token Binding ID in the token, thus cryptographically binding the token to TLS connections between that client and server, as well as inoculating the token against abuse (re-use, attempted impersonation, etc.) by attackers. This protects the token from man-in-the-middle, token export, and replay attacks. In a typical OAuth session, there are several TLS connections: between the browser and the AS, between the browser and the RS, between the client and the AS, and even between the RS and the AS. There are several opportunities to use token binding to improve security. The initial work addresses how to protect OAuth access tokens and refresh tokens on TLS connections between the client and the AS. Token binding could be a useful tool to prevent man-in-the-middle attacks, and its use has been proposed for banking and financial services profiles. An alternative to token binding is mutual TLS, for which another OAuth draft is under development.

Registration

The authorization server needs to know information about each client before it can issue the client a token. This is similar to SAML, where the IDP needs to configure trust for each SP. What is new in OAuth is a standardized option for self-service registration. In SAML, the IDP administrator generally imports the SP’s metadata or configures information about the SP. This is usually a manual process, although sometimes a website may create a proprietary process for self-provisioning. OAuth registration defines standards for client provisioning. During registration, the client is issued a client identifier. At a minimum, the client must tell the AS the URIs where it is okay to send users after the AS has finished its interaction—the redirect_uris. This is important because the AS should never redirect a person’s browser to a URI that has not been previously registered. If the redirect_uri is a web address, it must always use the https scheme, and the AS must validate the TLS certificate or certificate chain. Frequently, the authorization server collects other information about the client, such as a name, an icon, the URL of its home page, a link to its privacy policy, and a brief description of the application. The client may also register an asymmetric key instead of a shared client secret. Client registration is also an appropriate time for the client to notify the AS about its preferences. What types of cryptographic algorithms are preferred? What scopes are requested? What are the default user authentication mechanisms desired? All this information may be provided during registration.

Although the API and vocabulary for registration can align with OAuth standards, the business process may still vary. The developer may need to complete a form, sign a legal agreement, or provide various pieces of information about their organization. Sometimes, the client credentials are automatically created and available for immediate use. For example, the OpenID Connect profile of OAuth defines an API that enables a client developer to register automatically.

Note: The probability of an attacker guessing generated tokens (and other credentials not intended for handling by end-users) MUST be less than or equal to 2^(-128) and SHOULD be less than or equal to 2^(-160).

If you have trouble computing those huge numbers, the odds are about 1 in 340 undecillion to 1 in 1.5 quindecillion (give or take a few billion). Those are numbers with 38 to 48 zeros at the end. How you can compute numbers that large is beyond my mathematical capabilities.
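Fortunately, you don’t need to compute anything; you just need enough random bits from a cryptographically secure generator. Here is a minimal Python sketch:

import secrets

# 32 random bytes = 256 bits of entropy, comfortably beyond the
# 128-bit (MUST) and 160-bit (SHOULD) guidance quoted above.
access_token = secrets.token_urlsafe(32)
print(access_token)

# The guessing probability for an n-bit random token is 2**(-n);
# for reference, 2**128 is about 3.4e38 (340 undecillion).
print(f"{2.0**128:.2e}")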

Grants

The process or method by which a client obtains an access token is called an authorization grant. The grant represents a permission for the client to access an API endpoint. Each type of authorization grant has a different flow with its own security characteristics. Following is a description of the different grants, and when their use is appropriate.

[Figure: OAuth authorization grant types]

Authorization Code Grant

This flow, for web-based client applications, uses a “code” (a guess-resistant string) to represent a person’s delegation to the client. The code is returned to the person’s browser and forwarded to the client, which exchanges it (along with client credentials) for a token or tokens. The code can only be used once, which reduces the risk of it leaking.

It’s important that the code be protected and that it be sufficiently long to prevent guessing. However, because the code is passed through the browser, it’s still susceptible to interception by existing malware. If additional security is needed, there is an extra step for OAuth public clients called Proof Key for Code Exchange (PKCE, pronounced “pixy”), described in RFC 7636. Using this mitigation technique, the client adds a code challenge, derived from a secret one-time code verifier, to the authorization request. If the code is intercepted, it cannot be exchanged for a token without the matching code verifier.
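Here is a minimal Python sketch of how a public client might generate the PKCE values using the S256 method from RFC 7636:

import base64
import hashlib
import secrets

# High-entropy code_verifier (43-128 characters per RFC 7636).
code_verifier = secrets.token_urlsafe(32)  # 43 URL-safe characters

# code_challenge = BASE64URL(SHA256(code_verifier)), without '=' padding.
digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# The client sends code_challenge (and code_challenge_method=S256) in the
# authorization request, then proves possession by sending code_verifier
# to the token endpoint together with the authorization code.
print("code_verifier: ", code_verifier)
print("code_challenge:", code_challenge)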