Understanding Identity and Access Management

The beginning of knowledge is the discovery of something we do not understand.
— Frank Herbert

What is identity and access management? The answer to that question is both easy and complex. The easy part is: identity and access management (IAM) is a set of information technologies that deal with identities in cyberspace. The complex part of the answer takes up the rest of this book.

The story of identity and access management starts with information security. Security requirements dictate the need for authentication and authorization of users. Authentication is a mechanism by which the computer checks that the user really is who they claim to be. Authorization is a related mechanism by which the computer determines whether to allow or deny the user a specific action. Almost every computer system has some means of authentication and authorization.

Perhaps the most widespread form of authentication is a password-based "log in" procedure. The user presents an identifier and a password. The computer checks whether the password is valid. For this procedure to work, the computer needs access to a database of all valid users and passwords. Early stand-alone information systems had their own databases that were isolated from the rest of cyberspace. The data in the database were maintained manually. However, the advent of computer networking changed everything. Users were able to access many systems, and the systems themselves were connected to each other. Maintaining an isolated user database in each system did not make much sense any longer. That’s where the real story of digital identity begins.
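
To make the mechanism concrete, here is a minimal sketch of such a password check in Python, assuming a hypothetical stand-alone user database that stores salted password hashes rather than cleartext passwords. The usernames, passwords and parameters are made up for illustration.

[source,python]
----
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # Derive a password hash; storing hashes instead of cleartext passwords
    # is the usual practice even in simple per-system user databases.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical stand-alone user database with a single entry.
salt = os.urandom(16)
user_db = {"alice": {"salt": salt, "hash": hash_password("s3cret", salt)}}

def authenticate(username: str, password: str) -> bool:
    record = user_db.get(username)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, record["hash"])

print(authenticate("alice", "s3cret"))  # True
print(authenticate("alice", "wrong"))   # False
----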

Note
Enterprise Identity and Access Management
This book deals mostly with Enterprise Identity and Access Management, that is, identity and access management applied to larger organizations such as enterprises, financial institutions, government agencies, universities, health care organizations, and so on. The focus is on managing employees, contractors, customers, partners, students, and other people who cooperate with the organization. However, many of the mechanisms and principles described in this book can be applied to non-enterprise environments.

Directory Services and Other User Databases

The central concept of identity management is a data record which contains information about a person. This concept has many names: user profile, persona, user record, digital identity and many more. The most common name in the context of identity management is user account. Accounts usually hold information that describes the real-world person using a set of attributes such as given name and family name. However, probably the most important part of the account is the technical information that relates to the operation of an information system for which the account is created. This includes operational parameters such as the location of the user’s home directory, a wide variety of permission information such as group and role membership, system resource limits, and so on. User accounts are represented in a wide variety of forms, ranging from relational database records through structured data files to semi-structured text files. Regardless of the specific method used to store and process the records, the account is undoubtedly one of the most important concepts of the IAM field - and so are the databases where the accounts are stored.
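
As an illustration, a single user account could look like the following sketch. The attribute names loosely follow common LDAP schemas (inetOrgPerson, posixAccount), but the record, the person and the values are hypothetical, and every real system uses its own representation.

[source,python]
----
# A hypothetical user account record. Real systems store such records as
# database rows, LDAP entries, JSON documents or even plain text files.
account = {
    # Attributes describing the real-world person
    "uid": "alice",
    "givenName": "Alice",
    "sn": "Anderson",
    "mail": "alice@example.com",
    # Technical information related to the operation of the system
    "homeDirectory": "/home/alice",
    "loginShell": "/bin/bash",
    "memberOf": ["employees", "sales-agents"],
    "diskQuotaMB": 2048,
}
----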

The account databases are as varied as the account types. Most account databases in the past were implemented as an integral part of the monolithic information system using the same database technology as the system used. This is an obvious choice, and it remains quite popular even today. Therefore, many accounts are stored in relational database tables and similar application data stores.

Applications

Application data stores are usually tightly bound to the application. Therefore, accounts stored in such databases are difficult to share with other applications. However, sharing account data across the organization is more than desirable. It makes very little sense to maintain account data in each database separately – especially if most of the accounts are the same in each application. Therefore, there is a strong motivation to deploy account databases that can be shared by many applications.

Directory servers are built with the primary purpose of providing shared data storage to applications. While application databases usually use their own proprietary communication protocols, directory servers implement standardized protocols. While databases are built for an application-specific data model, directory servers usually extend a standardized data model, which improves interoperability. While databases may be heavyweight and expensive to scale, directory servers are designed to be lightweight and provide massive scalability. That makes directory servers almost ideal candidates for a shared account database.

Directory server

A shared identity store makes user management easier. An account needs to be created and managed in one place only. Authentication still happens in each application separately. Yet, as the applications use the same credentials from the shared store, the user may use the same password for all the connected applications. This is an improvement over setting the password for each application separately.

Identity management solutions based on shared directory servers are simple and quite cost-efficient. Therefore, we have been giving the same advice for many years: if you can connect all your applications to an LDAP directory server, do not think too much about it and just do it. The problem is that this usually works only for very simple systems.

Lightweight Directory Access Protocol (LDAP)

Lightweight Directory Access Protocol (LDAP) is a standard protocol for accessing directory services. It is an old protocol, at least when judged by Internet-age standards. LDAP roots go as far back as the 1980s, to a family of telecommunication protocols known as X.500. Even though LDAP may be old, it is widely used. It is a very efficient binary protocol that was designed to support massively distributed shared databases. It has a small set of well-defined simple operations. The operations and the data meta-model implied by the protocol allow very efficient data replication and horizontal scalability of directory servers. This simplicity contributes to low latencies and high throughput for read operations. The horizontal scalability and relative autonomy of directory server instances is supposed to increase the availability of the directory system. These benefits often come at the expense of slow write operations. As identity data are often read but seldom modified, slower writes are usually a perfectly acceptable trade-off. Therefore, LDAP-based directory servers were, and in many places still remain, the most popular databases for identity data.
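
For a feel of how simple read access to a directory looks in practice, here is a minimal sketch using the Python ldap3 library. The server address, bind DN, password and base DN are placeholders, not values from any real deployment.

[source,python]
----
# Minimal LDAP read access using the ldap3 library.
from ldap3 import ALL, Connection, Server

server = Server("ldap.example.com", get_info=ALL)
conn = Connection(server, "cn=reader,dc=example,dc=com", "secret", auto_bind=True)

# Search is the typical LDAP operation: simple, fast and read-only.
conn.search(
    search_base="ou=people,dc=example,dc=com",
    search_filter="(uid=alice)",
    attributes=["cn", "mail"],
)
for entry in conn.entries:
    print(entry.entry_dn, entry.mail)

conn.unbind()
----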

LDAP is one of the precious few established standards in the IAM field. However, it is far from being perfect. LDAP was created in the 1990s, with roots going back to the 1980s. There are some problems in the original LDAP design that were never fully addressed. Also, the LDAP schema has a distinctive feel of the 80s and 90s. LDAP would deserve a major review, to correct the problems and bring the protocol into the 21st century. Sadly, there hasn’t been a major update to the LDAP specifications for decades.

Even though LDAP has its problems, it still remains a useful tool. Most LDAP server vendors provide proprietary solutions to LDAP problems. Many organizations store identities in LDAP-enabled data stores. There are many applications that support LDAP, mostly for centralization of password-based authentication. LDAP still remains a major protocol in the Identity and Access Management field. Therefore, we will be getting back to the LDAP protocol many times in this book.

Directory Servers are Databases

Directory servers are just databases that store information. Nothing more. The protocols and interfaces (APIs) used to access directory servers are designed as database interfaces. They are good for storing, searching and retrieving data. While user account data often contain entitlement information (permissions, groups, roles, etc.), directory servers are not well-suited to evaluate them. That is, a directory server can provide information about what permissions an account has, but it is not designed to make a decision whether to allow or deny a specific operation. Also, directory servers do not contain data about user sessions. Directory servers do not know whether the user is currently logged in or not. Many directory servers are used for basic authentication and even authorization. Yet, directory servers were not designed to do this. Directory servers provide only the very basic capabilities. There are plug-ins and extensions that provide partial capabilities to support authentication and authorization. However, that does not change the fundamental design principles. Directory servers are databases, not authentication or authorization servers. Being databases, they should be used as such.

However, many applications use directory servers to centralize password authentication. In fact, this is a reasonably good and cost-efficient way to centralize password-based authentication, especially if you are just starting with identity and access management. Nevertheless, you should be aware that this is a temporary solution. It has many limitations. The right way to do it is to use an authentication server instead of a directory server. Access Management (AM) technologies can provide that.
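
The usual pattern applications follow is "search, then bind": look up the user entry, then try to bind to the directory with the user's password. The sketch below, again using the Python ldap3 library with placeholder names, illustrates the idea; a production implementation would also need to escape the search filter and handle many more error cases.

[source,python]
----
# Centralized password authentication against an LDAP directory:
# find the user's entry, then bind (authenticate) as that entry.
from ldap3 import Connection, Server

def ldap_authenticate(username: str, password: str) -> bool:
    if not password:
        # An empty password would result in an anonymous bind, not a failure.
        return False
    server = Server("ldap.example.com")

    # Look up the DN of the user entry (anonymous bind for simplicity;
    # real deployments usually use a dedicated service account).
    lookup = Connection(server, auto_bind=True)
    lookup.search("ou=people,dc=example,dc=com",
                  f"(uid={username})", attributes=["uid"])
    if not lookup.entries:
        lookup.unbind()
        return False
    user_dn = lookup.entries[0].entry_dn
    lookup.unbind()

    # Bind as the user: a successful bind means the password is valid.
    user_conn = Connection(server, user=user_dn, password=password)
    result = user_conn.bind()
    user_conn.unbind()
    return result
----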

Single Directory Server Myth

Now listen, this is a nice and simple idea: Let’s keep all our user data in a single directory server. All our applications can access the data there, all the applications will see the same data. We even have this "LDAP" thing, a standardized protocol to access the database. Here, all identity management problems are solved!

Unfortunately, they are not. A shared directory server makes user management easier. However, this is not a complete solution, and there are serious limitations to this approach. The heterogeneity of information systems makes it nearly impossible to put all required data into a single directory system.

The obvious problem is the lack of a single, coherent source of information. There are usually several sources of information for a single user. For example, a human resources (HR) system is authoritative for the existence of an employee in the enterprise. However, the HR system is usually not authoritative for assignment of an employee identifier such as a username. There needs to be an algorithm that ensures uniqueness of the username across all current and past employees, contractors and partners. Moreover, there may be additional sources of information. For example, a management information system may be responsible for determining a user’s roles (e.g. in a project-oriented organizational structure). An inventory management system may be responsible for assigning a telephone number to the user. The groupware system may be the authoritative source of the user’s e-mail address and other electronic contact data. There are usually 2 to 20 systems that provide authoritative information for a single user. Therefore, there is no simple way to feed and maintain the data in the directory system.

Then there are spatial and technological barriers. Many complex applications need a local user database. They must store copies of user records in their own databases to operate efficiently. For example, large billing systems cannot work efficiently with external data (e.g. because of the need to make relational database joins). Therefore, even if a directory server is deployed, these applications still need to maintain a local copy of identity data. Keeping the copy synchronized with the directory data may seem like a simple task. But it is not. Additionally, there are legacy systems which usually cannot access the external data at all (e.g. because they do not support the LDAP protocol).

Some services need to keep even more state than just a simple database record. For example, file servers usually create home directories for users. While account creation can usually be done on demand (e.g. creating the user's home directory at first log-on), modification and deletion of the account are much more difficult. A directory server will not do that.

Perhaps the most painful problem is the complexity of access control policies. Role names and access control attributes may not have the same meaning in all systems. Different systems usually have different authorization algorithms that are not mutually compatible. While this issue can be solved with per-application access control attributes, the maintenance of these attributes is seldom trivial. If every application has its own set of access control attributes, then the centralized directory provides only a negligible advantage. The attributes may as well reside in the applications themselves. That’s exactly how most directory deployments end up: directory servers contain only groups, groups that usually roughly approximate low-level RBAC roles.

Quite surprisingly, the LDAP standards themselves create a significant obstacle to interoperability in this case. There are at least three or four different - and incompatible - specifications for group definition in LDAP directories. Moreover, the usual method of managing LDAP groups is not ideal at all. It is especially problematic when managing big groups. Therefore, many directory servers provide their own non-standard improvements, which further complicates interoperability. Yet even these server-specific improvements usually cannot support complex access control policies. Therefore, access control policies and fine-grained authorizations are usually not centralized in directory servers; they are maintained directly in the application databases.
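
To illustrate the incompatibility, the sketch below creates the same "sales" group in three common but mutually incompatible LDAP styles, using the Python ldap3 library. The server, DNs and names are placeholders.

[source,python]
----
# Three common, mutually incompatible ways to express a "group" in LDAP.
from ldap3 import Connection, Server

server = Server("ldap.example.com")
conn = Connection(server, "cn=admin,dc=example,dc=com", "secret", auto_bind=True)

# groupOfNames: members are full DNs in the "member" attribute.
conn.add("cn=sales,ou=groups,dc=example,dc=com", "groupOfNames",
         {"cn": "sales",
          "member": ["uid=alice,ou=people,dc=example,dc=com"]})

# groupOfUniqueNames: members are full DNs in the "uniqueMember" attribute.
conn.add("cn=sales-unique,ou=groups,dc=example,dc=com", "groupOfUniqueNames",
         {"cn": "sales-unique",
          "uniqueMember": ["uid=alice,ou=people,dc=example,dc=com"]})

# posixGroup (RFC 2307, UNIX style): members are plain usernames in "memberUid".
conn.add("cn=sales-posix,ou=groups,dc=example,dc=com", "posixGroup",
         {"cn": "sales-posix", "gidNumber": 1100, "memberUid": ["alice"]})

conn.unbind()
----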

The single directory approach is feasible only in very simple environments or in almost entirely homogeneous environments. In all other cases there is a need to supplement the solution with other identity management technologies.

This does not mean that the directory servers or other shared databases are useless. Quite the contrary. They are very useful if they are used correctly. They just cannot be used alone. More components are needed to build a complete solution.

Access Management

While directory systems are not designed to handle complex authentication, access management (AM) systems are built to handle just that. Access management systems handle all the flavors of authentication, and even some authorization aspects. The principle of all access management systems is basically the same:

  1. The access management system gets between the user and the target application. This can be done by a variety of mechanisms; the most common method is that the applications themselves redirect the user to the AM system if they do not have an existing session.

  2. The access management system prompts the user for the username and password, interacts with the user’s mobile device, or in any other way initiates the authentication procedure.

  3. The user enters the credentials.

  4. The access management system checks the validity of the credentials and evaluates access policies.

  5. If access is allowed, the AM system redirects the user back to the application. The redirection usually contains an access token: a small piece of information that tells the application that the user is authenticated.

  6. The application validates the token, creates a local session and allows the access.

Access management flow

After that procedure, the user works with the application normally. Only the first access goes through the AM server. This is important for AM system performance and sizing, and it impacts session management functionality.

The applications only need to provide the code that integrates with the AM system. Except for that small integration code, applications do not need to implement any authentication code at all. It is the AM system that prompts for the password, not the application. This is a fundamental difference when compared to LDAP-based authentication mechanisms. In the LDAP case, it is the application that prompts for the password. In the AM case, the Access Management server does everything. Many applications do not even care how the user was authenticated. All they need to know is that the user was authenticated and that the authentication was strong enough. This feature brings a very desirable flexibility to the entire application infrastructure. The authentication mechanism can be changed at any time without disrupting the applications. We live in an era when passwords are frowned upon and replaced by stronger authentication mechanisms. The flexibility that the AM-based approach brings may play a key part in that migration.
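
Just to give an idea of how small that integration code can be, here is a framework-free sketch of the application side. Everything here is illustrative: the URLs, parameter names and the trivial token validation are made up, and a real application would use a proper web framework and cryptographic token verification.

[source,python]
----
# Sketch of the application-side integration with an AM server.
from dataclasses import dataclass, field
import secrets

AM_LOGIN_URL = "https://am.example.com/login"        # hypothetical AM server
CALLBACK_URL = "https://app.example.com/callback"    # hypothetical application

@dataclass
class App:
    sessions: dict = field(default_factory=dict)     # session id -> user claims

    def handle_request(self, session_id):
        if session_id not in self.sessions:
            # No local session: redirect the browser to the AM server,
            # which prompts for credentials and redirects back with a token.
            return f"302 {AM_LOGIN_URL}?return_to={CALLBACK_URL}"
        return f"200 Hello, {self.sessions[session_id]['sub']}"

    def handle_callback(self, token):
        claims = self.validate_token(token)           # signature, issuer, expiry ...
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = claims            # local session; AM not needed again
        return f"302 / (set session cookie {session_id})"

    def validate_token(self, token):
        # Placeholder only; see the OpenID Connect example later in this chapter.
        return {"sub": "alice"}

app = App()
print(app.handle_request(None))              # user not logged in: redirect to AM
print(app.handle_callback("opaque-token"))   # AM sent the user back with a token
----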

Web Single Sign-On

Single Sign-On (SSO) systems allow a user to authenticate once, and access a number of different systems re-using that authentication. There are many SSO systems for web applications, however it looks like these systems are all using the same basic principle of operation. The general flow is described below:

  1. Application A redirects the user to the access management server (SSO server).

  2. The access management server authenticates the user.

  3. The access management server establishes a session (SSO session) with the user's browser. This is a crucial part of the SSO mechanism.

  4. The user is redirected back to application A. Application A usually establishes a local session with the user.

  5. The user interacts with application A.

  6. When the user tries to access application B, it redirects the user to the access management server.

  7. The access management server checks for the existence of an SSO session. As the user has authenticated with the access management server before, there is a valid SSO session.

  8. The access management server does not need to authenticate the user again, it immediately redirects the user back to application B.

  9. Application B establishes a local session with the user and proceeds normally.

The user usually does not even realize that there were any redirects when accessing application B. There is no user interaction between the redirects, and the processing on the access management server is usually very fast. It looks like the user was logged into application B all the time.

Web SSO flow

Authorization in Access Management

The request of a user accessing an application is directly or indirectly passed through the access management server. Therefore, the access management server can analyze the request and evaluate whether the user request is authorized or not. That is the theory. Unfortunately, the situation is much more complicated in practice.

The AM server usually intercepts only the first request to access the application, because intercepting all the requests would have a performance impact. After the first request, the application establishes a local session and proceeds with the operation without any communication with the AM server. Therefore, the AM server can only enforce authorization during the first request. This means it can only enforce very coarse-grained authorization decisions. In practice, it usually means that the AM server can make only all-or-nothing authorization decisions: whether a particular user can access all parts of a particular application, or cannot access the application at all. The AM server usually cannot make any finer-grained decisions just by itself.

Some AM systems provide agents that can be deployed to applications, agents that enforce finer-grained authorization decisions. Such agents often rely on HTTP communication. They make decisions based on the URLs that the user is accessing, as the sketch below illustrates. This approach might have worked well in the 1990s, but it has only very limited applicability in the age of single-page web applications and mobile applications. In such cases the authorization is usually applied to services rather than applications.
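
A minimal sketch of such URL-based policy evaluation is shown below. The policy, the URL patterns and the roles are entirely hypothetical; the point is that the agent only sees a URL and a set of user attributes, and knows nothing about what the application actually does with the request.

[source,python]
----
# URL-pattern authorization of the kind typically enforced by web agents.
import fnmatch

# Ordered policy: (URL pattern, roles allowed to access matching URLs)
POLICY = [
    ("/admin/*",   {"administrator"}),
    ("/reports/*", {"manager", "auditor"}),
    ("/*",         {"employee", "manager", "administrator", "auditor"}),
]

def is_allowed(url_path: str, user_roles: set) -> bool:
    # First matching pattern wins.
    for pattern, allowed_roles in POLICY:
        if fnmatch.fnmatch(url_path, pattern):
            return bool(user_roles & allowed_roles)
    return False

print(is_allowed("/admin/users", {"employee"}))   # False
print(is_allowed("/reports/q1", {"manager"}))     # True
----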

However, even applying the authorization to service front-ends does not solve the problem entirely. Sophisticated applications often need to make authorization decisions based on context, which is simply not available in the request or user profile at all. E.g. an e-banking application may allow or deny a transaction based on the sum of previous transactions that were made earlier that day. While it may be possible to synchronize all the authorization information into the user profile, it is usually not desirable. It would be a major burden to keep such information updated and consistent, not to mention security and privacy concerns. Many authorization schemes rely on a specific business logic, which is very difficult to centralize in an authorization server.

Then there are implementation constraints. In theory, the authorization system should make only allow/deny decisions. However, this is not enough to implement an efficient application. The application cannot afford to list all the objects in the database, pass them to authorization server, and then realize that the authorization server denied access to almost all of them. Authorization has to be processed before the search operation, and additional search filters have to be applied. Which means that authorization mechanisms need to be integrated deep into the application logic. This significantly limits the applicability of centralized authorization mechanisms.
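
The sketch below shows the difference on a toy SQLite database with a hypothetical documents table: post-filtering retrieves everything and throws most of it away, while the practical approach translates the authorization policy into an additional search filter inside the query itself.

[source,python]
----
# Why allow/deny decisions alone are not enough: authorization must be
# pushed into the search operation itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, department TEXT, title TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
    (1, "sales", "Q1 forecast"),
    (2, "hr", "Salary review"),
    (3, "sales", "Pipeline"),
])

user_departments = ["sales"]    # what this user is authorized to see

# Naive approach: fetch everything, then ask "is this row allowed?".
all_rows = conn.execute("SELECT id, department, title FROM documents").fetchall()
visible = [row for row in all_rows if row[1] in user_departments]  # wasteful at scale

# Practical approach: turn the authorization policy into a search filter.
placeholders = ",".join("?" for _ in user_departments)
visible = conn.execute(
    f"SELECT id, department, title FROM documents WHERE department IN ({placeholders})",
    user_departments,
).fetchall()
print(visible)
----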

AM systems often come with a promise to unify authorization across all the applications and to centralize management of organization-wide security policies. Unfortunately, such broad promises are seldom fulfilled. The AM system can theoretically evaluate and enforce some authorization statements. This may work well during demonstrations and even in very simple deployments. Yet in complex practical deployments, this capability is extremely limited. The vast majority of authorization decisions are carried out by each individual application and are completely outside the reach of an AM system.

SAML and OpenID Connect

Some access management systems use proprietary protocols to communicate with the applications and agents. This is obviously an interoperability issue – especially when the AM principles are used in the Internet environment. Indeed, it is the Internet that motivated standardization in this field.

The first widespread standardized protocol in this field was Security Assertion Markup Language (SAML). The original intent of SAML was to allow cross-domain sign-on and identity data sharing across organizations on the Internet. SAML is both an access management protocol and a security token format. SAML is quite complex, heavily based on XML standards. Its specifications are long and divided into several profiles, and there are many optional elements and features. Overall, SAML is not just a protocol, it is a set of very rich and flexible mechanisms.

The primary purpose of SAML is the transfer of identity information between organizations. There are big SAML-based federations with hundreds of participating organizations. Many e-government solutions are based on SAML, there are big partner networks running on SAML, and overall it looks like SAML is a success. Yet, SAML was a victim of its own flexibility and complexity. The latest fashion trends are not very favorable to SAML. XML and SOAP-based web service mechanisms are hopelessly out of fashion, which impacts the popularity of SAML. That has probably motivated the inception of other access management protocols.

The latest fashion still favors RESTful services and simpler architectural approaches. That probably contributed to the development of the OpenID Connect (OIDC) protocol. OpenID Connect is based on much simpler mechanisms than SAML, but it reuses the same basic principles. OpenID Connect has a very eventful history. It all started with a bunch of homebrew protocols such as LID or SXIP, which are mostly forgotten today. That was followed by the development of the OpenID protocol, which was still very simple. OpenID gained some attention, especially with providers of Internet services. Despite its simplicity, OpenID was not very well engineered, and it quickly reached its technological limits. It was obvious that OpenID needed to be significantly improved. At that time, there was an almost unrelated protocol called OAuth, which was designed for management of cross-domain authorizations. That protocol was developed into something that was almost, but not quite, entirely unlike the original OAuth protocol. As the new protocol had almost nothing to do with the original OAuth protocol, it is perfectly understandable that it was dubbed OAuth2. In fact, OAuth2 is not really a protocol at all. It is rather a vaguely-defined framework to build other protocols. The OAuth2 framework was used to build a cross-domain authentication and user profile protocol. This new protocol is much more similar to SAML than to the original OpenID, therefore it was an obvious choice to call it OpenID Connect. Some traditions are just worth maintaining.

Now there are two protocols that are using the same principle and doing almost the same thing: SAML and OpenID Connect. The principle of both protocols is illustrated in the following diagram.

SAML/OIDC flow

The interaction goes like this:

  1. The user is accessing a resource. This can be a web page or a web application running on the target site.

  2. The target site does not have a valid session for the user. Therefore, it redirects the user's browser to the source site, adding an authentication request to that redirect.

  3. The browser follows the redirect to the source site. The source site gets the authentication request and parses it.

  4. If the user is not already authenticated with the source site, the authentication happens now. The source site prompts for the username, password, certificate, one-time password or whatever credential is required by the policy. With a bit of luck the authentication succeeds.

  5. The source site redirects the browser back to the target site. The source site adds an authentication response to the redirect. The most important part of the response is a token. The token directly or indirectly asserts the user's identity.

  6. The target site parses the authentication response and processes the token. The token may be just a reference (e.g. SAML artifact) pointing to real data, or it may be an access key to another service that provides the data (OIDC UserInfo). In that case the target site makes another request (6a). This request is usually a direct one and does not use browser redirects. One way or another, the target site now has claims about the user's identity. A token validation sketch follows this list.

  7. The target site evaluates the identity, processes authorizations and so on. A local session with the user is usually established at this point to skip the authentication redirects on the next request. The target site finally provides the content.
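
As an illustration of the token processing in step 6, here is a sketch of ID token validation on the target site, using the Python PyJWT library. For simplicity the token is signed with a shared secret (HS256) and issued locally; in a real OpenID Connect deployment the identity provider issues the token and the relying party verifies it with the provider's published keys (usually RS256 via JWKS). The issuer, client identifier and claims are hypothetical.

[source,python]
----
# Validation of an OpenID Connect ID token on the target site (relying party).
import time

import jwt  # PyJWT

ISSUER = "https://idp.example.com"   # hypothetical identity provider
CLIENT_ID = "example-app"            # hypothetical relying party identifier
SECRET = "shared-secret"             # stand-in for real signing keys

# Token as the identity provider would issue it (done here only for the demo).
id_token = jwt.encode(
    {"iss": ISSUER, "aud": CLIENT_ID, "sub": "alice",
     "iat": int(time.time()), "exp": int(time.time()) + 300},
    SECRET, algorithm="HS256",
)

# The relying party checks the signature, issuer, audience and expiration.
claims = jwt.decode(
    id_token, SECRET, algorithms=["HS256"],
    audience=CLIENT_ID, issuer=ISSUER,
)
print(claims["sub"])   # the asserted user identity
----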

The following overview compares the terminology and technologies used in the SAML and OIDC worlds.

  • Source site: Identity Provider (IdP) in SAML; Identity Provider (IDP) or OpenID Provider (OP) in OpenID Connect.

  • Target site: Service Provider (SP) in SAML; Relying Party (RP) in OpenID Connect.

  • Token reference: SAML assertion (or artifact) in SAML; ID token or access token in OpenID Connect.

  • Token format: SAML token in SAML; JSON Web Token (JWT) in OpenID Connect.

  • Intended for: web applications and web services (SOAP) in SAML; web applications, mobile applications and REST services in OpenID Connect.

  • Based on: N/A for SAML; OAuth2 for OpenID Connect.

  • Data representation: XML in SAML; JSON in OpenID Connect.

  • Cryptography framework: XMLenc and XMLdsig in SAML; JSON Object Signing and Encryption (JOSE) in OpenID Connect.

A careful reader surely noticed the similarity with the web-based access management mechanisms. That’s right. This is the same wheel, reinvented over and over again. However, to be completely honest, we have limited our description to cover flows intended for web browsers only. Both SAML and OIDC have broader applicability than just web browser flows. The differences between the two protocols are much more obvious in these extended use cases. However, the web browser case nicely illustrates the principles and similarities of SAML, OpenID Connect and also the simple web-SSO systems.

Maybe the most important difference between SAML, OIDC and web-SSO (also known as Web Access Management or WAM) systems is the intended use:

  • SAML was designed for the web applications and SOAP web services world. It will handle centralized (single-IDP) scenarios very well, but it can also work in decentralized federations. Go for SAML if you are using SOAP and WS-Security or if you plan to build a big decentralized federation. On second thought, you should probably forget about SAML anyway. It is not very fashionable these days.

  • OpenID Connect was designed mostly for use with social networks and similar Internet services. Its philosophy is still somewhat centralized. It will work well if there is one strong identity provider and many relying parties. Technologically it will fit into the RESTful world much better than SAML. Current fashion trends are favorable to OIDC.

  • Web-SSO/WAM systems are designed to be used inside a single organization. This is ideal to implement SSO between several customer-facing applications so the customers will have no idea that they interact with many applications and not just one. The web-SSO/WAM systems are not designed to work across organizational boundaries. Which is quite unfortunate in the current "cloudy" IT environment.

Although SAML and OIDC are designed primarily for cross-domain use, it is no big surprise to see them inside a single organization. There is a clear benefit in using an open standardized protocol instead of a proprietary mechanism. However, it has to be expected that an SSO system based on SAML or OIDC will have a slightly more complicated setup than a simple Web-SSO/WAM system.

Kerberos, Enterprise SSO and Friends

Many of us would like to think that everything is based on web technologies today, and that non-web mechanisms are things of the past. Yet, there are still systems that are not web-based and where web-based SSO and AM mechanisms will not work. There are still some legacy applications, especially in the enterprise environment - applications based on rich clients or even character-based terminal interactions. Then there are network operating systems such as Windows and numerous UNIX variants and there are network access technologies such as VPN or 802.1X. There are still many cases where web-based access management and SSO simply won’t work. These technologies usually pre-date the web. Honestly, the centralized authentication and single sign-on are not entirely new ideas. It is perhaps no big surprise that there are authentication and SSO solutions even for non-web applications.

The classic classroom example of a non-web SSO system is Kerberos. The protocol originated at MIT in the 1980s. It is a single sign-on protocol for operating systems and rich clients based on symmetric cryptography. Even though it is a cryptographic protocol, it is not too complicated to understand, and it definitely withstood the test of time. It is used to this day, especially for authentication and SSO of network operating systems. It is a part of the Windows network domain, and it is often the preferred solution for authentication of UNIX servers. The most serious limitation of Kerberos is given by its use of symmetric cryptography. The weakness of symmetric cryptography is key management. Kerberos key management can be quite difficult, especially when the Kerberos realm gets very big. Key management is also one of the reasons why it is not very realistic to use Kerberos in cross-domain scenarios. However, inside a closed organization, Kerberos is still a very useful solution.

The major drawback of using Kerberos is that every application and client needs to be "kerberized". In other words, everybody who wants to take part in Kerberos authentication needs to have Kerberos support in their software. There are kerberized versions of many network utilities, so this is usually not a problem for UNIX-based networks. However, it is a big problem for generic applications. There is some support for Kerberos in common web browsers, often referred to as "SPNEGO". However, this support is usually limited to interoperability with Windows domains. Even though Kerberos is still useful for operating system SSO, it is not a generic solution for all applications.

Many network devices use the RADIUS protocol for what network engineers call "Authentication, Authorization and Accounting" (AAA). However, RADIUS is a back-end protocol. It does not take care of client interactions. The goal of RADIUS is that the network device (e.g. a WiFi access point, router or VPN gateway) can validate user credentials that it has received as part of another protocol. The client connecting to a VPN or WiFi network does not know anything about RADIUS. Therefore, RADIUS is similar to the LDAP protocol, and it is not really an access management technology.

Obviously, there is no simple and elegant solution that can provide SSO for all enterprise applications. Despite that, one technology appeared in the 1990s and early 2000s and promised to deliver a universal enterprise SSO solution. It was called "Enterprise Single Sign-On" (ESSO). The ESSO approach was to use agents installed on every client device. The agent detects when a login dialog appears on the screen, fills in the username and password, and submits the dialog. If the agent is fast enough, the user does not even notice the dialog, and this creates the impression of Single Sign-On. However, there are obvious drawbacks. The agents need to know all the passwords in a cleartext form. There are ESSO variations with randomly generated or even single-use passwords, which partially alleviates this problem. In that case there is an additional drawback that the ESSO also needs to be integrated with password management of all the applications, which is not entirely easy. However, the most serious drawback of ESSO is the agents themselves. They only work on workstations that are strictly controlled by the enterprise. Yet the world is different now: the enterprise perimeter has effectively disappeared, and the enterprise cannot really control all the client devices. Therefore, ESSO is also now mostly a thing of the past.

Access Management and the Data

Access management servers and identity providers need to know the data about users to work properly. However, it is quite complicated. The purpose of access management systems is to manage the access of users to applications. This usually means processing authentication, authorization (partially), auditing of the access and so on. For this to work, the AM system needs access to the database where the user data are stored. It needs access to usernames, passwords and other credentials, authorization policies, attributes and so on. The AM systems usually do not store these data themselves. They rely on external data stores. In most cases, these data stores are directory services or NoSQL databases. This is an obvious choice: these databases are lightweight, highly available and extremely scalable. The AM system usually needs just simple attributes, therefore the limited capabilities of directories and NoSQL databases are not a limiting factor here. The marriage of access management and a lightweight database is an obvious and very smart match.

However, there is one critical issue – especially if the AM system is also used as a single sign-on server. The data in the directory service and the data in the applications must be consistent. E.g. it is a huge problem if one user has different usernames in several applications. Which username should he use to log in? Which username should be sent to the applications? There are ways to handle such situations, but this is usually very cumbersome and expensive. It is much easier to unify the data before the AM system is deployed.

Even though the "M" in AM stands for "management", typical AM system has only a very limited data management capabilities. The AM systems usually assume that the underlying user database is already properly managed. E.g. a typical AM system has only a very minimalistic user interface to create, modify and delete user records. Some AM systems may have self-service functionality (such as password reset), but even that functionality is usually very limited. Even though the AM relies on the fact that the data in the AM directory service and the data in applications are consistent, there is usually no way how to fully synchronize the data by using the AM system itself. There may be methods for on-demand or opportunistic data updates, e.g. creating user record in the database when the user logs in for the first time. However, there are usually no solutions for deleting the records or for updating the records of inactive users.

Therefore, the AM systems are usually not deployed alone. The underlying directory service or NoSQL database is almost always a hard requirement for even the humblest AM functionality. However, for the AM system to really work properly, it needs something to manage and synchronize the data. An Identity Management (IDM) system is usually used for that purpose. In fact, it is usually strongly recommended to deploy the directory and the IDM system before the AM system. The AM system cannot work without the data. If the AM tries working with data that are not maintained properly, it will not take long until it fails.

Advantages and Disadvantages of Access Management Systems

Access management systems have significant advantages. Most of the characteristics are given by the AM principle of centralized authentication. As the authentication is carried out by a central access management server, it can be easily controlled and audited. Such centralization can be used to consistently apply authentication policies - and to easily change them when needed. It also allows better utilization of an investment into authentication technologies. E.g. multi-factor or adaptive authentication can be quite expensive if it has to be implemented by every application. When it is implemented in the AM server, it is re-used by all the applications without additional investment.

However, there are also drawbacks. As the access management is centralized, it is obviously a single point of failure. Nobody is going to log in when the AM server fails. This obviously means a major impact on the functionality of all applications. Therefore, AM servers need to be highly available and scalable. Which is not always an easy task. The AM servers need very careful sizing, as they may easily become performance bottlenecks.

However, perhaps the most severe drawback is the total cost of an access management solution. The cost of the AM server itself is usually not a major issue. However, the server will not work just by itself. The server needs to be integrated with every application. Even though there are standard protocols, the integration is far from straightforward. Support for AM standards and protocols in the applications is still not universal. Especially older enterprise applications need to be modified to switch their authentication subsystem to the AM server. This is often so costly that the adoption of AM technologies is limited to just a handful of enterprise applications. Although recent applications usually have some support for AM protocols, setting it up is still not an easy task. There are subtle incompatibilities and treacherous details, especially if the integration goes beyond mere authentication into authorization and user profile management.

Even though many organizations are planning deployment of an AM system as their first step in the IAM project, this approach seldom succeeds. Projects usually plan to integrate 50-80% of applications into the AM solution. However, the reality is that only a handful of applications can be easily integrated with the AM system. The rest of the applications are integrated using an identity management (IDM) system, which is hastily added to the project. Therefore, it is better to plan ahead: analyze the AM integration effort, prototype the deployment, and make a realistic plan for the AM solution. Make sure the AM can really bring the promised benefits. Starting with IDM and adding the AM part later is often a much more reasonable strategy.

Homogeneous Access Management Myth

There are at least two popular access management protocols for the web. There are huge identity federations based on SAML. Cloud services and social networks usually use OpenID Connect or its variations. There are variations and related protocols to be used for mobile applications and services. Then there are other SSO protocols, primarily focused on intra-organizational use. There is no single protocol or mechanism that can solve all the problems in the AM world.

Additionally, the redirection approach of AM systems assumes that the user has something that can display authentication prompts and carry out user interaction, which is usually a web browser. Therefore, the original variant of access management mechanisms applies mostly to conventional web-based applications. Variations of this approach are also applicable to network services and single-page web applications. However, this approach is usually not directly applicable to applications that use rich clients, operating system authentication and similar "traditional" applications. A browser is not the primary environment that can be used to carry out the authentication in those cases. There are some solutions that usually rely on an embedded browser, however that does not change the basic fact that the AM technologies are not entirely suitable for this environment. These applications usually rely on Kerberos as an SSO system, or do not integrate with any SSO system at all.

A typical IT environment is composed of a wild mix of technologies, and not all of them are entirely web-based. Therefore, it is quite unlikely that a single AM system can apply to everything that is deployed in your organization. Authentication is very tightly bound to user interaction, therefore it depends on how the user interacts with the application. As the user uses different technologies to interact with a web application, a mobile application and the operating system, it is obvious that the authentication and SSO methods for these systems will also be different.

Therefore, it has to be expected that there will be several AM or SSO systems in the organization, each one serving its own technological island. Each island needs to be managed.

Practical Access Management

A unifying access management system, Single Sign-On, cross-domain identity federation, social login, universally-applicable 2-factor authentication – these are the things that people usually want when they think about Identity and Access Management (IAM). These are all perfectly valid requirements. However, everything has its cost. It is notoriously difficult to estimate the cost of access management solutions, because the majority of the cost is not in the AM software. A huge part of the total cost is hidden inside existing applications, services and clients. All of this has to be considered when planning an access management project.

Even though AM is what people usually want, it is usually wise not to start with AM as the first step. AM deployment has many dependencies: a unified user database, managed and continually synchronized data, and applications that are flexible enough to be integrated are the bare minimum. Unless your IT infrastructure is extremely homogeneous and simple, it is very unlikely that these dependencies are already satisfied. Therefore, it is almost certain that an AM project attempted at the beginning of the IAM program will not reach its goals. It is much more likely for such AM projects to fail miserably. On the other hand, if the AM project is properly scoped and planned and has realistic goals, there is a high chance of success.

Perhaps the best way to evaluate an AM project is to ask several questions:

  • Do I really need access management for all applications? Do I need 100% coverage? Can I afford all the costs? Maybe it is enough to integrate just a couple of applications that are the source of the worst pain. Do I know which applications these are? Do I know what my users really use during their workday? Do I know what they need?

  • What are the real security benefits of AM deployment? Will I be disabling the native authentication to the applications? Even for system administrators? What will I do in case of administration emergencies (e.g. system recovery)? Would system administrators still be able to circumvent the AM system? If yes then what is the real security benefit? If not then what will be the recovery procedure in case the AM system fails?

  • Do I really need SSO for older and rarely used applications? What is the real problem here? Is the problem that users are entering the password several times per day? Or is the real problem that they have to enter a different username or password for different applications, and they keep forgetting the credentials? Maybe simple data cleanup and password management will solve the worst problems, and I can save a huge amount of money on the AM project?

The access management technologies are the most visible part of the IAM program. However, they are also the most expensive part, and the most difficult piece to set up and maintain. Therefore, do not underestimate other IAM technologies. Do not try to solve every problem with the AM golden hammer. Using the right tool for the job is a good approach in every situation. In an IAM program, it is absolutely critical for success.

Identity Management

Identity management (IDM) is maybe the most overlooked and underestimated technology in the whole identity and access management (IAM) field. Yet IDM is a crucial part of almost every IAM solution. It is IDM that can bring substantial benefits to almost any organization. So, what is that mysterious IDM thing, really?

Identity management is exactly what the name says: it is all about managing identities. It is about the processes to create Active Directory accounts and mailboxes for a new employee. IDM sets up accounts for students at the beginning of each school year. IDM makes it possible to immediately disable all access to a suspicious user during a security incident. IDM takes care of adding new privileges and removing old privileges of users during reorganization. IDM makes sure all the accounts are properly disabled when the employee leaves the company. IDM automatically sets up privileges for students and staff appropriate for their classes. IDM records access privileges of temporary workers, partners, support engineers and all the third-party identities that are not maintained in your human resources (HR) system. IDM automates the processes of role request and approval. IDM records every change in user privileges in the audit trail. IDM governs the annual reviews of roles and access privileges. IDM makes sure the copies of user data that are kept in the applications are synchronized and properly managed. IDM makes sure data are managed according to data protection rules. IDM does many other things that are absolutely essential for every organization to operate in an efficient and secure manner.

It looks like IDM is the best thing since sliced bread. So where’s the catch? Oh yes, there is a catch. At least, there was a catch. The IDM systems used to be expensive. Very expensive. The IDM systems used to be so expensive that it was very difficult to justify the cost even with such substantial and clear benefits. That time is over now. Identity management is still not cheap. However, the benefits clearly outweigh the costs now.

Note
Terminology
The term identity management is often used for the whole identity and access management (IAM) field. This is somewhat confusing, because technologies such as single sign-on or access management do not really manage the identities. Such technologies manage the access to the applications. Even directory servers do not exactly manage the identities. Directory servers store the identities and provide access to them. There is in fact one whole branch of technologies that manage identities. Those systems are responsible for creating identities and maintaining them. They are sometimes referred to as identity provisioning, identity lifecycle management or identity administration systems. However, given the current state of the technology, such names are indeed an understatement. Those systems can do much more than just provisioning or management of identity lifecycle. Recently, the term Identity Governance and Administration (IGA) was introduced. It is supposed to include identity management systems with identity governance capabilities. We will refer to these systems simply as identity management (IDM) systems. When we refer to the entire field that contains access management, directory services, identity management and governance, we will use the term identity and access management (IAM).

History of Identity Management

Let’s start at the beginning. In the 1990s there was no technology that would be clearly identified as "identity management". Of course, all the problems above had existed almost since the beginning of modern computing. There had always been some solutions for those problems. Historically, most of those solutions were based on paperwork and scripting. That worked quite well - until the big system integration wave spread through the industry in the 1990s and 2000s. As data and processes in individual applications got integrated, the identity management problems became much more pronounced. Manual paper-based processes were just too slow for the age of information superhighways. The scripts were too difficult to maintain in a world where a new application is deployed every couple of weeks. The identity integration effort naturally started with the state-of-the-art identity technology of the day: directory services. As we have already shown, the directories were not entirely ideal tools for the job. The directories did not work very well in an environment where people thought that LDAP was some kind of dangerous disease, where usernames and identifiers were assigned quite randomly, and where every application insisted that the only authoritative data are those stored in its own database.

The integration problems motivated the inception of identity management technologies in the early 2000s. Early IDM systems were just data synchronization engines that were somewhat hard-coded to operate with users and accounts. Some simple role-based access control (RBAC) engines and administration interfaces were added a bit later. By the mid-2000s there were several more-or-less complete IDM systems. This was the first generation of real IDM systems. These systems were able to synchronize identity data between applications and provide some basic management capabilities. Even such simple functionality was a huge success at that time. The IDM systems could synchronize the data without any major modification of the applications, therefore they brought the integration cost to a reasonable level - at the application side. The problem was that the cost of the IDM systems themselves was quite high. These systems were still somewhat crude, therefore the configuration and customization required a very specialized class of engineers. IDM engineers were almost exclusively employed by IDM vendors, big system integrators and expensive consulting companies. This made the deployment of IDM solutions prohibitively expensive for many mid-size and smaller organizations. Even big organizations often deployed an IDM solution with quite limited features to make the cost acceptable.

Early IDM systems evolved and improved in time. There were companion products for identity governance and compliance that augmented the functionality. Yet, it is often almost impossible to change the original architecture of a product. Therefore, almost all the first-generation IDM products struggled with limitations of the early product design. Most of them do not exist today, or are considered to be legacy.

All these IDM systems were commercial closed-source software. However, the closed-source character of the IDM products is itself a huge problem. Every IDM solution has to be more-or-less customized - which usually means more rather than less. It has to be the IDM system that adapts, and not the applications. Requiring each application to adapt to a standardized IDM interface means a lot of changes in a lot of different places, platforms and languages. The total cost of all necessary modifications adds up to a huge number. Such an approach is tried from time to time; it almost always fails. While there are many applications in the IT infrastructure, there is just one IDM system. If the IDM system adapts to applications and business processes, the changes are usually smaller, and they are all in one place, implemented in a single platform. The IDM system must be able to adapt. It has to adapt a great deal, and it has to adapt easily and rapidly. Closed-source software is notoriously bad at adapting to requirements that are difficult to predict. Which in practice means that the IDM projects based on first-generation products were hard to use, slow to adapt and expensive.

Even worse, the closed-source software is prone to vendor lock-in. Once the IDM system is deployed and integrated, it is extremely difficult to replace it with a competing system. The closed-source vendor is the only entity that can modify the system, and the system cannot be efficiently replaced. Which means that the end customer is not in a position to negotiate. Which means high maintenance costs. It naturally follows that the first generation of IDM systems was a huge commercial success. For the vendors, that is.

Then the 2000s were suddenly over, with an economic crash at the end. We can only speculate what the reasons were, but the fact is that around the years 2009-2011 several very interesting new IDM products appeared on the market. One interesting thing is that all of them were more-or-less open source. The benefit that the open source character brings may be easy to overlook for business-oriented people. However, the benefits of open source in identity management are almost impossible to overstate. As every single IDM engineer knows, understanding of the IDM product, and the ability to adapt the product, are two critical aspects of any IDM project. Open source is the best way to support both understanding and flexibility. There is also a third important advantage: it is almost impossible to create a vendor lock-in situation with an open source product. All the open source products are backed by companies that offer professional support services equivalent to the services offered by commercial IDM products. This brings quality assurance for the products and related services. However, the companies do not really "own" the products; there is no way for them to abuse intellectual property rights against the customers. Open source brings a new and revolutionary approach, both to technology and business.

New products have appeared since the early 2010s, in many areas of identity and access management. However, it is still the humble identity management platform that forms the core of the solution. The products have evolved to the point that the entire field is now called identity governance and administration (IGA). New functionality was added to the products: identity governance, analytics, policy management, advanced reporting, and many more. New IGA platforms are more powerful than ever, yet they are still the heart of IAM solutions.

What is This Identity Management, Anyway?

Identity management (IDM) is a simple term which encompasses very rich and comprehensive functionality. It contains identity provisioning (and reprovisioning and deprovisioning), synchronization, organizational structure management, role-based access control, data consistency, approval processes, auditing and a few dozen other features. All of that is thoroughly blended and mixed with a pinch of scripting and other seasoning until there is a smooth IDM solution. Therefore, it is quite difficult to tell what identity management is just by using a dictionary-like definition. We would rather describe what identity management is by using a couple of typical usage scenarios.

Let’s have a fictional company called ExAmPLE, Inc. This company has a few thousand employees, a decent partner network, customers, suppliers and all the other things that real-world companies have. And the ExAmPLE company has an IDM system running in its IT infrastructure.

ExAmPLE hires a new employee, Alice. Alice signs an employee contract a few days before she starts her employment. The contract is entered into the human resource (HR) system by the ExAmPLE HR staff. The IDM system periodically scans the HR records and discovers the record of a new hire. The IDM system pulls in the record and analyzes it. It takes the user’s name and employee number from the HR record, generates a unique username, and based on that information it creates a user record in the IDM system. The IDM system also gets the organization code 11001 from the HR record. The IDM system looks inside its organizational tree and discovers that code 11001 belongs to the sales department. Therefore, it automatically assigns the user to the sales department. The IDM system also processes the work position code S007 in the HR record. The IDM policies say that code S007 means sales agent, and that anybody with that code should automatically receive the Sales Agent role. Therefore, the IDM system assigns that role. As Alice is a core employee, the IDM system automatically creates an Active Directory account for her, together with a company mailbox. The account is placed into the Sales Department organizational unit. The Sales Agent role entitles the user to more privileges. Therefore, the Active Directory account is automatically assigned to sales groups and distribution lists. The role also gives access to the CRM system, therefore a CRM account is also automatically created and assigned to the appropriate groups. All of that happens in a couple of seconds after the new HR record is detected. It all happens automatically.

ExAmPLE IDM system
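
A minimal sketch may make the onboarding story above more concrete. Everything here is illustrative: the class names, the lookup tables and the username rule are hypothetical stand-ins for real IDM mappings and policies, not the configuration of any particular product.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: maps an HR record to an IDM user, roughly following
// the ExAmPLE story above. All class names, codes and rules here are hypothetical.
public class HrOnboardingSketch {

    record HrRecord(String givenName, String familyName, String employeeNumber,
                    String orgCode, String workPositionCode) {}

    record IdmUser(String username, String fullName, String organization, List<String> roles) {}

    // Hypothetical lookup tables standing in for the IDM organizational tree and role policy.
    static final Map<String, String> ORG_BY_CODE = Map.of("11001", "Sales Department");
    static final Map<String, String> ROLE_BY_POSITION = Map.of("S007", "Sales Agent");

    static IdmUser onboard(HrRecord hr) {
        // Generate a unique username from the name and the employee number.
        String username = (hr.givenName().charAt(0) + hr.familyName() + hr.employeeNumber()).toLowerCase();
        String org = ORG_BY_CODE.getOrDefault(hr.orgCode(), "Unassigned");
        String role = ROLE_BY_POSITION.get(hr.workPositionCode());
        List<String> roles = role == null ? List.of() : List.of(role);
        return new IdmUser(username, hr.givenName() + " " + hr.familyName(), org, roles);
    }

    public static void main(String[] args) {
        IdmUser alice = onboard(new HrRecord("Alice", "Anderson", "007", "11001", "S007"));
        System.out.println(alice);
        // Provisioning of the Active Directory and CRM accounts would follow,
        // driven by the assigned role (not shown here).
    }
}
```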

Alice starts her career, and she is a really efficient employee. Therefore, she gets more responsibilities. Alice is going to prepare specialized market analyses based on empirical data gathered in the field. ExAmPLE is an innovative company, always inventing new ways to make business operations more efficient. Therefore, they invented this work position especially to take advantage of Alice’s skills. This means there is no work position code for Alice’s new job. However, she needs new privileges in the CRM system to do her work efficiently. She needs them right now. Fortunately, ExAmPLE has a flexible IDM system. Alice can log into the IDM system, select the privileges that she needs and request them. The request has to be approved by Alice’s manager and by the CRM system owner too. They get a notification about the request, and they can easily approve or reject it in the IDM system. Once the request is approved, Alice’s CRM account is automatically added to the appropriate CRM groups. Alice may start working on her analysis minutes or hours after she has requested the privileges.

ExAmPLE IDM approval

Alice lives happily ever after. One day she decides to get married. Alice, like many others, decides to change her surname after the marriage. Alice has a really responsible work position now, and she has accounts in a dozen information systems. It is no easy task to change her name in all of them, is it? In fact, it is very easy, because ExAmPLE has its IDM system. Alice goes to the HR department, and the HR staff changes her surname in the HR system. The IDM system picks up the change and propagates it to all the affected systems. Alice even automatically gets a new e-mail address with her new surname (keeping the old one as an alias). Alice receives a notification that she can now use her new e-mail address. The change is fast, clean and effortless.

ExAmPLE IDM rename

Later that day Alice discovers that her password is about to expire. Changing the password in all the applications would be a huge task. Alice knows exactly what to do. She logs into the IDM system and changes her password there. The password change is automatically propagated to each affected system according to the policy set up by the IT security office.

Something unexpected happens the following month. There is a security incident. The security office discovered the incident, and they are investigating it. It looks like it was an insider job. The security officers are using the data from the IDM system to focus their investigation on users that had privileges to access the affected information assets. They pinpoint Mallory as a prime suspect. The interesting thing is that Mallory should not have such powerful privileges at all. Luckily, the IDM system also keeps an audit trail of every privilege change. Therefore, the security team discovers that it was Mallory’s colleague Oscar who assigned these privileges to Mallory. Both men are to be interviewed. As this incident affects sensitive assets, there are some preventive measures to be executed before any word about the incident spreads. The security officers use the IDM system to immediately disable all the accounts that Mallory and Oscar have. It takes just a few seconds for the IDM system to disable these accounts in all the affected applications.

ExAmPLE IDM disable

Later, the investigation reveals that Oscar is mostly innocent. Mallory misused Oscar’s trust and tricked him into assigning these extra privileges. Mallory abused the privileges to get sensitive data which he tried to sell. The decision is that Mallory has to leave the company immediately, while Oscar may stay. However, as Oscar has shown poor judgment in this case, his responsibilities are reduced. The IDM system is used to permanently disable all Mallory’s accounts, re-enable Oscar’s accounts, and also to revoke powerful privileges that are considered too risky for Oscar to have.

A few months later, Oscar is still ashamed because of his failure. He decides not to extend his employment contract with ExAmPLE, and to leave the company without causing any more trouble. Oscar’s contract expires at the end of the month. This date is recorded in the HR system, and the IDM system takes it from there. Therefore, at midnight on Oscar’s last day at work, the IDM system automatically deletes all Oscar’s accounts. Oscar starts a new career as a barman in New York. He is very successful.

ExAmPLE IDM termination

The security office has handled the security incident professionally, and the IDM system provided crucial data to make the security response quick and efficient. The security team receives praise from the board of directors. The team always tries to improve. They try to learn from the incident and reduce the possibility of such a thing happening again. The team uses data from the IDM system to analyze the privileges assigned to individual users. The usual job of the IDM system is to create and modify accounts in the applications. However, the IDM system uses bidirectional communication with the applications. Therefore, this analysis is far from being yet another pointless spreadsheet exercise. The analysis is based on real application data processed and unified by the IDM system: the real accounts, to which user they belong, what roles they have, which groups they belong to and so on. The IDM system can detect accounts that do not have any obvious owner. The analysis discovers quite a rich collection of testing accounts that were evidently used during the last data center outage half a year ago. The IT operations staff obviously forgot about these accounts after the outage. The security staff disables the accounts using the IDM tools and sets up an automated process to watch out for such accounts in the future.

ExAmPLE IDM orphan detection

Based on the IDM data, the security officers suspect that there are users that have too many privileges. This is most likely a consequence of the request-and-approval process: these privileges simply accumulated over time. Yet, this is just a suspicion. It is always difficult for security staff to assess whether a particular user should or should not have a certain privilege. This is especially difficult in flexible organizations such as ExAmPLE, where work responsibilities are often combined and organizational structures are somewhat fuzzy. Yet there are people who know what each employee should do: the managers. However, there are many managers in many departments, and it would be a huge task to talk to each of them and review the privileges. The IDM system comes to the rescue once again. The security officers set up an automated access certification campaign. They sort all users by their managers, based on the organizational structure maintained in the IDM system. Each manager receives an interactive list of their users and their privileges. The manager must confirm (certify) that the user still needs those privileges. This campaign is executed in a very efficient manner, as the work is evenly distributed through the organization. The campaign is completed in a couple of days. At the end, the security officers know which privileges are no longer needed and can be revoked. This reduces the exposure of the assets, which is a very efficient way to reduce residual security risk.

Note
Experienced identity management professionals have certainly realized that this description is slightly idealized. The real world is not a fairy tale, and real life with an IDM system is much more complicated than this simple story suggests. Even though real life is harder than a story in a book, the IDM system remains an indispensable tool for automation and information security management.

How Does The Technology Work?

Obviously, identity management systems have a lot of advantages for business, processes, efficiency and all that stuff. How does it really work on a technological level? The basic principle is very simple: at its core, an identity management system is just a sophisticated data synchronization engine.

The identity management system takes data from source systems, such as HR databases. It processes the data, mapping and transforming the values as necessary, and figures out which records are new. The IDM engine then does some (usually quite complex) processing on the records. That usually includes evaluating policies such as Role-Based Access Control (RBAC), organizational policies, password policies and so on. The result of this processing is the creation or modification of user accounts in other systems such as Active Directory, CRM systems and so on. So basically, it is all about getting the data, changing them and moving them around. This does not seem very revolutionary, does it? It is all in the details - the way the IDM system gathers the data, processes them and propagates the changes is what makes all the difference.
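
The core principle can be sketched in a few lines of code. The interfaces and records below are purely illustrative; they only show the shape of the "take data, transform, push out" loop, not the API of any real IDM engine.

```java
import java.util.List;

// Minimal sketch of the "sophisticated data synchronization engine" idea:
// read source records, transform them, and provision the result to target systems.
// All interfaces here are hypothetical, not taken from any particular product.
public class SyncEngineSketch {

    interface SourceSystem { List<SourceRecord> fetchRecords(); }
    interface TargetSystem { void createOrUpdateAccount(Account account); }

    record SourceRecord(String employeeNumber, String name, String positionCode) {}
    record Account(String username, String name, List<String> groups) {}

    private final SourceSystem hr;
    private final List<TargetSystem> targets;

    SyncEngineSketch(SourceSystem hr, List<TargetSystem> targets) {
        this.hr = hr;
        this.targets = targets;
    }

    void synchronize() {
        for (SourceRecord record : hr.fetchRecords()) {
            // Mapping and policy evaluation (RBAC, organizational policies, ...)
            // would happen here; this line stands in for that processing.
            Account account = new Account("u" + record.employeeNumber(), record.name(), List.of("employees"));
            // Propagate the result to every connected target system.
            for (TargetSystem target : targets) {
                target.createOrUpdateAccount(account);
            }
        }
    }
}
```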

In addition to that, there are interesting possibilities for analysis and governance of identity data. The data are synchronized into the database of the identity management system, transformed, unified, all neatly organized and constantly kept up-to-date. It would be a real shame to let such attractive data just sit there. Such data can be a basis for analysis. We can watch trends, changes in the number of users, number of modifications, monitor application use (e.g. license utilization) and so on. We can look for patterns, suggest new roles, or detect users that have unusual permission patterns (outliers). We can also govern, distribute maintenance of data to departments, nominate role owners, set up processes to maintain the policies … we can do all sorts of interesting things.

However, it all depends on the data. The data are at the very core of identity management, administration and governance. We need to have the data, taken from a reliable source, transformed, unified, and most importantly, always kept fresh. This is the most important responsibility of an identity management system.

Identity Management Connectors

An identity management system must connect to many different applications, databases and information systems. A typical IDM deployment has tens or even hundreds of such connections. Therefore, the ease of connecting the IDM system with its environment is one of its essential qualities.

Current IDM systems use connectors to communicate with all surrounding systems. These connectors are based on principles similar to database drivers. On one end, there is a unified connector interface that presents the data from all the systems in the same format. On the other end, there is a native protocol that the application supports. The connector acts as a translator between the common data format and an application-specific protocol. There are connectors for LDAP and various LDAP variants, SQL protocols and dialects, connectors that are file-based, connectors that invoke REST services and so on. Every slightly advanced IDM system has tens of different connectors.

Connectors

A connector is usually a relatively simple piece of code. The primary responsibility of a connector is to adapt communication protocols. The LDAP connector translates LDAP protocol messages into data represented using a common connector interface. The SQL connector does the same thing with SQL-based protocols. The connector also interprets the operations invoked on the common connector interface by the IDM system. For example, the LDAP connector executes the "create" operation by sending an LDAP "add" message to the LDAP server. Connectors usually implement the basic set of create-read-update-delete (CRUD) operations. Therefore, a typical connector is quite a simple piece of code. Despite its simplicity, the whole connector idea is a clever one. The IDM system does not need to deal with the communication details. The core of the IDM system can be built to focus on the generic identity management logic, which is typically quite complex just by itself. Therefore, any simplification that the connectors provide is more than welcome.
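
The following sketch illustrates the general shape of such a connector interface. It is deliberately simplified and it is not the ConnId API, just an illustration of the "common CRUD operations on one side, native protocol on the other" idea.

```java
import java.util.Map;

// Illustration of the connector idea: a common CRUD-style interface on one side,
// a protocol-specific implementation on the other. This is NOT the ConnId API,
// just a simplified sketch of the general pattern.
public interface ConnectorSketch {

    // Common data format: an account is just an identifier plus a set of attributes here.
    record AccountData(String uid, Map<String, Object> attributes) {}

    String create(AccountData account);                              // e.g. LDAP connector sends an "add" request
    AccountData read(String uid);                                    // e.g. LDAP "search" by identifier
    void update(String uid, Map<String, Object> changedAttributes);  // e.g. LDAP "modify"
    void delete(String uid);                                         // e.g. LDAP "delete"
}
```

An LDAP connector would implement this interface by sending LDAP add, search, modify and delete requests, while an SQL connector would issue the corresponding INSERT, SELECT, UPDATE and DELETE statements.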

Connectors usually access external interfaces (APIs) of source and target systems. It is natural that connector authors will choose interfaces that are public, well-documented and based on open standards. Many newer systems have interfaces like that. However, there are notorious cases of applications that refuse to provide such an interface. Despite that, there is almost always some way to build a connector. The connector may create records directly in the application database. It may execute a database routine. It may execute a command-line tool for account management. It may even do crazy things such as simulating a user working with a text terminal and filling out a form to create a new account. There is almost always a way to do what the connector needs to do. It is just that some ways are nicer than others.

The connector-based architecture is pretty much standard among all advanced IDM systems. Yet the connector interfaces vary significantly from one IDM system to another. The connector interfaces are all proprietary. Therefore, the connectors are not interchangeable between different IDM systems. The connectors are often used as weapons to artificially increase the profit from an IDM solution deployment. There is one exception, though. The ConnId connector framework is the only connector interface that is actively used and developed by several competing IDM systems. It is perhaps no big surprise that ConnId is an open source framework.

Even though the connector-based approach is quite widespread, some (mostly older) IDM systems are not using connectors. Some IDM products use agents instead of connectors. An agent does a similar job to a connector. However, an agent is not part of the IDM system instance. Agents are installed in each connected application, and they communicate with the IDM system using a remote network protocol. This is a major burden. The agents need to be installed everywhere. Then they need to be maintained and upgraded, there may be subtle incompatibilities and so on. Also, running third-party code inside every application can be a major security issue. Overall, the agent-based systems are too cumbersome (and too costly) to operate. The whole agent idea perhaps originated somewhere in our digital past, when applications and databases did not support any native remote interfaces. In such a situation the agents are obviously better than connectors. Fortunately, this is a thing of the past. Today even old applications have some way to manage identities using a remote interface. This is typically some REST service that is easy to access from a connector. Even if the application provides only a command-line interface or an interactive terminal session, there are connectors that can handle that sufficiently well. Therefore, today the agent-based identity management systems are generally considered to be obsolete. However, agents still have one advantage: the agent can reach deep into the system where it is deployed. The agent can monitor the operations, even enforce additional policies. For that reason, agents are still used in privileged access management (PAM) systems. However, it is still a major consideration whether it is worth the trouble of maintaining the agents.

Identity Provisioning

Provisioning, also called fulfilment, is perhaps the most frequently used feature in any IDM system. In the generic sense, provisioning means maintenance of user accounts in applications, databases and other target systems. This includes creation of the account, various modifications during the account lifetime, and permanently disabling or deleting the account at the end of its lifetime. The IDM system uses connectors to manipulate the accounts. In fact, good IDM systems can manage much more than just accounts. Management of groups and group membership was quite a rare feature in the early years of IDM technology. Yet today, an IDM system that cannot manage groups is almost useless. Almost all IDM systems work with roles. However, only a few IDM systems can also provision and synchronize the roles (e.g. automatically create an LDAP group for each new role). A good IDM system can also manage, provision and synchronize organizational structures. However, even this feature is still not entirely common.

Synchronization and Reconciliation

Identity provisioning may be the most important feature of an IDM system. However, if an IDM system did just the provisioning and nothing else, it would be a quick and utter failure. It is not enough to create an account when a new employee is hired, or delete that account when an employee leaves. Reality works in mysterious ways, and it can easily make a big mess in a very short time. Maybe there was a crash in one of the applications and the data were restored from a backup. An account that was deleted a few hours ago is unexpectedly resurrected. It stays there, alive, unchecked and dangerous. Maybe an administrator manually created an account for a new assistant because the HR people were all too busy to process the paperwork. When the record finally gets to the HR system and is processed, the IDM system discovers that there is already a conflicting account. The process stops with an error, waiting for an administrator to manually resolve the situation. Maybe a few (hundred) accounts get accidentally deleted by a junior system administrator trying out an "innovative" system administration routine. There are simply too many ways things can go wrong - and they often do go wrong. It is not enough for an IDM system to just set things up and then forget about it. One of the most important features of any self-respecting IDM system is to make sure that everything is right, and also that it stays right all the time. Identity management is all about continuous maintenance of the identities. Without that continuity the whole IDM system is pretty much useless.

The trick to keeping the data in order is to know when they get out of order. In other words, the IDM system must detect when the data in the application databases change. If an IDM system detects that there was a change, then it is not that difficult to react to the change and fix it. The secret ingredient is the ability to detect changes. However, there’s a slight issue with that, isn’t there? We cannot expect that the application will send a notification to the IDM system every time a change happens. We do not want to modify the applications, otherwise the IDM deployment will be prohibitively expensive. The application needs to be passive, it is the IDM system that needs to be active. Fortunately, there are several ways to do that.

Some applications already keep track of the changes. Some databases record a timestamp of the last change for each row. Many directory servers keep a record of recent changes for the purpose of data replication. Such meta-data can be used by the IDM system. The IDM system may periodically scan the timestamps or replication logs for new changes. When the IDM system detects a change, it can retrieve the changed objects and react to the change based on its policies. Scanning for changes based on meta-data is usually very efficient, therefore it can be executed every couple of seconds or minutes. The reaction to a change can therefore happen almost in real time. This method has many names in various IDM systems. It is called "live synchronization", "active synchronization" or simply just "synchronization". Sadly, this method is not always available, as applications often do not provide suitable interfaces for real-time synchronization.
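
A live synchronization loop can be sketched roughly as follows. The ChangeSource interface is hypothetical; real applications expose last-modification timestamps, changelogs or replication metadata in their own specific ways.

```java
import java.time.Instant;
import java.util.List;

// Sketch of "live synchronization": periodically ask the application for objects
// changed since the last known timestamp and react to each change.
// The ChangeSource interface is a hypothetical stand-in for real change metadata.
public class LiveSyncSketch {

    interface ChangeSource {
        List<Change> changesSince(Instant lastSeen);
    }

    record Change(String accountUid, Instant modified) {}

    private final ChangeSource source;
    private Instant lastSeen = Instant.EPOCH;

    LiveSyncSketch(ChangeSource source) {
        this.source = source;
    }

    // Called every few seconds or minutes by a scheduler.
    void pollOnce() {
        for (Change change : source.changesSince(lastSeen)) {
            handle(change);                        // apply IDM policies, fix the data
            if (change.modified().isAfter(lastSeen)) {
                lastSeen = change.modified();      // remember how far we got
            }
        }
    }

    private void handle(Change change) {
        System.out.println("Reacting to change of account " + change.accountUid());
    }
}
```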

Yet, all is not lost. Even if the application does not maintain good meta-data that allow near-real-time change detection, there is still one very simple way that works for almost any system. The IDM system retrieves the list of all accounts in the application. Then it compares that list with the list of accounts that are supposed to be there. Therefore, it compares the reality (what is there) with the policy (what should be there). The IDM system can react to any discrepancies and repair them. This method is called reconciliation. It is quite a brutal method, almost barbaric. Yet it does the job.
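
A reconciliation pass can be sketched as a plain comparison of two account lists, which is exactly why the method is so universal. The maps below are a drastic simplification of real account data, used only to show the three kinds of discrepancies the IDM system reacts to.

```java
import java.util.Map;

// Sketch of reconciliation: compare what really is in the application with what
// should be there according to IDM policy, and react to every discrepancy.
// Both inputs are simplified to maps of username -> account state.
public class ReconciliationSketch {

    static void reconcile(Map<String, String> actualAccounts, Map<String, String> expectedAccounts) {
        // Accounts that exist but should not: orphaned or illegal accounts.
        for (String username : actualAccounts.keySet()) {
            if (!expectedAccounts.containsKey(username)) {
                System.out.println("Orphaned account, disabling: " + username);
            }
        }
        // Accounts that should exist but do not: missing accounts to provision.
        for (String username : expectedAccounts.keySet()) {
            if (!actualAccounts.containsKey(username)) {
                System.out.println("Missing account, creating: " + username);
            }
        }
        // Accounts present on both sides but in the wrong state: fix the attributes.
        for (Map.Entry<String, String> expected : expectedAccounts.entrySet()) {
            String actual = actualAccounts.get(expected.getKey());
            if (actual != null && !actual.equals(expected.getValue())) {
                System.out.println("Fixing account: " + expected.getKey());
            }
        }
    }
}
```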

Listing all accounts and processing each of them may seem like a straightforward job. However, it can be extremely slow if the number of accounts is high and the policies are complex. It can take anything from a few minutes to a few days. Therefore, it cannot be executed frequently. Running it once per day may be feasible, especially for small and simple systems. Running it once per week (on weekends) is quite a common practice. Some systems cannot afford to run it more frequently than once per month.

There are other methods as well. However, synchronization and reconciliation are the most frequently used ones. The drawback of synchronization is that it is not entirely reliable. The IDM system may miss some changes, e.g. due to change log expiration, system times not being synchronized, network failures or a variety of other reasons. On the other hand, reconciliation is mostly reliable. However, it is a very demanding task. Therefore, these two methods are often used together. Synchronization runs all the time and handles the vast majority of the changes. Reconciliation runs weekly or monthly and acts as a safety net to catch the changes that might have escaped during synchronization.

Identity Management and Role-Based Access Control

There are a lot of users in our systems, and every user needs a specific set of permissions to access the applications. Managing permissions for every user individually becomes very difficult with populations as small as a few hundred users. When the number of users goes over a thousand, such individual management of permissions becomes an unbearable burden. The individual management of permissions is not only a huge amount of work, it is also quite an error-prone routine. This has been known for decades. Therefore, many systems unified common combinations of permissions into roles, and the concept of Role-Based Access Control (RBAC) was born. The roles often represent work positions or responsibilities that are much closer to the "business" than technical permissions. A role may reflect the concept of a bank teller, website administrator or sales manager. A user has a role, the role contains permissions, and the permissions are used for authorization - that is the basic principle of RBAC. The low-level permissions are hidden from the users. Users are quite happy dealing with the business-friendly role names.

Note
Terminology
The term RBAC is frequently used in the industry, however the actual meaning of RBAC is not always clear. The confusion is perhaps caused by the fact that there is a formal RBAC specification known as the NIST RBAC model. When people say "RBAC", some of them mean that specific formal model, others mean anything that is similar to that formal model, and yet others mean anything that deals with roles. We use the term RBAC in quite a broad sense. Major identity management systems usually implement a mechanism that is inspired by the formal NIST RBAC model, but the mechanism deviates from the formal model as necessary. That is what we mean when we use the term RBAC.

Most RBAC systems allow roles to be placed inside other roles, thus creating a role hierarchy. Top layers of the hierarchy are usually composed of business roles such as Marketing specialist. Business roles contain lower-level roles. These are often application roles such as Website analytics or CMS administrator. These lower-level roles may contain concrete permissions, or they may contain other roles that are even closer to the underlying technology. And so on, and so on … there are proverbial turtles all the way down. Role hierarchy is often a must when the number of permissions and users grows.

Role hierarchy
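
The role hierarchy idea can be illustrated with a small sketch. The role names follow the examples above; the recursive resolution of effective permissions is the essence of what an RBAC engine does when it flattens the hierarchy, although real implementations are considerably more involved.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a role hierarchy: business roles contain application roles, which
// contain permissions. Resolving a user's effective permissions means walking
// the hierarchy. Role and permission names are illustrative only.
public class RoleHierarchySketch {

    record Role(String name, List<String> permissions, List<Role> subRoles) {}

    static Set<String> effectivePermissions(Role role) {
        Set<String> result = new HashSet<>(role.permissions());
        for (Role sub : role.subRoles()) {
            result.addAll(effectivePermissions(sub));   // recurse down the hierarchy
        }
        return result;
    }

    public static void main(String[] args) {
        Role cmsAdmin = new Role("CMS administrator", List.of("cms:edit", "cms:publish"), List.of());
        Role analytics = new Role("Website analytics", List.of("analytics:read"), List.of());
        Role marketing = new Role("Marketing specialist", List.of(), List.of(cmsAdmin, analytics));
        System.out.println(effectivePermissions(marketing));
        // Prints the flattened set of low-level permissions hidden behind the business role.
    }
}
```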

No IDM system can be really complete without an RBAC mechanism in place. Therefore, the vast majority of IDM systems support roles in one way or another. However, the quality of RBAC support varies significantly. Some IDM systems support only the bare minimum to claim RBAC support. Other systems have excellent and very advanced dynamic and parametric RBAC systems. Most IDM systems are somewhere in between.

The role-based mechanism is a very useful management tool. In fact, the efficiency of the role-based mechanism often leads to its overuse. This is a real danger, especially in bigger and more complex environments. The people that design roles in such environments have a strong motivation to maintain order by dividing the roles into the smallest reusable pieces, and then re-combining them in the form of application and business roles. This is further amplified by security best practices, such as the principle of least privilege. This is an understandable and perfectly valid motivation. However, it requires extreme care to keep such an RBAC structure maintainable. Even though this may seem counter-intuitive, it is quite common that the number of roles exceeds the number of users in the system. Unfortunately, this approach turns the complex problem of user management into an even more complex problem of role management. This phenomenon is known as role explosion.

Role explosion is a real danger, and it is definitely not something that can be avoided easily. The approach that prevailed in the first-generation IDM deployments was to simply live with the consequences of role explosion. Some IDM deployments even created tools that were able to automatically generate and (more-or-less successfully) manage hundreds of thousands of roles. However, this is not a sustainable approach. The systems have evolved since then. Current IDM systems bring features that may help to avoid the role explosion in the first place. Such mechanisms are usually based on the idea of making the roles dynamic. The roles are no longer just a static set of privileges. Dynamic roles may contain small pieces of algorithmic logic used to construct the privileges. The input to these algorithms consists of parameters that are specified when the role is assigned. Therefore, the same role can be reused for many related purposes without a need to duplicate the roles. This can significantly limit the number of roles required to model a complex system. This is the best weapon against role explosion that we currently have. Moreover, the roles can be assigned (and unassigned) automatically. There are rules that specify when a user should have a role. This approach further simplifies and automates role management. As most of the role management is driven by a policy, we call this approach policy-driven RBAC.
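
A parametric role can be pictured as a small function rather than a static list of privileges. The sketch below is a generic illustration of the idea, with invented privilege names; it is not the role model of any specific IDM product.

```java
import java.util.List;
import java.util.Map;

// Sketch of a parametric role: instead of one static role per project, a single
// role definition takes a parameter and computes the concrete privileges.
public class ParametricRoleSketch {

    interface ParametricRole {
        List<String> privileges(Map<String, String> parameters);
    }

    // One reusable "Project member" role, parametrized by project name.
    static final ParametricRole PROJECT_MEMBER = parameters -> {
        String project = parameters.get("project");
        return List.of(
                "group:" + project + "-members",
                "wiki:" + project + ":read",
                "repo:" + project + ":commit");
    };

    public static void main(String[] args) {
        // The same role assigned twice with different parameters, no role duplication needed.
        System.out.println(PROJECT_MEMBER.privileges(Map.of("project", "apollo")));
        System.out.println(PROJECT_MEMBER.privileges(Map.of("project", "gemini")));
    }
}
```

One role definition serves every project, which is exactly how parametrization keeps the number of roles from exploding.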

Even though the RBAC system has some drawbacks, it is necessary for almost any practical IDM solution. There were several attempts to replace the RBAC system with a completely different approach. Such attempts have had some success in access management and related fields. However, those alternatives cannot easily replace RBAC in identity management. Attribute-based access control (ABAC) and policy-based access control (PBAC) are two popular examples, both based on the same principle. The basic idea is to replace the roles with pure algorithmic policies. Simply speaking, an ABAC/PBAC policy is a set of algorithms that take user attributes as input. The policy combines that input with data about the operation and context. The output of the policy is a decision whether an operation should be allowed or denied. This approach is simple. It may work reasonably well in the access management world, where the AM server knows a lot of details about the operation that is just taking place, and the policies tend to be quite consistent and relatively simple. However, in the identity management field we need to set up the account before the user logs in for the first time. There are no data about the operation yet, and even contextual data are very limited. Even more importantly, policies used in the IDM field are usually not entirely "pure". Identity management is full of special cases, exceptions, subjective decisions and historical baggage. This is almost impossible to express in the form of an algorithm. That, together with other issues, makes ABAC/PBAC a very poor choice for an identity management system. Therefore, whether you like it or not, RBAC is the primary mechanism of any practical IDM solution - and it is here to stay.
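
For comparison, an ABAC/PBAC policy is essentially a function of user attributes, operation and context, as in the illustrative sketch below. The attribute and operation names are invented.

```java
import java.util.Map;

// Sketch of the ABAC/PBAC idea: a pure function of user attributes, operation and
// context that returns an allow/deny decision. Attribute names are illustrative only.
public class AbacPolicySketch {

    static boolean allowed(Map<String, String> userAttributes, String operation,
                           Map<String, String> context) {
        // Example policy: only sales department users may read CRM reports,
        // and only during business hours.
        return "sales".equals(userAttributes.get("department"))
                && "crm:report:read".equals(operation)
                && "business-hours".equals(context.get("time-window"));
    }
}
```

Note that such a decision needs to know the operation and the context, which is precisely the information an IDM system does not have when it provisions an account ahead of time.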

Identity Management and Authorizations

The basic principle of authorization in information security is quite straightforward: take the subject (user), operation and object (the thing that the user is trying to access). Evaluate whether the policy allows access for that subject-operation-object triple. If the policy does not allow it, then deny the operation. This is quite simple, understandable and proven by decades of information security practice. This principle is rooted very deeply in information security tradition.

However, in the identity management field, we need to think quite differently. We need to work backwards. The IDM system needs to set up an account for a user before the user initiates any operation. When the user actually starts an operation, it happens in the application. The IDM system does not know anything about it. Therefore, the concept of authorization is turned completely upside down in the IDM world.

The IDM system sets up accounts in applications and databases. However, the IDM system itself does not take any active part when a user logs into an application and executes operations. Applications do all that by themselves. Does that mean the IDM system cannot do anything about authorizations? Definitely not. The IDM system does not enforce authorization decisions. However, the IDM system can manage the data that determine how the authorization is evaluated. The IDM system can place the account in the correct groups, which will cause certain operations to be allowed and other operations to be denied. The IDM system can set up access control lists (ACLs) for each account that it manages. The IDM system does not evaluate or enforce the authorizations directly. However, it indirectly manages the data that are used to evaluate authorizations. This is an extremely important feature.
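
The indirect nature of this management can be shown with a small sketch: the IDM system only computes which groups an account should be a member of, and leaves the enforcement to the application. The role-to-group mapping below is an invented example of such a policy.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of indirect authorization management: the IDM system does not evaluate
// access itself, it only computes the group membership that the application will
// later use for its own authorization decisions. Names are illustrative only.
public class GroupProvisioningSketch {

    // Hypothetical policy: which application groups each role implies.
    static final Map<String, List<String>> GROUPS_BY_ROLE = Map.of(
            "Sales Agent", List.of("crm-users", "sales-dl"),
            "Sales Manager", List.of("crm-users", "crm-approvers"));

    static Set<String> desiredGroups(List<String> assignedRoles) {
        Set<String> groups = new TreeSet<>();
        for (String role : assignedRoles) {
            groups.addAll(GROUPS_BY_ROLE.getOrDefault(role, List.of()));
        }
        return groups;    // a connector would then update the account's membership accordingly
    }
}
```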

Authentication and authorization are two very prominent concepts of information security. They are vitally important for any identity and access management solution. However, authentication is quite simple in principle. Yes, the user may have several credential types used in adaptive multi-factor authentication. While that description sounds a bit scary, the actual implementation is not very complicated. In most cases there are just a couple of policy statements that govern authentication. Also, authentication is typically quite uniform: most users are authenticating using the same mechanism. Authentication is not that difficult to centralize (although it may be expensive). Most applications do not care about authentication at all, they just need to know that the user was authenticated somehow, and to know the user’s identifier. Applications do not care about authentication details. This makes authentication relatively easy to manage.

However, it is quite a different story for authorization. Every application has a slightly different authorization mechanism. These mechanisms are not easy to unify. One of the major obstacles is that every application works with different objects, the objects may have complex relations with other objects, and all of them may also have complex relations with the subjects. The operations are also far from being straightforward, as they may be parametrized. Then there is context. There may be per-operation limits, daily limits, operations allowed only during certain times or when the system is in a certain state. And so on, and so on. This is very difficult to centralize. Also, almost every user has a slightly different combination of authorizations. This means that there is great variability and a lot of policies to manage. Then there are two crucial aspects that add a whole new dimension of complexity: performance and scalability. Authorization decisions are evaluated all the time. It is not rare to see an authorization evaluated several times for each request. Authorization processing needs to be fast. Really fast. Even a round-trip across a local network may be a performance killer. Due to complexity and performance reasons, the authorization mechanisms are often tightly integrated into the fabric of each individual application. E.g. it is a common practice that authorization policies are translated to SQL, and they are used as additional clauses in application-level SQL queries. This technique takes advantage of the database engine to quickly filter out the data that the user is not authorized to access. This method is very efficient, and it is perhaps the only practical option when dealing with large-scale data sets. However, this approach is tightly bound to the application data model, and it is usually almost impossible to externalize.
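
The SQL technique mentioned above can be illustrated with a tiny sketch. The table and column names are invented; the point is only that the authorization policy ends up as an extra clause inside the application's own query, which is why it is so hard to externalize.

```java
// Sketch of the "translate policy to SQL" technique: the application appends an
// authorization clause to its own query, so the database filters out rows the
// user must not see. Table and column names are invented for illustration.
public class SqlAuthorizationSketch {

    static String documentsQueryForDepartment() {
        String baseQuery = "SELECT id, title FROM documents";
        // Authorization policy "users see only their department's documents",
        // appended as an additional WHERE clause; the ? is bound to the user's department.
        return baseQuery + " WHERE owner_department = ?";
    }
}
```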

There are approaches that partially address some parts of the authorization problem. Policy agents (such as Open Policy Agent) are small software components that are placed on the application side. They integrate with the application, processing authorization policies. The policies are managed centrally, distributed to the agents and cached by the agents. The agent can process policies quickly, with acceptable latencies. However, many problems still remain unsolved. It is still very difficult to consider application concepts, data models and object relations using the agents. As the agents are generic, they are usually not capable of SQL rewriting, which requires intimate knowledge of the application data model and data storage schemas. Agents can be very useful for some cases, however they are not a universal solution, at least not yet.

Therefore, it is not realistic to expect that the authorization could be completely centralized anytime soon. The authorization policies still need to be distributed into the applications. However, managing partial and distributed policies is no easy task. Someone has to make sure that the application policies are consistent with the overall security policy of the organization. Fortunately, the IDM systems are designed especially to handle management and synchronization of data in a broad range of systems. Therefore, the IDM system is the obvious choice when it comes to management of authorization policies.

Organizational Structure, Roles, Services and Other Wildlife

Back in the 2000s, IDM was all about managing user accounts. It was enough to create, disable and delete an account to have a successful IDM deployment. The world is a different place now. Managing the accounts is simply not enough anymore. Yes, automated account management brings significant benefits, and it is a necessary condition to get at least a minimal level of security. However, account management is often not enough to justify the cost of an IDM system. Therefore, current IDM systems can do much more than just simple account management.

There are many things that an advanced IDM system can manage:

  • Accounts. Obviously. Many IDM systems can fully manage account attributes, group membership, privileges, account status (enabled/disabled), validity dates and all the other details.

  • Groups and roles. Apart from managing the membership of accounts in groups, the IDM system can take care of the whole group life-cycle: create a group, manage it and delete it.

  • Organizational structure. The IDM system can take organizational structure from its authoritative source (usually HR), use it for governance, and synchronize it to all the applications that need it. Alternatively, the IDM itself may be used to manually maintain an organizational structure.

  • Servers, services, devices and "things". While this is not yet IDM mainstream, there are some experimental solutions that use IDM principles to manage concepts that are slightly outside the traditional IDM scope. E.g. there is an IDM-based solution that can automatically deploy a predefined set of virtual machines for each new project. The new IDM systems are so flexible that they can theoretically manage everything that is at least marginally related to the concept of identity: applications, mobile devices, printers, virtual machines, networks … almost anything. This is still quite a unique functionality. However, it is very likely that we will see more stories about this in the future.

While all these features are interesting, some of them clearly stand out. Management of groups and organizational structure is absolutely critical for almost any new IDM deployment. Your organizational structure may be almost flat and project-oriented, or you may have twelve levels of divisions and sections. Regardless of the size and shape of your organizational structure, it needs to be managed and synchronized across applications in pretty much the same way as identities are synchronized. You may need to create groups in Active Directory for each of your organizational units. You want them to be correctly nested. You may want to create a distribution list for each of your ad-hoc teams. You want this operation to have as little overhead as possible, otherwise the teams cannot really be managed in an ad-hoc fashion. You may want to synchronize the information about projects into your issue tracking system. You may also want to automatically create a separate wiki space and a new source code repository for each new development project. The possibilities are endless. Both the traditional organizations and the new lean and agile companies will benefit from that.

Organizational structure management is closely related to group management. The groups are often bound to workgroups, projects or organizational units. E.g. an IDM system can automatically maintain several groups for each project (admin and member groups). Those groups can be used by applications for authorization purposes. Similarly, an IDM system can automatically maintain application-level roles, access control lists (ACLs) and other data structures that are usually used for authorization.

Organizational tree synchronization

While this functionality provides benefits in almost any deployment, organizational structure management is absolutely essential for organizations that are based on tree-like functional organizational structures. These organizations heavily rely on the information derived from the organizational structure. E.g. the direct manager of a document’s author can review and approve the document in the document management system. Only the employees in the same division can see the document draft. Only the employees of a marketing section can see marketing plans. And so on. Traditionally, such data are encoded into an incomprehensible set of authorization groups and lists. That contributes to the fact that reorganizations are a total nightmare for IT administrators. However, an IDM system can significantly improve the situation. IDM can create the groups automatically. It can make sure that the right users are assigned into these groups. It can synchronize information about the managers into all affected applications. And so on. A good IDM system can do all of that using just a handful of configuration objects.

This seems to be almost too good to be true, which it somewhat is. It is fair to admit that the quality of organizational management features varies significantly among IDM systems. Group management and organizational structure management seem to be very problematic features. Only a few IDM systems support these concepts at a level that allows practical out-of-the-box deployment. Most IDM systems have some support for that, but any practical solution requires heavy customization. It is not clear why IDM vendors do not pay attention to features that are required for almost any IDM deployment. Therefore, when it comes to a comprehensive IDM solution, there is one crucial piece of advice that we can give: choose the IDM product wisely.

Everybody Needs Identity Management

Such a title may look like a huge exaggeration. In fact, it is very close to the truth. Every non-trivial system has a need for identity management, even though the system owners may not realize that. As you are reading this book, chances are that you are one of the few people that can see the need. In that case it is mostly about a cost/benefit calculation. Identity management has some inherent complexity. While even very small systems need IDM, the benefits are likely to be too small to justify the costs. The cost/benefit ratio is much better for mid-size organizations. Comprehensive, automated identity management is an absolute necessity for large-scale systems. There seems to be a rule of thumb that has quite a broad applicability:

  • Less than 200 users: You may need automated identity management, but the benefits are probably too small to justify the costs. Managing user accounts manually is probably still a feasible option.

  • 200 – 2 000 users: You need automated identity management, and the benefits may be just enough to justify the costs. However, you still need to look for a very cost-efficient solution. Automating the most basic and time-consuming tasks is probably just enough.

  • 2 000 – 20 000 users: You really need automated identity management. You cannot manage that crowd manually. If you implement the identity management solution properly, the benefits will be much higher than the costs.

  • More than 20 000 users: We can’t believe that you do not have any automated identity management yet. Go and get one. Right now. You can thank us later.

Identity Governance

Identity governance is basically identity management taken to a higher business level. Identity management proper, sometimes called identity administration, is focused mainly on technical aspects of the identity life-cycle, such as automatic provisioning, synchronization, evaluation of roles and computing attribute values. On the other hand, identity governance abstracts from the technical details, focusing on policies, roles, business rules, processes and data analysis. E.g. a governance system may deal with a segregation of duties policy. It may drive the process of access certification. It may focus on automatic analysis and reporting of identity, auditing and policy data. It drives remediation processes to address policy violations. It manages the application of new and changed policies, evaluates how compliant your system is with policies and regulations, and so on. It may evaluate risk levels, modeling overall organizational risk, identifying risk hot-spots. This field is sometimes referred to as governance, risk management and compliance (GRC).

Almost all IDM systems will need at least some governance features to be of any use in practical deployments. Moreover, many governance features are just refinements of concepts that originated in the IDM field many years ago. Therefore, the boundary between identity management and identity governance is quite fuzzy. The boundary is so fuzzy that new terms were invented for the unified field that includes identity management proper together with identity governance. Identity governance and administration (IGA) is one of these terms. For us, governance is just a natural continuation of identity management evolution.

Back in the 2010s, it was a common practice for identity governance features to be implemented by specialized products that were separate from their underlying IDM platforms. Many IDM and governance solutions are still divided into (at least) two products. This strategy brings new revenue streams for the vendors. Yet, it makes almost no sense from the customer’s point of view. It perhaps goes without saying that a reasonable IDM/IGA solution should offer both the IDM and governance features in one unified and well-aligned product.

Identity Governance Features

Below is a list of features that belong to the governance/compliance category. As the boundary of governance is so fuzzy, there are also features that may be considered governance-related IDM features.

  • Delegated administration. Basic IDM deployments are usually based on the idea of an omnipotent system administrator that can do almost anything. Then there are end users that can do almost nothing. While this concept may work in small and simple deployments, it is not sufficient for larger systems. Large organizations usually need to delegate some administration privileges to other users. There may be HR personnel, people that are responsible for management of their organizational units, administrators responsible for a particular group of systems, application administrators, and so on. Delegated administration allows delegation of parts of the system administration tasks to other users. Such delegated privileges are limited in scope. For example, HR staff can edit selected attributes of all employees, or project managers can add/remove users in their projects.

  • Deputies. Delegated administration is very useful, yet it is quite static. It is specified in policies that are not entirely easy to change. However, there is often a need for ad-hoc delegation, such as a temporary delegation of privileges during a manager’s vacation. Such a manager could nominate a deputy that would receive some of the manager’s privileges for a limited time period. This is all done on an ad-hoc basis, initiated by an explicit action of the manager.

  • RBAC-related policies, such as Segregation of Duties (SoD) policy. Simply speaking, an SoD policy ensures that conflicting duties cannot be accumulated by a single person. This is usually implemented by using a role exclusion mechanism. However, it may go deeper. E.g. it may be required that each request is approved by at least two people.

  • Policies related to organizational structure. Organizational structure may look like a simple harmless tree, but in reality it is far from being simple or harmless - or even a tree. In theory, the organizational structure should be managed by business or operations departments such as HR. Yet the reality is often quite different. Business departments lack the tools and processes to efficiently manage organizational structure. Therefore, it is often an IDM system that assumes the responsibility for organizational structure management. In such cases there is a need to police the organizational structure. For example, there may be policies that mandate a single manager for each department. In that case the IDM system may need to handle situations where there is no manager or too many managers.

  • Dynamic approval schemes. Approval processes are usually considered to be part of basic identity management functionality, as they were present in early IDM systems back in the 2000s. The user requests a role assignment. The operation is driven through an approval process before being executed. Approvals provide a very useful mechanism, especially in cases where role assignment cannot be automated, usually due to non-existent policies.

    Approvals are usually implemented by some kind of general-purpose workflow engine in almost all IDM/governance systems. However, this is often a source of maintenance problems, especially in deployments that are focused on identity governance functionality. In such cases, the approval processes are no longer simple quasi-linear workflows. Approval processes tend to be very dynamic, and their nature is almost entirely determined by the policies rather than process flows. Workflow engines have a very hard time coping with such a dynamic situation. IDM systems that implement special-purpose, policy-based approval engines provide much better solutions.

    Approval mechanisms are very useful. However, they also have a dark side. Approval decisions are often made on a very subjective "looks good" basis. This obviously opens an opportunity for bad decisions and negligence. False denial of a role assignment is likely to trigger an immediate (and occasionally quite emotional) feedback from the requester. However, approval of a role that should not have been assigned is likely to trigger no feedback at all. Yet, such a decision is likely to cause a security risk, a risk that is very difficult to detect. This can be partially solved by multi-level approval processes, especially for sensitive roles. However, there is a trend to "automate" approval decisions based on "artificial intelligence" mechanisms. This may look like a very useful tool and time-saver. However, the artificial intelligence is only as good as the training data. If the machine is trained using bad decisions, it will also suggest bad decisions. This is further complicated by a very limited visibility and accountability of such decisions. Therefore, such mechanisms have to be used with utmost care. It is perhaps a better idea to replace the request-and-approval process with an automated policy, which can be validated, reviewed and consistently applied. Even though the access request process cannot be completely replaced, policies should be applied as broadly as possible.

  • Entitlement management deals with entitlements of users’ accounts in target systems, such as role or group membership. However, this process can go both ways. Governance systems may provide "entitlement discovery" features that take existing entitlements as input. This can be used to evaluate compliance and policy violations, but it may also be a valuable input for role engineering.

  • Role mining. Identity management systems are seldom deployed on a green field. In the common case, there are existing systems in place, there are application roles, entitlements and privileges. It is not an easy job to create IDM roles that map to this environment. This is usually a slow and tedious process. However, an IDM system can retrieve all the existing information and use it to propose a role structure. This is not a fully deterministic process, it requires a lot of user interaction and tuning, and it is often based on machine learning capabilities. It is not a replacement for role engineering expertise. However, machine-assisted role mining can significantly speed up the process.

  • Access certification. Assignment of roles is often an easy task. Request a role, the role goes through an approval process, and the role is assigned. Then everybody forgets about it. There is a significant incentive to request assignment of a new role. Yet, there is almost no incentive to request unassignment of a role that is no longer needed. This leads to accumulation of privileges over time. Such privilege hoarding may reach dangerous levels for employees with a long and rich job transfer history. Therefore, there are access certification campaigns, also known as "certification", "re-certification" or "attestation" mechanisms. The goal of those campaigns is to confirm ("certify" or "attest") that the user still needs the privileges that were assigned previously. Certification campaigns are designed to be conducted on a large number of users in a very efficient manner. Therefore, there are special processes, and a very specific user interface is provided to conduct such campaigns. Campaigns are planned certification actions that handle many users at once, distributing the work among many certifiers. This is usually a huge amount of work. On the other hand, there are micro-certifications. These are very small certification actions, usually involving just a single user. They are triggered automatically, for example as a reaction to re-assigning a user to a different organizational unit, or to a user accumulating a dangerous risk level.

    Similarly to approval processes, some identity governance systems offer "artificial intelligence" support for certification processes. Such assistance can be very attractive, as certification campaigns are often quite intimidating due to the large number of decisions that have to be made in each campaign. However, the risks of such "automation" are even more pronounced than in the approval case. Certification is often the last defence against dangerous privilege accumulation. Poorly-trained artificial intelligence may cause a systematic build-up of risk in the organization. Artificial intelligence support for certification decisions may be very useful. However, it is essential to understand how the mechanism works before committing to roll it out for all certifications.

    Yet again, the best way to handle certification effort is to avoid certification altogether. A policy-based approach relies on roles being assigned automatically, which also means they are unassigned automatically, and there is no need for case-by-case certification. Of course, the policy needs to be created, properly maintained and reviewed. However, the effort to maintain a policy is likely much smaller than the certification effort.

  • Role governance is usually quite a complex matter. A typical IDM deployment is likely to have a large number of roles. It is quite hard to define those roles in the first place. Then it is even harder to maintain the roles. The environment is changing all the time, therefore the roles have to change as well. It is usually beyond the powers of a single administrator to do so. Therefore, many role owners are usually nominated to take care of role maintenance. Roles are often grouped into applications, categories, catalogs or functional areas. The IDM system must make sure that the owners have the right privileges to do their job. The IDM system should also take care that each role has at least one owner at any given time, that role definitions are periodically reviewed and so on.

  • Role lifecycle management is the dynamic part of role governance. Role changes are likely to have a serious impact on the overall security of the system. Therefore, it may not be desirable to simply delegate role management duties. It may be much more sensible to require that role changes be approved before being applied. New roles are also created all the time and old roles are decommissioned. The IDM system may need to make sure that a decommissioned role is not assigned to any new user. Yet, old roles may still be needed in the system during a phase-out period. The IDM system has to keep track of them, to avoid keeping outdated roles in the systems forever.

  • Role modeling. A change of a single role often does not make much sense just by itself. The roles are usually designed in such a way that a set of roles works together and forms a role model. Therefore, approval of each individual role change may be too annoying, and it may even be harmful. E.g. there may be an inconsistent situation in case one change is approved and another is rejected. Therefore, roles and policies are often grouped into models. The models are reviewed, versioned and applied in their entirety.

  • Simulation. IDM deployments tend to be complex. There are many relations, interactions and policies. It is no easy task to predict the effects of a change in a role, policy or organizational structure. Therefore, some IDM systems provide simulation features that offer predictions and impact analyses of planned changes.

  • Compliance policies, reporting and management. Policies in the identity management world are usually designed to be strictly enforced. This works fine for fundamental policies that are part of simple IDM deployments. However, the big problem is how to apply new policies - especially policies that are mandated by regulations, recommendations and best practices. It is almost certain that a significant part of your organization will not be compliant with such a new policy. Applying the policy and immediately enforcing it is likely to cause a major business disruption. However, it is almost impossible to prepare for new policies and to mitigate their impact without knowing which users and roles are affected. Therefore, there is a two-step process. The policies are applied, but they are not enforced yet. The policies are used to evaluate the compliance impact. Compliance reports can be used to find the users that are in violation of the policy, in order to remedy the situation. Compliance reports may also be used to track the extent and progress of compliance. A small sketch of such a report-only evaluation follows this list.

  • Remediation. Good IDM deployments strive for automation. All the processes and actions that can be automated are automated. For example, if a role is unassigned and the user no longer needs an account, such an account is automatically deleted or disabled. However, there are actions that cannot be automated because they require a decision of a living and thinking human being. Approvals are one example of such processes. However, there are more situations like that. Many of those require more initiative than a simple yes/no decision. One such example is organizational structure management. There is usually a rule that each department must have a manager. However, what should the IDM system do in case a department manager is fired? The IDM system cannot stop that operation, as there are certainly good reasons to revoke all privileges of that manager. The manager and all the associated accounts have to go, as soon as possible. Now there is a department without a manager, and the IDM system itself cannot do anything about it. That is where remediation comes to the rescue. A remediation process is started after the operation that removed the manager. The remediation process will ask a responsible person to nominate a new manager for the department. There may be a broad variety of remediation processes. A simple process will ask for a yes/no decision, or it may ask to nominate a user. Then there are often options to set up generic processes that apply to completely unexpected situations.

  • Risk management automation. Information security is not a project, it is a process. It starts with risk analysis, planning, execution, and then it goes back to analysis and planning and execution, and so on and so on for ever and ever. Risk analysis is the part of the process that takes a huge amount of time and effort - especially when it comes to analysis of the insider threat, as there are usually a lot of insiders to analyze. However, an IDM system can help to reduce the risk analysis effort. Each role assigned to a user is a risk. If roles are marked with relative risk levels, the IDM system can compute the accumulation of risk for each user. As each role gives access to a particular set of assets, the IDM system may provide data to evaluate asset exposure to users.

  • Identity analytics and intelligence (IdA) is mostly an umbrella term. It usually refers to a composition of several identity governance features, integrated into a holistic, risk-based approach. Identity analytics and intelligence starts with a look at the data. The process starts with a very realistic assumption that the data are not in perfect order, that there are inconsistencies, imperfections, risks and all kinds of other problems. Various techniques are employed to detect the problems. Most techniques seem to be based on recognition of anomalies and patterns in the data. Outlier detection mechanisms look for users with privileges that are significantly different from the privileges of their colleagues. On the other hand, role mining is used to detect similar privileges assigned to similar users, suggesting new roles. Many of the identity analytics and intelligence techniques are based on risk modeling. There are mechanisms to identify over-privileged users by analysing risk scores of individual users. Similar mechanisms can be used to identify high-privilege entitlements assigned to a low-privilege user, or similar risk anomalies.

  • Workflow orchestration is provided by some IGA platforms. Workflow engines drive processes based on simple algorithms, usually containing many manual steps that need to be carried out by different people or teams. IGA platforms use workflow automation mostly to implement approval mechanisms. While use of a workflow engine for approvals may look like an obvious choice, workflow engines are perhaps the worst tools possible. Approval processes are usually dynamic processes; their form heavily depends on input (request) and policy settings. The list of approvers, approval stages and exit conditions depends on the set of requested roles and other factors (e.g. user risk level), which is not an easy thing to handle in a high-level business process modeling language.

    Even though workflow orchestration is almost useless for implementation of approval processes, it still has its place in an IGA platform. Workflow automation may be useful for driving on-boarding (enrollment) and off-boarding processes. It may also be useful for some remediation cases, although remediation tends to be an unstructured or semi-structured activity that is better handled by case management than workflow automation.

    Almost all IGA platforms that support workflow automation bring their own (often proprietary) workflow engine. This means that the administrators need to learn how to configure the workflows, users need to adapt to a new user interface, notifications need to be integrated and so on. It would be much better to re-use an existing workflow engine, one that is already used by the organization to drive all its other business processes. Except for approvals, which heavily depend on role structure, other IGA processes tend to be very similar to ordinary business processes in the organization. Re-use of an existing workflow automation and orchestration platform should be a natural choice. Except for one annoying detail. Most organizations do not have such a system. Therefore, even that strange proprietary workflow engine embedded in the IGA platform may still be quite useful.

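To illustrate the report-only compliance evaluation mentioned above, here is a minimal sketch in Python. All names and the simple separation-of-duties rule are hypothetical; real IGA platforms have their own policy languages and data models.

[source,python]
----
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

@dataclass
class Policy:
    """Hypothetical compliance policy: roles that must not be held together."""
    name: str
    forbidden_combination: frozenset
    enforced: bool = False  # step one: apply the policy, do not enforce it yet

def violations(policy: Policy, users: list) -> list:
    """Report users violating the policy, without blocking any operation."""
    return [u.name for u in users if policy.forbidden_combination <= u.roles]

users = [User("alice", {"accounts-payable", "accounts-receivable"}),
         User("bob", {"accounts-payable"})]
policy = Policy("sod-finance",
                frozenset({"accounts-payable", "accounts-receivable"}))

# Report-only pass: measure the compliance gap before switching enforcement on.
print("violations:", violations(policy, users))   # -> ['alice']
----
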
Not all IGA platforms implement all the features. The scope and quality of implementation vary dramatically from system to system. Moreover, individual IGA platforms use their own terminology, which makes the situation very confusing. This is further obscured by marketing departments that try to present even the smallest advantage as a revolutionary achievement. Most IGA platforms are commercial closed source software; access to the software and documentation is jealously guarded. This makes comparison of individual IGA platforms a very challenging undertaking. Perhaps the best approach is to know what you need: summarize your requirements and priorities. Write down your expectations, select a product, and conduct a proof-of-concept test tuned to your specific needs. That is perhaps the only reliable way to break through the marketing veil.

Identity Management and Governance Terminology

Identity professionals, often motivated by marketing needs, like to invent new names and use them to describe the same thing. Therefore, there are many overlapping, overloaded and similar terms in use.

Identity management (IDM) is usually used to describe the low-level parts (technology), while identity governance is used to describe the high-level parts (business). Yet the boundary is very fuzzy: many IDM systems provide governance capabilities, and many governance systems provide low-level functions. Identity governance and administration (IGA) is a term supposed to describe both parts together. Governance, risk management and compliance (GRC) is a term that was mostly used in the past to represent the high-level identity governance functionality, later known simply as identity governance. Identity security is a marketing term that roughly covers IGA functionality.

Overall, the terminology is very fluid. Vendors use their own terms, often choosing overloaded or confusing terminology. Industry analysts and consultants also add their own terms and attach new meanings to existing ones. Marketing terms are invented faster than the documentation can adapt, making the situation quite confusing. We have tried to compile the terminology as precisely as we could, while still keeping the terms understandable. We have chosen to follow established industry terminology when possible, even though many terms are overloaded and ambiguous, as we did not want to increase the confusion by re-inventing the terminology. We point out the ambiguities in the text as needed. At the very least, we are trying to use consistent terminology in this book. When in doubt, please refer to the glossary.

Complete Identity and Access Management Solution

A comprehensive Identity and Access Management solution cannot be built by using just a single component. There is no single product or solution that can provide all the necessary features. As the requirements are so complex and often even contradictory, it is very unlikely that there will ever be a single product that can do it all.

A clever combination of several components is needed to build a complete solution. The right mix of ingredients for this IAM soup will always be slightly different, as no two IAM solutions are exactly the same.

There are three basic components that are required for any practical IAM deployment:

  • Directory service or a similar identity store is the first component. This is the database that stores user account information. The accounts are stored there in a "clean" form that can be used by other applications. This database is then widely shared by applications that are capable of connecting to it. This part of the solution is usually implemented in the form of an LDAP server or Active Directory. However, there is one major limitation: the data model needs to be simple. Very simple. Also, the identity store needs to be properly managed.

  • Access management is the second major component of the solution. It takes care of authentication and (partially) authorization. Access management unifies authentication mechanisms. If an authentication mechanism is implemented in the access management server, then all integrated applications can easily benefit. It also provides single sign-on (SSO), centralizes access logs and so on. It is a very useful component. Of course, there are limitations. An access management system needs access to identity data. Therefore, it needs a reliable, very scalable and absolutely consistent identity database as a back-end. This is usually provided by the directory service. Performance and availability are the obvious obstacles here. However, there is one more obstacle which is less obvious, yet every bit as important: data quality. The data in the directory service must be correct, up-to-date and properly managed. However, that is still only a part of the picture. As most applications store some pieces of identity data locally, these data also need to be synchronized with the directory database. No access management system can do this well enough. Moreover, there is no point for the access management system to do it at all. The access management system has very different architectural responsibilities. Therefore, yet another component is needed.

  • Identity management is the last, but in many ways the most important component. This is the real brain of the solution. The identity management system maintains the data. It maintains order in the system. It makes sure the data are up-to-date and compliant with the policies. It synchronizes all the pieces of identity data that those pesky little applications always keep creating. It maintains groups, privileges, roles, organizational structures and all the other things necessary for the directory and the access management to work properly. It keeps the entire solution from falling apart. It allows system administrators and security officers to live happily, to breathe easily and to keep control over the whole solution.

The following diagram shows how all these components fit together.

IAM architecture

This is truly a composite solution. There are several components that have vastly different features and characteristics. When bound together into one solution, the result is much more than just a sum of its parts. The components support each other. The solution cannot be complete unless all three components are in place.

However, building a complete solution may be quite expensive, and it may take a long time. You have to start somewhere. If you have resources for just one product, then choose identity management. It is a good start. It is not as expensive to deploy and integrate as an access management system. IDM brings good value, even quite early in the IAM program. Going for an open source product will also keep the initial investment down. Starting with IDM is usually the best way to begin an IAM program.

Risk-Based Approach To Identity Governance

Risk-based approach to identity management and governance is a very good idea. In fact, it is an excellent idea, one of the best ideas in decades. However, as with many great ideas, there are difficulties and drawbacks.

But wait a moment, what is this "risk-based" thing all about? To answer that question, we have to make a quick road-trip through the information security landscape.

The concept of risk comes from information security theory. Security practitioners realized a long time ago that it is all but impossible to create a perfectly secure system. As you try to make a system more and more secure, every step is more expensive than the previous one. Every countermeasure is less efficient than the previous one, more intrusive, less flexible, harder to adapt to business needs. Eventually, the system gets to the state where it is practically useless for business, yet it is still not completely secure.

Therefore, security practitioners came up with the concept of risk. Risk is a measure of the danger that a particular asset is subjected to. An asset such as a customer database can be at risk with respect to a particular threat, for example a hacker trying to steal the database to sell it to your competition. The risk expresses the probability of an asset being compromised. For example, keeping the database in the form of a spreadsheet on a decades-old Windows machine connected to the open Internet is obviously quite a risky thing to do. The risk can be addressed using countermeasures. Countermeasures are all the things that we do to make systems more secure, ranging from operating system updates, through access-control systems, up to bomb-proof doors and heavily-armed guards.

As it is not practical to completely secure a system, there is always some amount of risk that we have to accept. This is called residual risk: a risk that we are aware of, yet it is not efficient to reduce or eliminate it. Even though residual risk cannot be completely eliminated, there may be risk mitigation plans. For example, we may accept that there is a risk of an operating system vulnerability, and no amount of automated software updates, vulnerability database integrations and watchfulness is ever going to eliminate the risk completely. However, we can mitigate the risk by preparing plans to be executed when we are affected by a zero-day vulnerability. The plan may include disabling network access to vulnerable services as soon as we learn about the vulnerability, an investigation that looks for traces of an attacker exploiting the vulnerability, emergency communication and contingency plans, and so on. Risk mitigation is focused on making the impact of such an attack less painful, reducing the damage.

In an ideal situation we are completely aware of the risk, so we can implement countermeasures and prepare risk mitigation plans. However, we need to know quite a lot about the risk for the countermeasures and mitigation plans to be efficient. Risk is not just a single number, it is a multi-dimensional and often very complex concept. The amount of risk is evaluated in a risk assessment process, which is often a very tedious and demanding exercise. The risk assessment process evaluates assets, to determine the value of the data and services. The process looks at threats, such as the skills and motivations of an attacker. The assessment looks at vulnerabilities that attackers can use to gain access to our systems. It is also concerned with existing countermeasures, processes, policies and other details.

This means that the risk assessment process deals with big and complex data that cannot be processed by a human mind alone. The data are usually fed into risk models to determine the risk. A risk model is a set of complex mathematical formulas that transform data on assets, threats, vulnerabilities and all the other inputs into a multi-dimensional representation of risk areas. In theory, the risk model can tell us that we have a high risk in network security, especially when dealing with customer data - and we should really do something about it!
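
The text deliberately keeps risk models abstract. For orientation only, the most common textbook form combines likelihood and impact per threat; the numbers below are made up:

[source,python]
----
# Hypothetical minimal "risk model": likelihood times impact, summed per asset.
# Real models add many more dimensions (vulnerabilities, countermeasures, ...).
threats = {
    "stolen laptop":      {"likelihood": 0.3, "impact": 40_000},
    "phished credential": {"likelihood": 0.6, "impact": 250_000},
}

asset_risk = sum(t["likelihood"] * t["impact"] for t in threats.values())
print(f"risk estimate for the asset: {asset_risk:,.0f}")   # -> 162,000
----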

The results of risk assessment are meant to drive the implementation of countermeasures and risk mitigation plans. There are too many countermeasures and mitigation strategies to choose from; we cannot possibly implement them all. We want only the really efficient ones. We do not want to waste time and money on sophisticated encryption of data that are just copies of public information, do we? The risk assessment is supposed to tell us what is important and what is not. This is an important principle of an efficient and systematic information security process: never guess, never go blind, let the risk guide you. Base your decisions on data. Implement countermeasures exactly where they are needed, where they are in a good position to address real risks.

However, information systems are not the easiest things to analyze. They never seem to stand still! The data are changing, new integration routes are added, systems are re-configured. Most annoying of all, user accounts and privileges change pretty much every day. By the time the risk assessment is done, the results are already out of date! How are we supposed to evaluate the risk, when everything around us is changing constantly?

The answer is, of course, automation. There are parts of risk assessment that cannot be automated. For example, there is no magic method to automatically assess business value of your data assets. However, some parts of risk assessment can be automated.

This is where we get back to identity management and governance. Almost all organizations are affected by the insider threat, a threat posed by people who are already part of your organization. Employees, contractors, support engineers, cloud service providers - they already have access to your data. They do not need to hack anything, they do not need to overcome any countermeasures. They are already inside. All it takes to reveal your trade secrets is a simple copy-and-paste keystroke. One file download is all it takes to sell your customer database. The proliferation of cloud services makes such "exploits" entirely trivial. There is no technological countermeasure, no perimeter, that could stop an insider from using a privilege that he or she already has.

This means that identity data heavily affect the outcome of risk modeling. A system where almost all of your employees have unrestricted access to a customer database is very likely to pose a much higher risk than all the network-related risks combined. A single person that has administrator-level access to almost every system in your organization is certainly a very attractive phishing target. There are high risks hidden in the identity data of almost any organization. Yet, such risks can be easily reduced by adjusting the access rights. But how to find such risks? Identity data are often complex, system-specific, distributed in many directories, cloud systems and application databases. Any identity practitioner can certainly see where this leads: an identity management system, of course.

An identity management system is an ideal place for evaluation of risks related to identity and access data. Essential data are already in the database of the identity management system: users, roles, role assignments, role composition, entitlements, everything is there. Entitlements can be assessed to assign a risk score to them. Then the data can be fed into a risk model, evaluating how the entitlements are combined in roles, how the roles are assigned to users, identifying high-risk roles and users. The model is evaluated by the machine, quickly and efficiently. The efficiency opens up a whole range of possibilities that form the essence of a risk-based approach to identity governance.
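
A minimal sketch of the evaluation described above, assuming each entitlement already carries a (subjective) risk score. The names and the simple additive scoring are hypothetical; real products use far richer models:

[source,python]
----
# Hypothetical entitlement risk scores, assigned during risk assessment.
entitlement_risk = {"crm-read": 1, "crm-export": 5, "db-admin": 9, "vpn": 2}

# Roles are sets of entitlements; users hold sets of roles.
roles = {"sales": {"crm-read", "vpn"},
         "dba": {"db-admin", "vpn"},
         "analyst": {"crm-read", "crm-export"}}
users = {"alice": {"sales", "analyst"}, "bob": {"dba"}}

def user_risk(user: str) -> int:
    """Accumulate risk over all entitlements the user receives through roles."""
    entitlements = set().union(*(roles[r] for r in users[user]))
    return sum(entitlement_risk[e] for e in entitlements)

for name in users:
    print(name, user_risk(name))   # alice -> 8, bob -> 11
----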

As we can evaluate the risk posed by any individual user, we can easily identify a dangerous accumulation of privileges in the hands of a single user. Then we can focus on addressing this risk, analysing why the privileges have accumulated, whether they are all necessary, removing excess privileges to lower the risk. Perhaps we can consider changing business processes to divide the responsibilities among several users, lowering the risk even further. We can evaluate the risk model after each step, checking whether we have reached acceptable risk levels already.

The risk model can evaluate the risk of each role. This allows detection of anomalies, such as assignment of a powerful high-risk role to an ordinary user that is supposed to be low-risk. Such a role is likely to be assigned by mistake, or perhaps it was assigned during an emergency and never removed. Unassignment of such a role may be a quick way to reduce the risk. A smart system could suggest several types of such outliers, where the privileges of an individual user stand out from their surroundings.
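
One of many possible ways to look for such outliers is simple peer-group comparison: flag entitlements that a user holds but that are rare among colleagues in the same organizational unit. A hedged sketch with hypothetical data:

[source,python]
----
from collections import Counter

# Hypothetical entitlements per user, all within one organizational unit.
unit = {
    "alice": {"crm-read", "vpn"},
    "bob":   {"crm-read", "vpn"},
    "carol": {"crm-read", "vpn", "db-admin"},   # stands out from her peers
}

def outliers(members: dict, threshold: float = 0.5) -> dict:
    """Flag entitlements held by a user but rare (below threshold) among peers."""
    counts = Counter(e for ents in members.values() for e in ents)
    size = len(members)
    rare = lambda e: counts[e] / size < threshold
    return {user: {e for e in ents if rare(e)}
            for user, ents in members.items() if any(rare(e) for e in ents)}

print(outliers(unit))   # -> {'carol': {'db-admin'}}
----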

Once we have the concept of risk established in our system, we can use it in security policies. For example, it makes sense to require a stronger password for high-risk users, or even better, automatically set up multi-factor authentication for them. It may be desirable to re-certify privileges of high-risk users more frequently than those of low-risk users. Assignment of a new role to a high-risk user may need to pass through an additional approval stage. The policies can take the risk into consideration. This approach is often referred to as adaptive security.

There are even more advantages when the risk-based approach is also applied to other areas of the identity and access management field. For example, it may be a good idea to require strong authentication from high-risk users, while allowing weaker authentication for low-risk users. This can be achieved in many ways; the easiest is perhaps for the identity management system to propagate risk scores to access management user profiles.

Risk-based identity management and governance is indeed the right thing to do. However, the devil is in the details, and the reality is much harder than the glossy marketing brochures dare to admit. There are hidden dangers and dark corners on this route:

  • One risk assessment means nothing. Nothing at all. The results are out of date as soon as they are produced by the model. You have to do the assessment continually, all the time, from now until eternity. You take the results of an assessment, plan countermeasures, apply them, only to do all the work over again. This is called a "security process". It never ends.

  • Risk evaluation is almost always subjective. When evaluating the risk of any individual entitlement, you will probably use subjective terms such as "low", "medium" and "high". The subjective terms are often hidden behind scores, numbers that may look like exact values. In reality, they are anything but exact.

    Subjective risk assessment is pretty much the standard method. Objective risk measures are sometimes tried, such as conversion of risk to a monetary value. However, such "objective" measures are often very misleading, and they are generally frowned upon by information security practitioners. There is nothing fundamentally wrong with subjective risk assessment as long as you are aware of the limitations. Perhaps the most important rule is to keep the assessment consistent and proportional. Entitlements that are assigned the "low" risk level should pose approximately the same risk, and it should be significantly lower than that of all the entitlements marked as "medium" risk.

  • You need a risk model appropriate for your organization and situation. When it comes to risk models, one size does not fit all. There are simple risk models appropriate for quick assessments in organizations with low security requirements. Then there are overly complex risk models designed for high-security military settings. Even if you find a model that fits your needs, you still need to fine-tune it. You cannot just buy an instant ready-to-serve-in-5-minutes risk model. Such a model will never work for you.

  • Model is not reality. A model is meant to be an approximation of reality. However, how well the model approximates reality must not be taken for granted. A risk model may not work for you, and you may not even realize it. A bad model will lull you into a false sense of security, claiming that you are all green while in fact you may be in grave danger. Do not trust your risk model blindly. Try to validate the results, try to confront the model results with reality as much as you can.

Information security is quite a strange field. You can clearly prove that your system is insecure, for example by successfully attacking it. Yet, you can never completely prove that your system is secure. This limitation is a major source of confusion, and it also opens up opportunities for charlatans offering their security snake oil on the market. Just as you cannot buy an information security process, you cannot buy a ready-made risk-based approach to identity governance. You have to build it yourself. Having an advanced and smart identity governance platform at your side is unquestionably a great help. However, such a platform is only a tool. Even the smartest and most expensive tool will not do all the work. It will make your work more efficient, but you still need to be the one driving it. One size does not fit all. Your organization is different from other organizations. That is what makes you unique, what gives you a competitive edge, what makes you survive on the market. You cannot expect to buy a model or a process that fits your needs perfectly. You have to adapt and develop your own models, policies and processes. Having a starting point that is similar to your needs is a huge advantage. Starting from industry-specific frameworks, templates and samples will save you a lot of time. Go for these, whenever they are available. However, you will have to understand how the frameworks work, you will need to know what you are doing, as you will certainly need to adapt them to your needs.

Similarly to information security, "off the shelf" is mostly just an illusion in identity management and governance. Whatever the bold marketing statements say, you cannot just buy it and run it. No, not even in the cloud. You can buy an identity governance platform as a service, but you cannot buy identity governance. You will have to learn a lot of things, you will have to dive deep into policies and models, you will have to do a lot of work yourself. Set your expectations realistically.

IAM and Security

Strictly speaking, Identity and Access Management (IAM) does not entirely fit into the information security field. IAM goes far beyond information security. IAM can bring user comfort, reduce operation costs, speed up processes, and generally improve the efficiency of the organization. This is not what information security is concerned with. Even though IAM is not strictly part of information security, there is still a huge overlap. IAM deals with authentication, authorization, auditing, role management and governance of objects that are directly related to information security. Therefore, IAM and information security have an intimate and very complicated relationship.

It is perhaps not too bold to say that IAM is a pre-requisite to good information security. Especially the identity management (IDM) part is absolutely critical - even though this may not be that obvious at first sight. Yet, the evidence speaks clearly. Security studies quite consistently rate the insider threat as one of the most severe threats for an organization. However, there is not much that technical security countermeasures can do about the insider threat. The employee, contractor, partner, serviceman - they all get access to your systems easily and legally. They can legally pass through even the strongest encryption and authentication because they have got the keys. Firewalls and VPNs will not stop them, because those people are meant to pass through them to do their jobs.

Vulnerabilities are there, obviously. With a population of thousands of users there is a good chance that there is also an attacker. Maybe one particular engineer was fired yesterday. Yet, he still has VPN access and administration rights to the servers. As he might not be entirely happy about the way he has been treated, the chances are he might be quite inclined to make your life a bit harder. Maybe leaking some company records would do the trick. Now we have a motivated attacker who will not be stopped by any countermeasures, and who can easily access the assets. Any security officer can predict the result without a need for a comprehensive risk analysis.

Information security has no clear answers to the insider threat. This is no easy issue to solve, as there is obviously a major security trade-off. Business people want to access the assets easily to do their jobs, to keep the wheels of the organization turning. However, the security team needs to protect the assets from the very users that are accessing them. There is no silver bullet to solve this issue. However, there are a couple of things that can be done to improve the situation:

  • Record who has access to what. Each user has accounts in many applications throughout the enterprise. Keep track of which account belongs to which user. It is very difficult to do that manually. Yet, even the worst IDM system can do that.

  • Remove access quickly. If there is a security incident, then the access rights need to be removed in a matter of seconds. If an employee is fired, then the accounts have to be disabled in a matter of minutes. It is not a problem for a system administrator to do that manually. However, will the administrator be available during a security incident late in the night? Would you synchronize layoffs with the work time of system administrators? Wouldn’t system administrators forget to stop all the processes and background jobs that the user might have left behind? An IDM system can do that easily. The security team can simply disable all the accounts by using the IDM system. A single click is all that is needed.

  • Enforce policies. Keep track of the privileges that were assigned to users. This usually means managing the assignment of roles (and other entitlements) to users. Make sure that the assignment of sensitive roles is approved before the user gets the privileges. Compare the policies and the reality. System administrators that create accounts and assign entitlements are not robots. Mistakes can happen. Make sure the mistakes are discovered and remediated. This is the natural best practice. However, it is almost impossible to do manually. Yet even an average IDM system can do that without any problems.

  • Remove unnecessary roles. Role assignments and entitlements tend to accumulate over time. Long-time employees often have access to almost any asset simply because they needed the data at some point in their career. The access to the asset has not been removed since. This is a huge security risk. It can be mitigated by inventing a paper-based process to review the entitlements. However, such a process is very slow, costly, error-prone, and it has to be repeated at regular intervals. Yet, advanced IDM systems already support automation of this access certification process.

  • Maintain order. If you closely follow the principle of least privilege, then you have probably realized that you have more roles than you have users. Roles are abstract concepts, and they are constantly evolving. Even experienced security professionals can easily get lost in the role hierarchies and structures. The ordinary end users often have absolutely no idea what roles they need. Yet, it is not that hard to sort the roles into categories if you maintain them in a good IDM system. This creates a role catalog that is much easier to understand, use and maintain.

  • Keep track. Keep an audit record of any privilege change. This means keeping track of all new accounts, account modifications, deletions, user and account renames, role assignments and unassignments, approvals, role definition changes, policy changes and so on. This is a huge task to do manually. It is almost impossible to avoid mistakes. Yet, a machine can do that easily and reliably.

  • Scan for vulnerabilities. Mistakes happen. System administrators often create testing accounts for troubleshooting purposes. There is an old tradition of setting trivial passwords for such accounts. These accounts are not always cleaned up after the troubleshooting is done. Also, there may be worse mistakes. System administrators may assign privileges to a wrong user. The help desk may enable an account that should be permanently disabled. Therefore, all the applications have to be permanently scanned for accounts that should not be there and for entitlements that should not be assigned. This is simply too much work to be done manually. It is not really feasible unless a machine can scan all the systems automatically. This is called reconciliation, and it is one of the basic functionalities of any decent IDM system. A minimal sketch of the idea follows this list.

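A reconciliation pass boils down to comparing two sets: the accounts that actually exist in an application versus the accounts the IDM system believes should exist. A minimal sketch with hypothetical data; real systems also correlate accounts to users and compare attributes and entitlements:

[source,python]
----
# What the IDM system expects (derived from users, roles and policies).
expected_accounts = {"alice", "bob", "carol"}

# What actually exists in the target application (read via its connector or API).
actual_accounts = {"alice", "bob", "test-admin", "dave"}

orphans = actual_accounts - expected_accounts   # exist, but should not
missing = expected_accounts - actual_accounts   # should exist, but do not

print("orphaned accounts to investigate:", orphans)   # {'test-admin', 'dave'}
print("missing accounts to provision:", missing)      # {'carol'}
----
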
Theoretically, all of these things can be done manually. However, it is not feasible in practice. The reality is that information security seriously suffers - unless there is an IDM system that brings automation and visibility. Good information security without an IDM system is hardly possible.

Zero-Trust Approach

Zero-trust is an approach to designing network and application systems. The basic idea is that a system should not implicitly trust any other system, not even systems located on a "secure" corporate network. Simply speaking, the zero-trust approach is mostly about removing the security perimeter.

For many decades, corporate networks were designed using a hard exterior, soft interior approach. The corporate network was protected from the Internet by an army of specialized security systems and techniques, such as firewalls, de-militarized zones, network traffic analysers, intrusion detection systems, network anti-virus scanners and everything else a booming network security market could provide. While the castle gates were heavily protected, the interior of the corporate network was very soft. Originally, there were no security measures inside corporate networks at all. Anyone that got inside could connect to any system. Of course, basic authentication and authorization mechanisms were usually there, but the network was not segmented, and the traffic was usually not even protected by basic encryption. This approach created a security perimeter around the corporate network. If you want to keep the data secure, you have to make sure nobody gets into the network.

Of course, this approach does not really make much sense in the Internet era. There is no such thing as a network perimeter, not since the invention of WiFi, mobile data and USB keys, anyway. It is ridiculously easy to breach the perimeter by connecting a WiFi device to the corporate network. However, even that was usually not needed, as the data could be copied to USB keys, or easily moved outside the perimeter by using virtual private network (VPN) access. Corporate security professionals tried to address such threats by strictly controlling users’ devices, such as disabling USB ports, disabling access to other networks while the device was part of the VPN, and so on. However, none of these countermeasures were really effective, and they were usually very intrusive and inconvenient for the users. The advent of cloud services and mobile devices was the last straw. The traditional corporate information security approach is dead. Even the most traditional security practitioners had to admit what was already obvious: there is no perimeter.

The old approach was replaced by a new one: zero trust. An application should not trust any other application, not even an application on the same corporate network. The network perimeter was replaced by mutual authentication and network traffic protection. Each application must authenticate the other end of the connection, verifying that it is talking to the party it is supposed to talk to. Network traffic always needs to be encrypted and authenticated (signed), always assuming that it may be passing through an insecure network. Simply speaking, we treat the corporate network in exactly the same way as we treat the public Internet. The soft interior turned into a hard interior, and the perimeter was no longer needed.
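
Mutual authentication between services is typically implemented with mutual TLS. A minimal sketch of the server side in Python; the certificate and key files (ca.pem, server.pem, server.key) are placeholders, and real deployments add certificate rotation, revocation checking and authorization on top of this:

[source,python]
----
import socket
import ssl

# Server-side TLS context: present our own certificate and *require* the client
# to present one too (mutual TLS), trusting only our internal CA.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.pem", keyfile="server.key")  # placeholders
context.load_verify_locations(cafile="ca.pem")                        # placeholder CA
context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate

with socket.create_server(("0.0.0.0", 8443)) as server:
    with context.wrap_socket(server, server_side=True) as tls_server:
        conn, addr = tls_server.accept()   # the handshake authenticates both sides
        print("authenticated peer:", conn.getpeercert().get("subject"))
        conn.close()
----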

The concept of zero trust is not new at all. It has been here pretty much since the very beginnings of information security. Security practitioners are trained not to trust anything or anyone, to set up policies, require authentication, deny access by default, encrypt all network traffic, minimize privileges, manage risks, and so on. Therefore, the "zero trust" approach is essentially just a thorough application of proper information security principles. This approach has been around for decades, it just had different names: defence in depth, perimeterless security, network hardening and so on.

Even though the zero trust approach is not new, it has dramatically gained importance in the era of cloud services and remote access. Many functions provided by traditional corporate applications are provided by cloud services now. Such applications need data to work; therefore, the very use of "as a service" applications is by itself a breach of the perimeter. Cloud applications need to work together with on-premise applications. Traditional enterprise integration patterns based on a soft interior do not work in this brave new world anymore. All of that is driving the zero trust concepts.

Of course, "zero trust" is more a wish than a strict rule. A system that trusts nothing will not be able to work at all. There is always some amount of trust involved, even in the zero trust approach. The system must trust that the root keys and certificates are authentic. There is an implicit trust that the developers have not introduced a back door into the system code. The zero trust approach should perhaps be called the "minimal trust approach". However, "zero trust" is much more attractive in glossy magazines and presentations. Whatever the approach is called, the basic principle remains the same: aggressively minimize implicit trust in the system.

Identity permeates everything, therefore the zero trust approach has an impact on identity and access management as well. The impact on access management technologies is perhaps quite obvious. Access management deals mostly with authentication. Applications need to authenticate to each other in this new zero trust world. Therefore, access management systems need to handle authentication of non-person identities, such as applications and devices. Many scenarios do not significantly deviate from the usual authentication; perhaps the only difference is that the authentication needs to be completely non-interactive. However, there are also more complex authentication scenarios, such as an application authenticating on behalf of a user. Traditional authentication methods (such as password-based authentication) are obviously ill-prepared for such scenarios. Therefore, the zero trust approach is often combined with the introduction of new authentication mechanisms.

The impact of the zero trust approach on identity management systems is much more subtle. The zero trust approach requires mutual authentication of communicating parties, which means that the identity management system needs to manage non-person identities such as applications and devices. This requirement is not new, therefore most well-maintained identity management products are more than capable in this aspect. The problematic part is usually the connection between the identities, the relationship. If two applications have to communicate, both of them need to know about the other one. API keys, pre-shared secrets, certificates and other cryptographic material need to be set up before the first communication can happen. The credentials need to be updated, keys must be changed periodically, certificates need to be renewed. Such application credentials are much more important than passwords ever were, as application credentials are quite literally the keys to the kingdom. An attacker that gets access to the application keys has access to all the data in cloud applications, which very likely means your payroll data, customer database, internal documents, almost everything that is important to you. Experienced information security professionals know that it is not encryption that is the most difficult part of cryptography; key management is. Similarly, it is not authentication that is the most difficult problem in the zero trust approach; identity and credential management is much harder. Today, application identities and credentials are usually configured manually by system administrators. However, such an approach does not scale. The whole idea of "as a service" applications is to make information systems more flexible, more dynamic, and especially less demanding when it comes to system administration. Manual management of application identities goes well against that idea.
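
The periodic credential changes mentioned above are usually coordinated by keeping two secrets valid during a transition window, so that neither side suffers an outage. A self-contained, purely illustrative sketch; the classes are hypothetical stand-ins for real services:

[source,python]
----
import secrets

class Server:
    """Hypothetical service that accepts any of a set of shared secrets."""
    def __init__(self, secret):
        self.accepted = {secret}
    def authenticate(self, secret):
        return secret in self.accepted

class Client:
    """Hypothetical client application holding a single shared secret."""
    def __init__(self, secret):
        self.secret = secret

old = secrets.token_urlsafe(32)
server, client = Server(old), Client(old)

# Phase 1: the server accepts both the old and the new secret for a while.
new = secrets.token_urlsafe(32)
server.accepted.add(new)

# Phase 2: the client switches to the new secret; the old one still works.
client.secret = new
assert server.authenticate(client.secret)

# Phase 3: only now is the old secret retired.
server.accepted = {new}
----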

What can an identity management system do for the zero trust approach?

Firstly, it can do the same thing it normally does: it can manage user identities. In zero trust mode, users have to authenticate everywhere, and they have to be authorized everywhere, in every application or service. While authentication can usually be handled by access management or single sign-on systems, authorization is much harder. An identity management system can handle it: it can manage entitlements and privileges, and it can de-provision unnecessary accounts and access rights. This part is relatively easy, as it is what identity management systems have been doing for decades.

Secondly, an identity management system can manage access of one application to another application. This means management of application accounts, application credentials and privileges, and de-provisioning of the accounts when an application is decommissioned. This may sound simple, but it is anything but simple in reality. The most basic requirement is an application inventory, an authoritative list of applications in an organization. However, many organizations do not have that at all. Those organizations that have it usually have it in the form of an informal spreadsheet that is not entirely machine-readable. How is an identity management system supposed to even start managing application identities when there is no authoritative source? Therefore, the effort should start with building such a source, which may be manually-maintained information in the identity management system. Then, application accounts and entitlements (groups) need to be associated with the application. Applications must have owners, which can be maintained in the IDM system. Once application accounts and entitlements are associated with an application, they can be automatically de-provisioned when the application is decommissioned. However, perhaps the most challenging part is credential management. The credentials should be changed periodically. However, this change has to be synchronized on both sides of the communication channel (both the client and the server part), otherwise there will be an outage. This is difficult to automate, even with a state-of-the-art IDM system.
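As an illustration, a machine-readable application inventory could start as simply as the following sketch. All names, fields and the de-provisioning routine are hypothetical; a real deployment would keep this information as objects in the IDM system and execute the de-provisioning through connectors.

[source,python]
----
# A minimal sketch of a machine-readable application inventory, the kind of
# authoritative source discussed above. All names and fields are illustrative.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Application:
    name: str
    owner: str                                              # application owner, kept in the IDM system
    accounts: List[str] = field(default_factory=list)       # application (service) accounts
    entitlements: List[str] = field(default_factory=list)   # groups, roles, API scopes
    decommissioned: Optional[date] = None

inventory = {
    "payroll-sync": Application(
        name="payroll-sync",
        owner="alice@example.com",
        accounts=["svc-payroll@hr-system"],
        entitlements=["hr-api.read", "payroll-admins"],
    ),
}

def decommission(app_name: str) -> None:
    """De-provision everything associated with a decommissioned application."""
    app = inventory[app_name]
    app.decommissioned = date.today()
    for account in app.accounts:
        print(f"disable account {account}")          # placeholder for a provisioning call
    for entitlement in app.entitlements:
        print(f"revoke entitlement {entitlement}")   # placeholder for a provisioning call

decommission("payroll-sync")
----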

Overall, current identity management and governance platforms are not ready for full management of applications, application accounts and entitlements. Only partial functionality is usually available. Even though the concepts of the zero trust approach are quite old, they were never applied systematically at scale. Therefore, there was not sufficient demand to implement the required features in identity management systems. The future will tell whether the current wave of zero trust hype brings such demand. However, pretty much all IDM systems must evolve and develop new features. Therefore, it is crucial to choose an IDM platform that can evolve.

Building Identity and Access Management Solution

There is no single identity and access management solution that would suit everybody. Every deployment has specific needs and characteristics. A deployment in a big bank will probably focus on governance, role management and security. A deployment in a small enterprise will focus on cost efficiency. A cloud provider will focus on scalability, user experience and simplicity. Simply speaking, one size does not fit all. Almost all IAM solutions use the same principal components. However, product choice, solution topology and configuration will vary significantly. Do not expect to download a product, install it, and have it solve all your problems. It won’t. Customization is the key.

We consider identity management to be the heart and brain of any IAM solution. This is one of the reasons why we have started the midPoint project. The rest of this book will focus almost exclusively on identity management and the use of midPoint as the IDM component. This is the place where theory ends and practice begins.