A deep dive into Elasticsearch authentication realms

This is a technical deep dive into the authentication process — a necessary first step before addressing the authorization decisions that are at the core of Elasticsearch security. The following will be a very detailed explanation of the inner workings of a key part of the authentication process: realms. If you'd prefer to start with a broader view of authentication (and authorization) in Elasticsearch, you may want to check out Demystifying authentication and authorization in Elasticsearch.

This post is for those curious about how things work behind the scenes in Elasticsearch. Curious readers are encouraged to browse and even contribute code.

Overview of authentication

The authentication process is centered around realms that validate credentials of clients. There is a slew of options for realms because each one works for users administered in a different way. The means of credential validation distinguish realms by type. The authentication process happens for each request on the coordinating node, which is the Elasticsearch node that receives the client request. Upon successful authentication, a username is assigned to the security context of the request. All subsequent authorization decisions for the request are performed under this security context.

Strictly speaking, authentication is considered complete when a request is assigned a username (aka principal, since username implies an interactive human) and it does not cover assigning privileges. Privileges are assigned during authorization. However, role names are part of the user metadata, and some realms do assign role names as part of authentication — conceptually crossing the boundary into authorization a little. Nevertheless, in this post we’ll focus on the process of assigning the username to the security context of the request. If you want to learn more about assigning roles to usernames, take a look at the mapping roles reference docs.

This post does not cover authentication by means of tokens or API keys. These forms of credentials are intrinsic to Elasticsearch — they are not managed by an outside service, and are generally used when other services that need to communicate with Elasticsearch don’t want to deal with the credentials of a user from a realm.

When a request requires coordination among multiple nodes, authentication is performed only on the coordinating node that handles the original request from the client. The participating data nodes trust the authentication of the coordinating node because there is a mutually authenticated TLS connection established between the nodes. The security context of a successful authentication is hence preserved across every thread on every node as it handles that particular authenticated request.

Types of realms supported in Elasticsearch

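The realm types discussed in this post are Reserved, Native, File, LDAP, Active Directory, PKI, Kerberos, SAML, and OpenID Connect. It’s also possible to implement and plug in a custom realm, which allows you to code any form of credential validation that is not natively supported.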

Kinds of credentials

  • Reserved, Native, File, LDAP, and Active Directory realms extract the client’s credentials from the HTTP request’s Authorization header. More specifically, the client must employ the HTTP basic access authentication method.
  • PKI relies on the mutual TLS authentication on the node’s network layer. The credentials are the client’s private key and certificate chain.
  • Kerberos also extracts the credentials from the HTTP request’s Authorization header. The credentials are not the encoded password, but an encrypted identity assertion from an external authentication server. Kerberos is a form of SSO.
  • SAML and OpenID Connect are also SSO realm types, so they consume identity assertions, signed and optionally encrypted, issued by an Identity Provider (aka authentication server). Unlike Kerberos, the assertions are not passed directly by the client in an HTTP header, but through a dedicated API called by a facilitating proxy, like Kibana, on behalf of the client. This is done because the flow to obtain these assertions assumes browser interaction on the client’s part (HTML and 302 redirects).
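
To make the HTTP basic access authentication method concrete, here is a minimal Python sketch of how a client builds the Authorization header and how a server-side component could extract the credentials from it. The function names are hypothetical and are not part of Elasticsearch; the encoding itself follows the standard Basic scheme (base64 of "username:password").

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic ...' header."""
    # The Basic scheme encodes "username:password" with base64.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth_header(value: str) -> Tuple[str, str]:
    """Extract (username, password) from a Basic header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic" or not token:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the two parts; the password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Note that base64 is an encoding, not encryption, which is one reason these credentials should only travel over TLS.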

Realm credential validation

  • Reserved is an “internal” type of realm used to authenticate a predefined, fixed set of built-in users — assigned to other Stack components, such as Beats and Kibana. It is “always” enabled and works similarly to the Native realm. You can change the password of built-in users but not their roles. These users, as well as the native ones, can be individually disabled so that you don’t have to change the password to a random value.
  • Native stores usernames and their corresponding password hashes in the .security-7 index. Authentication hashes the password from the request header and checks that, for the given username, the result matches the hash stored in the index.
  • File works like Native, but the hashes it compares against are stored in a file (ES_PATH_CONF/users) on the local node.
  • LDAP and Active Directory validate the user’s password, which is extracted from the usual Authorization: Basic HTTP header, by contacting an external directory service. The network protocol (LDAP) to communicate with the service is session-based and has a bind operation that authenticates the session. These realms try to bind with the password of the client.
  • PKI is the simplest in terms of validation, because the credentials are validated by the network layer as part of the mutual TLS (between the client and the coordinating node). However, the realm has options to further restrict the set of such authenticated users by validating the client’s cert chain against different trusted roots and by applying a regex on the end entity’s Subject DN.
  • Kerberos, SAML, and OpenID Connect all work in a similar fashion. Clients are not authenticated directly by Elasticsearch or Kibana, but by an external Authentication Service like an Identity Provider. These realms then authenticate the client by validating the assertion. They are configured with a secret key (keytab file) or a public key (metadata file), which is obtained from the external service during an initial Service Provider registration phase. The process for each differs, but the essence is that the ES admin must go to the Identity Provider and ask it to authenticate users for a particular Kibana/ES deployment; in return, the admin obtains a piece of configuration that must be used when configuring these SSO realms.
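
The hash comparison performed by the Native and File realms can be sketched as follows. This is an illustration of the general technique only: it uses PBKDF2 from the Python standard library with an arbitrary iteration count, and neither the function names nor the storage layout reflect how Elasticsearch actually encodes its password hashes.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 10_000  # illustrative value, not an Elasticsearch default


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password; a fresh salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the supplied password with the stored salt and compare the digests."""
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, stored_digest)
```

The point to take away is that the stored value is never decrypted: the password from the request is hashed the same way and the two hashes are compared.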

Configuration of authentication realms

The realms are configured in the elasticsearch.yml configuration file under the xpack.security.authc.realms.<type>.<name> settings namespace, where <type> is one of the following: native, file, ldap, active_directory, pki, kerberos, saml, oidc. Configurations are local to each node, and currently any configuration change requires a node restart. <name> is a unique identifier, as there can be multiple realms with the same type (native, file, and kerberos do not allow for several realms with that type).

All realms have a configuration key named order, which defines what is commonly referred to as a realm chain. When several realms, of the same type or not, consume the same kind of credentials (for example, they all extract credentials from HTTP’s Authorization: Basic header), the credentials are validated by consulting the realms in increasing order (that is, the realm with order “1” is attempted before the realm with order “2”). Because every earlier realm must reject the credentials first, it is not recommended to configure multiple realms for the same authentication method if most clients end up being authenticated by realms near the tail of the chain.
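
As a rough illustration of the settings namespace and the resulting realm chain, the Python sketch below takes flattened settings keys and sorts the realms by their order value. The realm names (fallback, native1, corp_ldap) and the LDAP URL are made up for the example, and the parsing assumes realm names contain no dots; this is not how Elasticsearch itself loads its configuration.

```python
from typing import Dict, List, NamedTuple

PREFIX = "xpack.security.authc.realms."


class Realm(NamedTuple):
    order: int
    type: str
    name: str


def realm_chain(settings: Dict[str, object]) -> List[Realm]:
    """Collect realms from flat '<prefix><type>.<name>.order' keys, sorted by order."""
    realms = []
    for key, value in settings.items():
        if not key.startswith(PREFIX):
            continue
        parts = key[len(PREFIX):].split(".")
        # Only the 'order' key of each realm is needed to build the chain.
        if len(parts) == 3 and parts[2] == "order":
            realms.append(Realm(order=int(value), type=parts[0], name=parts[1]))
    return sorted(realms)


# Hypothetical configuration, flattened the way the keys would read in elasticsearch.yml.
settings = {
    "xpack.security.authc.realms.ldap.corp_ldap.order": 2,
    "xpack.security.authc.realms.ldap.corp_ldap.url": "ldaps://ldap.example.com:636",
    "xpack.security.authc.realms.file.fallback.order": 0,
    "xpack.security.authc.realms.native.native1.order": 1,
}

for realm in realm_chain(settings):
    print(realm.order, realm.type, realm.name)
```

With this configuration, Basic credentials would be tried against the file realm first, then native, then LDAP.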

The simplest realm configuration is to not configure any realm at all. In this case, the File and Native realms (and the Reserved realm) are enabled implicitly. If any realm is configured explicitly, File and Native are no longer enabled implicitly.

Caching

The Native, File, LDAP, Active Directory, Kerberos, and PKI realms cache positive authentication results. The cache for each realm can be individually configured. For example, an LDAP realm will not reach out to the directory service for each Elasticsearch request (every request requires authentication); it serves repeat authentications from its cache of successful results instead. Negative authentication results are not cached because that would expose Elasticsearch to password change propagation issues.
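
The idea of caching only positive results can be sketched in a few lines of Python. The class name, the salted SHA-256 fingerprint, and the 20-minute lifetime are assumptions chosen for the illustration; the real realms have their own cache settings and in-memory hashing.

```python
import hashlib
import hmac
import os
import time
from typing import Callable, Dict, Tuple


class CachingRealm:
    """Wraps a slow credential check and caches successful results only."""

    def __init__(self, validate: Callable[[str, str], bool], ttl_seconds: float = 1200.0) -> None:
        self._validate = validate
        self._ttl = ttl_seconds
        self._salt = os.urandom(16)
        # username -> (fingerprint of the accepted password, expiry time)
        self._cache: Dict[str, Tuple[bytes, float]] = {}

    def _fingerprint(self, password: str) -> bytes:
        # Keep a salted hash in memory rather than the clear-text password.
        return hashlib.sha256(self._salt + password.encode("utf-8")).digest()

    def authenticate(self, username: str, password: str) -> bool:
        now = time.monotonic()
        fingerprint = self._fingerprint(password)
        entry = self._cache.get(username)
        if entry is not None:
            cached, expires_at = entry
            if now < expires_at and hmac.compare_digest(cached, fingerprint):
                return True  # cache hit, the backend is not contacted
        if self._validate(username, password):
            self._cache[username] = (fingerprint, now + self._ttl)
            return True
        return False  # failures are never stored, so they are always re-checked
```

Because failures are not stored, a password that becomes valid in the backend is accepted on the very next request.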

Moreover, because LDAP and Active Directory realms need to connect to external systems, a short burst of requests for the same user can arrive before the first request has returned and populated the cache. To handle this, the caching logic has some extra finesse: it holds back authentication requests for the same principal until the first in-flight request has returned.
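
One way to picture the handling of in-flight requests is the following Python sketch, in which concurrent callers with the same credentials wait for a single outstanding remote check and share its result. It is a simplification under stated assumptions: the class name is hypothetical, the key is the full (username, password) pair rather than the principal alone, and a real implementation would also hand the result to the cache.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Tuple


class CoalescingAuthenticator:
    """Lets concurrent requests with the same credentials share one remote check."""

    def __init__(self, remote_check: Callable[[str, str], bool]) -> None:
        self._remote_check = remote_check
        self._lock = threading.Lock()
        self._in_flight: Dict[Tuple[str, str], Future] = {}

    def authenticate(self, username: str, password: str) -> bool:
        key = (username, password)
        with self._lock:
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                # First caller for these credentials: register a future for the others.
                future = Future()
                self._in_flight[key] = future
        if is_leader:
            try:
                future.set_result(self._remote_check(username, password))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader has published the outcome.
        return future.result()
```

The directory service then sees one bind attempt per burst instead of one per request.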

When there are multiple realms configured that consume the same type of credentials, there is always the risk of having clients frequently authenticating using realms at the tail of the realm chain. In this case, the client would be denied authentication by all the realms before the one that authenticates successfully. To counter the performance impact of this iteration, there is a smart trick to remember the last realm that authenticated every principal and move that realm to the head of the chain when an authentication request for that same principal comes in.
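
The reordering trick described above can be sketched in Python as follows. The class and its bookkeeping are hypothetical and exist only to show the idea: the realm that last authenticated a principal is tried first the next time that principal shows up, while the configured order is kept for everything else.

```python
from typing import Callable, Dict, List, Optional, Tuple

RealmEntry = Tuple[str, Callable[[str, str], bool]]  # (realm name, credential check)


class RealmChain:
    def __init__(self, realms: List[RealmEntry]) -> None:
        self._realms = list(realms)  # already in configured order
        self._last_success: Dict[str, str] = {}  # principal -> realm name

    def _ordered_for(self, principal: str) -> List[RealmEntry]:
        preferred = self._last_success.get(principal)
        head = [r for r in self._realms if r[0] == preferred]
        tail = [r for r in self._realms if r[0] != preferred]
        return head + tail

    def authenticate(self, principal: str, password: str) -> Tuple[Optional[str], List[str]]:
        """Return (name of the authenticating realm or None, realms attempted)."""
        attempted = []
        for name, check in self._ordered_for(principal):
            attempted.append(name)
            if check(principal, password):
                self._last_success[principal] = name
                return name, attempted
        return None, attempted


# Hypothetical chain: only the last realm knows the user "alice".
ldap_users = {"alice": "pw"}
chain = RealmChain([
    ("file", lambda u, p: False),
    ("native", lambda u, p: False),
    ("ldap", lambda u, p: ldap_users.get(u) == p),
])
print(chain.authenticate("alice", "pw"))  # first attempt walks the whole chain
print(chain.authenticate("alice", "pw"))  # second attempt starts at the ldap realm
```

The first call consults all three realms; the second goes straight to the realm that succeeded before.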

Tips and tricks

  • Because realms are configured only from the elasticsearch.yml file, there is no privilege high enough in Elasticsearch that would allow altering this configuration. Hence, changing authentication methods requires system administrator privileges on the host running the node and not Elasticsearch privileges.
  • The File realm may look antiquated, but it has a nice practical use case. If the .security-7 index is broken, or the external authentication service behind the SSO realms becomes unavailable, all users, including the most privileged ones, will be denied access. In this case, the File realm comes to the rescue! The system administrator can enable this realm and add users to it so that Elasticsearch admins are permitted access no matter the cluster state.
  • A non-homogeneous realm configuration across nodes is supported and sometimes useful. For example, data-only nodes might enable only the File realm, without any users added. Users could later be added by the system admin during a manual intervention on the respective data node; the users file is monitored for changes, so a node restart is not necessary!
  • For realms that do not require browser interaction on the client’s part (all realms except SAML and OpenID Connect), there is the option of a debug API — namely _authenticate, which conducts the authentication process as usual and returns the User metadata as it would be normally attached to the security context of threads working on the request.
  • Realms such as Native, File, and possibly (depending on configuration) LDAP and Active Directory can be used in the authorization context as well. These realms have the option to internally retrieve user metadata without requiring the user’s secret password. They are used to implement the run_as functionality (submitting requests on behalf of other users) and the authorization_realm use case, where one realm does authentication but another one assigns the roles.
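
For the _authenticate debug API mentioned above, a request can be prepared with nothing but the Python standard library, as in the sketch below. The base URL and the credentials are placeholders, the helper name is hypothetical, and the path assumes the 7.x form of the endpoint, /_security/_authenticate; sending the request of course requires a reachable cluster and a trusted TLS setup.

```python
import base64
import urllib.request


def build_authenticate_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare a GET request to the _authenticate API using HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_security/_authenticate",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Usage against a running cluster (placeholder URL and credentials):
#
#   import json
#   req = build_authenticate_request("https://localhost:9200", "some_user", "some_password")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))  # user metadata as attached to the security context
```

The response is a handy way to check which realm authenticated a user and which role names were attached.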

Conclusion

While the details in full can certainly be intimidating, the fact is that in any real-world deployment, very few authentication realm types are used. It all depends on what user management method is favored by your organization. Elasticsearch security alone gives you the flexibility here. Moreover, these details only become relevant when something is not quite working as expected. When that happens, and you don’t wish to deal with it alone, there’s a crowd over at discuss.elastic.co that can help you sort it out. And if you wish to get a practical knowledge of all of this, be sure to check out the Fundamentals of Securing Elasticsearch training.
