Red Hat's single sign-on (SSO) technology is an identity and access management tool included in the Red Hat Middleware Core Services Collection that's based on the well-known Keycloak open source project. As with other Red Hat products, users have to acquire subscriptions, which are priced according to the number of cores or vCPU used to deploy the product.
This presents an interesting problem for pre-sales engineers like me. To help my customers acquire the correct number of subscriptions, I need to sketch the target architecture and count how many cores they need. This would not be a problem if off-the-shelf performance benchmarks were available; however, they are not.
This article will help colleagues and customers estimate their SSO projects more precisely. We will examine the performance benchmarks I ran, how I designed them, the results I gathered, and how I drew conclusions to size my SSO project.
Performance benchmarks: Framing the problem
Performance benchmarking is a broad topic, and if you don't correctly frame the problem, it is easy to answer a question that has not been asked. So, for my situation, I wanted to answer the following questions:
- Which architectural choices have the most impact on the number of cores used?
- Which part of the user’s session has the most impact on the number of cores used? Opening the session? Renewing the token? Validating the token?
- How many transactions per second (TPS) can we expect per core from typical server-grade hardware?
Note: Between a low-end server CPU and a high-end server CPU, there can be a significant gap in terms of single-thread performance. Therefore, I am not interested in precise figures, but rather the order of magnitude (e.g., 1, 10, 100, or 1,000 TPS?).
Planning the performance assessment
In the Keycloak repository, there is a test suite that assesses Keycloak performance. After careful study, I decided not to use it for two reasons:
- It does not answer the questions I listed in the previous section.
- The test suite is tightly coupled with the Keycloak development environment, so it would be difficult to reuse on customer sites if needed.
I divided my approach into four main steps:
- Setting up SSO and the underlying services (database, load balancer, etc.).
- Filling the SSO database with users, clients, and realms.
- Generating load on the SSO server.
- Collecting performance data.
Every customer is different, and sometimes various tools and techniques might be required on customer sites. With that in mind, I designed those four steps to be loosely coupled, so you can adjust each step to use a different tool or technique.
Whenever possible, I reuse the Keycloak Realm file as a pivot format. It is used by the script that loads the database and by the load testing tool that generates load on the SSO server.
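To make that pivot format concrete, here is a heavily trimmed sketch of what such a realm export might look like. The realm, user, and client names are placeholders, and a real export contains many more attributes.

```json
{
  "realm": "realm-0",
  "enabled": true,
  "users": [
    {
      "username": "user-0",
      "enabled": true,
      "credentials": [{ "type": "password", "value": "user-0-password" }]
    }
  ],
  "clients": [
    {
      "clientId": "client-0",
      "enabled": true,
      "publicClient": false,
      "secret": "client-0-secret",
      "directAccessGrantsEnabled": true
    }
  ]
}
```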
To set up SSO and the underlying services, I chose to use Ansible playbooks that deploy components as Podman containers. They are easy for new team members to use and understand; plus, they are widely used on customer sites.
I created a dedicated tool named kci to load the database with users, clients, and realms.
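Conceptually, loading the database boils down to pushing realm representations into SSO through the Admin REST API. The Node.js sketch below illustrates that idea; it is not kci's actual code, and the base URL, admin credentials, and file name are placeholders.

```javascript
// Conceptual sketch (not kci itself): create a realm, with its users and clients,
// by POSTing a realm export to the Admin REST API. Requires Node.js 18+ (global fetch).
const fs = require('fs');

const base = 'https://sso.example.com/auth'; // placeholder

async function adminToken() {
  const res = await fetch(`${base}/realms/master/protocol/openid-connect/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'password',
      client_id: 'admin-cli',
      username: 'admin',          // placeholder
      password: 'admin-password', // placeholder
    }),
  });
  return (await res.json()).access_token;
}

async function importRealm(file) {
  const token = await adminToken();
  const res = await fetch(`${base}/admin/realms`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: fs.readFileSync(file, 'utf8'),
  });
  console.log(`Realm import of ${file}: HTTP ${res.status}`);
}

importRealm('realm-0.json');
```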
To generate load on the SSO server, I used k6, a modern performance testing tool written in Go that uses plain JavaScript for test definitions. (Have a look at k6.io if you aren't familiar with it.)
The test results are collected by Prometheus and presented through Grafana, as shown in Figure 1. For a primer on k6, Prometheus, and Grafana, I recommend reading this article.
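To give an idea of what the load generation looks like, here is a minimal k6 script that opens user sessions through the OpenID Connect token endpoint (Resource Owner Password Credentials grant). The hostname, realm, client, and user credentials are placeholders matching the shape of the data loaded in the previous step.

```javascript
import http from 'k6/http';
import { check } from 'k6';

// Minimal sketch: each virtual user repeatedly opens an SSO session.
export const options = { vus: 20, duration: '5m' };

const TOKEN_URL =
  'https://sso.example.com/auth/realms/realm-0/protocol/openid-connect/token';

export default function () {
  // k6 sends this object as application/x-www-form-urlencoded.
  const res = http.post(TOKEN_URL, {
    grant_type: 'password',
    client_id: 'client-0',
    client_secret: 'client-0-secret',
    username: 'user-0',
    password: 'user-0-password',
  });
  check(res, { 'session opened': (r) => r.status === 200 });
}
```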
Designing the benchmark scenarios
Scenarios are a key part of a performance benchmark. Carefully chosen scenarios will help answer the questions established earlier. Devise scenarios as you would scientific experiments:
- Choose a control experiment that sets the baseline and confirms that performance remains stable over time.
- Craft experiments by changing one (and only one) parameter of the control experiment at a time. The experiment will reflect this parameter's effect on performance.
For my control experiment, I chose the following configuration as the baseline:
- Two SSO servers, each with a single dedicated physical core.
- The SSO servers are backed by a PostgreSQL instance.
- A Traefik reverse proxy is set in front of the SSO servers to spread the load.
- The database is loaded with 5,000 users and 500 clients spread amongst 5 realms.
- No specific performance tuning is applied to any of those components.
And then from this baseline, I devised the following scenarios:
- Offline tokens: Same as baseline, but offline tokens are requested instead of regular tokens (see the request sketch after this list).
- MariaDB: Same as baseline, but with MariaDB instead of PostgreSQL.
- One node: Same as baseline, but with only one SSO instance having two physical cores.
- Size S: Same as baseline, but with less data in the database (100 users and 10 clients in 1 realm).
- Size L: Same as baseline, but with more data in the database (100,000 users and 10,000 clients spread across 10 realms).
- PBKDF2 with 1 iteration: Same as baseline, but with PBKDF2 configured for 1 iteration instead of 27,500.
- LDAP: Same as baseline, but with users loaded in an OpenLDAP instance instead of the SSO database.
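As a side note on the offline tokens scenario mentioned above: the change to the load script is essentially one extra field in the token request, the offline_access scope (same placeholder names as elsewhere in this article).

```javascript
// Offline tokens scenario: ask for an offline token instead of a regular refresh token.
const payload = {
  grant_type: 'password',
  client_id: 'client-0',
  client_secret: 'client-0-secret',
  username: 'user-0',
  password: 'user-0-password',
  scope: 'offline_access', // the only difference from the baseline request
};
```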
In the baseline and the scenarios just described, I chose to collect the following metrics (the corresponding requests are sketched below):
- The user opens their SSO session: How many TPS?
- The user refreshes their access token: How many TPS?
- The user token is introspected using the tokeninfo endpoint: How many TPS?
- The user token is introspected using the userinfo endpoint: How many TPS?
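Following on from the session-opening script shown earlier, the sketch below shows the shape of the three other requests. The paths are the standard OpenID Connect endpoints exposed by SSO; hostnames and credentials remain placeholders, and error handling is omitted.

```javascript
import http from 'k6/http';

const BASE = 'https://sso.example.com/auth/realms/realm-0/protocol/openid-connect';
const CLIENT_ID = 'client-0';            // placeholder
const CLIENT_SECRET = 'client-0-secret'; // placeholder

// Refresh the access token using the refresh token obtained at session opening.
export function refreshAccessToken(token) {
  return http.post(`${BASE}/token`, {
    grant_type: 'refresh_token',
    refresh_token: token,
    client_id: CLIENT_ID,
    client_secret: CLIENT_SECRET,
  });
}

// Token introspection (the tokeninfo measurement above).
export function introspectToken(accessToken) {
  return http.post(`${BASE}/token/introspect`, {
    token: accessToken,
    client_id: CLIENT_ID,
    client_secret: CLIENT_SECRET,
  });
}

// Userinfo: the access token is sent as a bearer token.
export function userinfo(accessToken) {
  return http.get(`${BASE}/userinfo`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
}
```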
I chose to focus only on the number of transactions per second because it is an objective measure: the maximum throughput you can get. Latency figures are sometimes discussed on customer sites, but there is not much we can do about them. Latency is bounded by the minimum number of CPU cycles required to serve the request, and it only increases once bottlenecks (such as CPU contention) appear. Said differently: there is a typical latency, and it starts to skyrocket once a tipping point is passed. As long as you do not cross that point, nothing interesting happens.
I ran the performance benchmarks on a bare-metal server: an HP MicroServer Gen8 with a Xeon E3-1240v2 CPU and 16GB of RAM. Only two physical cores of the Xeon CPU were dedicated to the SSO servers; the rest were allocated to the load balancer, the database, and the operating system.
Note on the PBKDF2 function
In the next section, you will see a big increase in the throughput depending on where the user passwords are stored. Let's take a closer look at the Password-Based Key Derivation Function 2 (PBKDF2) function.
By default, Red Hat's single sign-on tool stores the user passwords in its internal database and hashes those passwords using the PBKDF2 function. The purpose of this function is to be CPU intensive to slow down brute force attacks to a point where they become too expensive or too long to be practical. One can adjust the strength of this protection by configuring the number of iterations.
SSO performs 27,500 PBKDF2 iterations by default. Wikipedia tells us more about what constitutes a safe choice for the number of iterations.
Note: When the standard was written in the year 2000, the recommended minimum number of iterations was 1,000, but the parameter was designed to increase over time to align with CPU speeds. A Kerberos standard in 2005 recommended 4,096 iterations; Apple reportedly used 2,000 for iOS 3, and 10,000 for iOS 4. In 2011, LastPass used 5,000 iterations for JavaScript clients and 100,000 iterations for server-side hashing.
This means that, by design, you cannot store passwords securely and at the same time sustain a high number of TPS per physical core for user session openings.
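To get a feel for that cost, you can time PBKDF2 yourself. The short Node.js sketch below is not SSO code, and the digest and key length are assumptions chosen for illustration, but it shows how the hashing time, and therefore the achievable number of session openings per second per core, scales with the iteration count.

```javascript
// Rough illustration of PBKDF2 cost versus iteration count (Node.js, not SSO code).
const { pbkdf2Sync, randomBytes } = require('crypto');

function timeHash(iterations) {
  const salt = randomBytes(16);
  const start = process.hrtime.bigint();
  pbkdf2Sync('a-user-password', salt, iterations, 64, 'sha256'); // digest and key length are assumptions
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${iterations} iterations: ~${ms.toFixed(1)} ms per hash on this core`);
}

[1, 1000, 27500].forEach(timeHash);
```

Running it with 1 versus 27,500 iterations makes the throughput gap between the PBKDF2 scenarios easy to anticipate.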
However, you can configure SSO to use passwords from another repository (your Active Directory, OpenLDAP, or Red Hat Directory Server, for instance) and rely on the security mechanisms of those repositories. That would be a way to have the best of both worlds.
Results
Based on the results, I was able to draw the following conclusions:
- The key dimensions of an SSO project are the number of user session openings per second and where the user passwords are stored.
- SSO can sustain around 75 TPS per physical core if the user passwords are stored in a third-party system (an LDAP directory, for instance) or if the PBKDF2 function is configured with one iteration.
- Otherwise, SSO sustains slightly less than 10 TPS per physical core.
- Refreshing tokens is less costly: SSO can sustain around 200 TPS per physical core.
- Introspecting a token is pretty cheap. SSO can sustain around 1,400–1,700 TPS per physical core; 1,400 TPS using the tokeninfo endpoint, and 1,700 TPS using the userinfo endpoint.
- The choice of the database has no significant impact on performance.
- Using offline tokens instead of regular tokens has a slight impact on performance (a 10% penalty).
- When high availability is not required, the one node setup shows a 20% increase in the number of TPS per physical core.
Note: One physical core = two threads = two vCPU.
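As a purely hypothetical sizing check using these ballpark figures (the workload numbers below are invented for the example, not measurements):

```javascript
// Hypothetical workload, sized with the per-core figures listed above.
const logins = 150;          // session openings per second (passwords in LDAP: ~75 TPS per core)
const refreshes = 300;       // token refreshes per second (~200 TPS per core)
const introspections = 2000; // tokeninfo calls per second (~1,400 TPS per core)

const cores = logins / 75 + refreshes / 200 + introspections / 1400;

console.log(`~${Math.ceil(cores)} physical cores, i.e. ${2 * Math.ceil(cores)} vCPU, before any HA margin`);
```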
See the repository containing the complete result set.
Conclusion
The SSO performance benchmarks presented in this article are by no means a definitive answer on this topic. They should instead be considered initial work to help the community, Red Hatters, and our customers better size their single sign-on projects.
More work is required to test other hypotheses, such as the impact of an external Red Hat Data Grid server, possible optimizations here and there, the possibility of achieving linear scalability with a high number of nodes, or even the impact of deployments within Red Hat OpenShift.