Original Issue Description (slightly modified from Slack)
We've recently started adopting the FIPS stemcell in some of our environments. In the first environment with a slightly elevated background load we immediately ran into issues with HAProxy and OpenSSL. OpenSSL on Ubuntu uses an interesting approach [1,2] to generate random numbers: it performs expensive checksum calculations over and over to obtain jitter in the execution time, and uses that jitter as true entropy. Our HAProxy uses the OpenSSL shipped with the OS and performs TLS handshakes with both clients and gorouter, so it needs randomness (and therefore entropy). As a result, CPU utilisation jumped and HAProxy started running into timeouts because it could not generate enough entropy with the limited CPU available to it. Our rough estimate is that we would need 10x the resources with FIPS enabled. We've written down our observations in an upstream issue [3] and would like to know whether any of you have encountered similar issues and what workarounds are available. We already know of AWS CloudHSM, which could be configured to take over the RNG tasks, but that means additional integration effort, and the setup as a whole would probably need to be certified again.
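To make the jitter idea more concrete, here is a minimal Python sketch of the general principle: time a fixed hashing workload repeatedly and keep only the noisy low bits of each measurement. This is an illustration only, not the Jitter RNG implementation referenced in [1], and the function names, workload size, and choice of SHA3-256 are assumptions made for the example.

```python
import hashlib
import time


def timing_deltas(rounds=1000, block=b"\x00" * 4096):
    """Time a fixed hashing workload `rounds` times; return durations in ns."""
    deltas = []
    for _ in range(rounds):
        start = time.perf_counter_ns()
        hashlib.sha3_256(block).digest()  # the "expensive" work being timed
        deltas.append(time.perf_counter_ns() - start)
    return deltas


def low_bits(deltas, bits=1):
    """Keep only the lowest `bits` bits of each timing, where the jitter lives."""
    mask = (1 << bits) - 1
    return [d & mask for d in deltas]


if __name__ == "__main__":
    raw = low_bits(timing_deltas())
    print(f"collected {len(raw)} raw bits from {len(raw)} hash computations")
```

Each raw bit costs one full hash computation here, which hints at why harvesting entropy this way becomes CPU-bound under load. A real implementation would also need health tests and conditioning before using the output.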
Details we have discovered in the meantime
We started drilling down into HAProxy and OpenSSL and noticed that the FIPS version of OpenSSL calls getrandom(2) a lot. getrandom(2), in turn, seems to be limited to 150 calls/s, and as a result we were only able to accept ~12 TLS connections per second from clients in our experiments. This results in at most one saturated core. We consulted our local crypto experts, and their suspicion is that the random number generator is re-seeded far too often, which causes this bottleneck because the entropy that can be gained is limited. Since the limit is per machine, our current workaround is to use 2-core VMs, and lots of them. Even so, the per-core conn/s is down from ~125 to ~12 (and that is when counting each 2-core VM as a single core).
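For anyone who wants to check the getrandom(2) rate on their own stemcell, a rough Python sketch along these lines may help. It assumes that Python's os.getrandom reaches the same kernel path as OpenSSL's calls, which we have not verified. The helper names are made up for this example.

```python
import os
import time


def getrandom_rate(duration_s=1.0, nbytes=32):
    """Return how many random-byte requests per second complete in a tight loop."""
    # os.getrandom wraps getrandom(2) and is Linux-only; fall back elsewhere.
    read = getattr(os, "getrandom", os.urandom)
    calls = 0
    start = time.monotonic()
    deadline = start + duration_s
    while time.monotonic() < deadline:
        read(nbytes)
        calls += 1
    # Divide by the real elapsed time, since a slow call can overrun the deadline.
    return calls / (time.monotonic() - start)


def calls_per_handshake(call_limit_per_s, handshakes_per_s):
    """Implied getrandom calls per TLS handshake if the call limit is the bottleneck."""
    return call_limit_per_s / handshakes_per_s


if __name__ == "__main__":
    print(f"measured rate: {getrandom_rate():.0f} calls/s")
    print(f"implied calls per handshake: {calls_per_handshake(150, 12):.1f}")
    print(f"per-core slowdown: {125 / 12:.1f}x")
```

With the figures above, 150 calls/s at ~12 connections/s implies roughly 12.5 getrandom calls per handshake, and ~125 down to ~12 conn/s is about a 10x slowdown, in line with the rough resource estimate.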
We are in contact with Canonical to understand the behaviour better. I'd be interested in looking at the sources, but the kernel source code of the FIPS version seems to be missing. On non-FIPS I can obtain the source by installing linux-source, but for FIPS the package linux-fips-source-5.15.0 is missing, although it's referenced by some other packages:
$ sudo apt install linux-fips-source-5.15.0
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package linux-fips-source-5.15.0 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'linux-fips-source-5.15.0' has no installation candidate
[1] https://www.chronox.de/jent/
[2] https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/entropy/E48_PublicUse.pdf
[3] haproxy/haproxy#2588