Current situation
Currently, CPU measurements are done by looking at the peak CPU usage over a period of time. I would argue that this is the wrong metric to look at.
Consider a case where we need to process 20 'things'. There are many different ways the timing of that work could play out. For example:
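As a purely illustrative comparison (these numbers are hypothetical, chosen only to make the math obvious), imagine a 'spike' process and a 'steady' process observed over a 10 minute window:

| Process | CPU pattern | Total CPU | Peak CPU |
| --- | --- | --- | --- |
| spike | 100m for 1 minute, then idle for 9 minutes | 100m·min | 100m |
| steady | 10m for the full 10 minutes | 100m·min | 10m |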
In this case, both processes are doing the same amount of work and using the same total amount of CPU. However, when looking at peak CPU, the 'spike' process will be reported as using 10x the CPU of the other.
The 'spike' process could easily have just put `cpu: limit: 10m` on its pod definition (or done application-level throttling) and gotten the same behavior. However, applications typically will not do this (or at least not this aggressively), as the spiking behavior is actually desired - if the node has sufficient CPU available, why intentionally slow things down?

In the case of the data plane, throttling will very likely have a latency impact, which would balance this out a bit, but not entirely. For example, if there are large configuration changes at the start or end of the test, the data plane may have a brief CPU spike to process that configuration, which would lead to a high max CPU being reported despite the lower CPU spent during the other 99% of the test.
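For concreteness, here is a rough sketch of what that limit would look like on a pod definition (the pod and image names below are placeholders, not anything from this repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spiky-workload              # placeholder name
spec:
  containers:
    - name: app
      image: example.com/app:latest # placeholder image
      resources:
        limits:
          cpu: 10m                  # CFS throttling caps the container at 10 millicores,
                                    # flattening any spike and lowering the reported peak
```

With a limit like this in place, the reported peak CPU would look roughly 10x better, even though the total work done and total CPU spent are unchanged.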
On the control plane side, this is even more skewed, as the speed of the control plane is not measured at all in this test. The test could be 'gamed' by just setting an absurdly low CPU limit. Because the test does not benchmark the speed of configuration propagation or other control plane speeds, this would show up strictly as an improvement.
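As a sketch of how that gaming could look (the values and the deployment fragment below are hypothetical), the control plane could simply be given a tiny limit:

```yaml
# Hypothetical fragment of a control plane Deployment spec
resources:
  limits:
    cpu: 50m   # absurdly low; config pushes get slower, but that latency is not
               # measured, so only the improved peak CPU number is visible
```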
Note that I think this mostly applies to CPU. For memory, peak is probably a pretty reasonable metric, and memory usage is much less likely to be spiky.
Impact
The benchmark does not align with real-world expectations.