-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APPSEC-10303] Replace AppSec rate limiter with core rate limiter #3975
Conversation
0f3d04e
to
108be7c
Compare
BenchmarksBenchmark execution time: 2024-10-09 08:05:37 Comparing candidate commit cd06310 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics. |
108be7c
to
eaf9463
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3975 +/- ##
=======================================
Coverage 97.86% 97.86%
=======================================
Files 1313 1314 +1
Lines 78485 78497 +12
Branches 3892 3889 -3
=======================================
+ Hits 76807 76822 +15
+ Misses 1678 1675 -3 ☔ View full report in Codecov by Sentry. |
eaf9463
to
646d401
Compare
646d401
to
8283c06
Compare
372f8f1
to
8b90a67
Compare
@ivoanjo I've adjusted the code to consider Thread locality instead of Fiber locality of the variable. Unfortunately, in 2.5.9 those functions lack a bit of behavior which is possible to find in 3.x version, like The test is also adjusted to be a white-box instead of the black-box. If you have time, could you please take another look. Thanks 🙌🏼 |
8b90a67
to
50e0880
Compare
spec/datadog/appsec/event_spec.rb
Outdated
expect(described_class).to receive(:record_via_span) | ||
.at_least(rate_limit).at_most(rate_limit * 1.1).times.and_call_original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to find a way to rework it, it brings flakiness 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with time freezing and emulating 1s processing time
50e0880
to
3a78df0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not yet given a pass on other files, so here's a smattering of comments on them >_>
Just a bunch of small stuff, nothing blocking 👍 LGTM otherwise
def limit | ||
return yield if @rate_limiter.allow? | ||
|
||
Datadog.logger.debug { "Rate limit hit: #{@rate_limiter.current_window_rate} AppSec traces/second" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This debug message seems like it can happen reasonably frequently. Do we need it around?
My (small) concern is that we don't currently have a fine-grained way of disabling some messages + sometimes to debug issues we need to ask customers to enable debug-level logging. Thus, having messages that are very noisy is somewhat annoying in that situation.
I do understand if you find this message quite important and want to keep it; I'm more explaining the trade-off we have here. (The better solution would be for our logging to not be as coarse-grained...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there were types previously in the sig files changed by this PR, I would prefer to see at least those carried over to the new tree.
lib/datadog/appsec/event.rb
Outdated
@@ -52,7 +52,7 @@ def record(span, *events) | |||
# ensure rate limiter is called only when there are events to record | |||
return if events.empty? || span.nil? | |||
|
|||
Datadog::AppSec::RateLimiter.limit(:traces) do | |||
Datadog::AppSec::RateLimiter.instance.limit do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line matches Singleton usage and I thought instance
referred to the only global instance. I suggest another name for it (maybe local_instance
if you don't have better ideas).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, still working on the name. I don't want to have bypass method limit
and rather provide builder/access method to the configured instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I choose the thread_local
name to reflect locality of the limiter to the client.
Datadog::AppSec::RateLimiter.thread_local.limit do
record_via_span(span, *events)
end
b600021
to
d2a0586
Compare
@p-datadog @ivoanjo Thanks for the feedback 🙏🏼. I've updated the sig files properly and also enable the I don't know what to do with debug message, but I get the point and definitely will try to come up with an idea, but maybe later. If there is no objection, I would like to merge it and 🚢 it! |
d2a0586
to
a23ddb5
Compare
This allows us to be more precise in throttling outgoing AppSec traces and removes additional custom logic of rate limiting. Also RBS definitions of rate limiters are updated
a23ddb5
to
cd06310
Compare
No concerns from my side! 🙇 |
What does this PR do?
This allows us to be more precise in throttling of outgoing AppSec traces and removes additional custom logic for rate limiting.
Motivation:
We can re-use recently moved to core
BucketLimiter
. It will eliminate of logic implementation and improve rate limiting. Existing limiter is too aggressive (correct number is10
traces per test run):with this PR
Additional Notes:
I've corrected some existing tests and add a bit of a new logic under tests (since we rate limiting per thread).
IMPORTANT Unfortunately, it's impossible to guarantee that we will receive exactly
N
amount of traces after limiting due to execution speed of the block inside theDatadog::AppSec::RateLimiter#limit
. If it take a little longer the fractional part of the token will accumulate and give us an extra spare token to spend. Because of that tests were adjusted to allow this drift of a single token.How to test the change?
Set the AppSec rate limiter on 1/TPS and shoot a bunch of requests (hopefully soon test generator will come in play)