[APPSEC-10303] Replace AppSec rate limiter with core rate limiter #3975

Merged
merged 1 commit into from
Oct 9, 2024

Contributor

@Strech Strech commented Oct 7, 2024

What does this PR do?

This lets us throttle outgoing AppSec traces more precisely and removes the custom rate-limiting logic.

Motivation:

We can reuse the BucketLimiter that was recently moved to core. This eliminates the separate AppSec implementation of the logic and improves rate limiting. The existing limiter is too aggressive (the correct number is 10 traces per test run):

[15:19:31] AppSec/system-tests main
❯❯❯ while true; do docker logs -f system-tests-weblog -f 2>/dev/null | rg -o '_sampling_priority_v1\W:2' | wc -l; sleep 1; done
3
3
3
3
3
3
3

With this PR:

[15:25:11] AppSec/system-tests appsec-10303-fix-rate-limiter
❯❯❯ while true; do docker logs -f system-tests-weblog -f 2>/dev/null | rg -o '_sampling_priority_v1\W:2' | wc -l; sleep 1; done
10
10
10
10
10
10
10

Additional Notes:

I've corrected some existing tests and put a bit of new logic under test (since we are rate limiting per thread).

IMPORTANT: Unfortunately, it's impossible to guarantee that we will receive exactly N traces after limiting, because the result depends on the execution speed of the block inside Datadog::AppSec::RateLimiter#limit. If the block takes a little longer, the fractional part of a token accumulates and gives us an extra spare token to spend. Because of that, the tests were adjusted to allow a drift of a single token.
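
To illustrate the drift described above, here is a minimal token-bucket sketch with an injectable clock. SketchTokenBucket and allowed_calls are hypothetical names used only for illustration and are not the core limiter's actual code; the rate of 10 matches the numbers above, and the 15 ms block duration is an arbitrary assumption.

```ruby
# Hypothetical token bucket, NOT the library's implementation.
class SketchTokenBucket
  def initialize(rate:, clock:)
    @rate = rate.to_f       # tokens refilled per second
    @capacity = rate.to_f   # the bucket holds at most one second's worth
    @tokens = @capacity
    @clock = clock          # callable returning the current time in seconds
    @last_refill = clock.call
  end

  def allow?
    refill
    return false if @tokens < 1

    @tokens -= 1
    true
  end

  private

  # Adds the (possibly fractional) tokens earned since the last call.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @rate].min
    @last_refill = now
  end
end

# Counts how many of `attempts` calls are allowed when every allowed
# block takes `work_seconds` of (simulated) time to execute.
def allowed_calls(attempts:, work_seconds:)
  now = 0.0
  bucket = SketchTokenBucket.new(rate: 10, clock: -> { now })
  attempts.times.count do
    allowed = bucket.allow?
    now += work_seconds if allowed
    allowed
  end
end

instant = allowed_calls(attempts: 12, work_seconds: 0.0)   # 10 allowed
slow    = allowed_calls(attempts: 12, work_seconds: 0.015) # 11 allowed
```

With an instantaneous block the bucket hands out exactly 10 tokens. When each block takes 15 ms, the fractions earned along the way add up to one extra token, which is the single-token drift the tests now tolerate.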

How to test the change?

Set the AppSec rate limiter to 1 trace per second (TPS) and fire a bunch of requests (hopefully a test generator will come into play soon).

@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from 0f3d04e to 108be7c Compare October 7, 2024 10:04
@pr-commenter

pr-commenter bot commented Oct 7, 2024

Benchmarks

Benchmark execution time: 2024-10-09 08:05:37

Comparing candidate commit cd06310 in PR branch appsec-10303-replace-rate-limiter-logic with baseline commit f0dd28e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from 108be7c to eaf9463 Compare October 7, 2024 11:48
@codecov-commenter

codecov-commenter commented Oct 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (f0dd28e) to head (cd06310).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3975   +/-   ##
=======================================
  Coverage   97.86%   97.86%           
=======================================
  Files        1313     1314    +1     
  Lines       78485    78497   +12     
  Branches     3892     3889    -3     
=======================================
+ Hits        76807    76822   +15     
+ Misses       1678     1675    -3     

@Strech Strech marked this pull request as ready for review October 7, 2024 12:12
@Strech Strech requested a review from a team as a code owner October 7, 2024 12:12
@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from eaf9463 to 646d401 Compare October 7, 2024 12:17
@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from 646d401 to 8283c06 Compare October 7, 2024 12:55
@Strech Strech requested a review from a team as a code owner October 7, 2024 17:51
@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch 2 times, most recently from 372f8f1 to 8b90a67 Compare October 8, 2024 08:18
@Strech
Contributor Author

Strech commented Oct 8, 2024

@ivoanjo I've adjusted the code to use Thread locality instead of Fiber locality for the variable. Unfortunately, in Ruby 2.5.9 those functions lack some of the behavior found in 3.x; for example, thread_variable? reacts differently to a nil value (in 2.5 it's considered set and in 3.x it's considered unset 🤣).
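
For readers less familiar with the distinction, a small sketch of Thread locality versus Fiber locality in plain Ruby. The key name :sketch_limiter is hypothetical; Thread#[] and Thread#thread_variable_get/set are the standard core methods.

```ruby
result = Thread.new do
  # Thread#[]= stores a value that is local to the current fiber.
  Thread.current[:sketch_limiter] = :fiber_local
  # thread_variable_set stores a value shared by all fibers of the thread.
  Thread.current.thread_variable_set(:sketch_limiter, :thread_local)

  # A new fiber on the same thread sees only the thread-local value.
  Fiber.new do
    {
      fiber_local: Thread.current[:sketch_limiter],
      thread_local: Thread.current.thread_variable_get(:sketch_limiter)
    }
  end.resume
end.value

# result => { fiber_local: nil, thread_local: :thread_local }
```

A limiter stored through Thread#[] would therefore be rebuilt for every fiber, while one stored as a thread variable is shared across fibers running on the same thread.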

The test is also adjusted to be white-box instead of black-box.

If you have time, could you please take another look? Thanks 🙌🏼

@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from 8b90a67 to 50e0880 Compare October 8, 2024 08:40
Comment on lines 359 to 360
expect(described_class).to receive(:record_via_span)
.at_least(rate_limit).at_most(rate_limit * 1.1).times.and_call_original
Contributor Author

I'm going to find a way to rework this; it introduces flakiness 😞

Contributor Author

Fixed by freezing time and emulating a 1s processing time.

@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from 50e0880 to 3a78df0 Compare October 8, 2024 10:33
@Strech Strech requested a review from ivoanjo October 8, 2024 10:33
Member

@ivoanjo ivoanjo left a comment

I had not yet given a pass on other files, so here's a smattering of comments on them >_>

Just a bunch of small stuff, nothing blocking 👍 LGTM otherwise

def limit
  return yield if @rate_limiter.allow?

  Datadog.logger.debug { "Rate limit hit: #{@rate_limiter.current_window_rate} AppSec traces/second" }
Member

Minor: This debug message seems like it can happen reasonably frequently. Do we need it around?

My (small) concern is that we don't currently have a fine-grained way of disabling some messages + sometimes to debug issues we need to ask customers to enable debug-level logging. Thus, having messages that are very noisy is somewhat annoying in that situation.

I do understand if you find this message quite important and want to keep it; I'm more explaining the trade-off we have here. (The better solution would be for our logging to not be as coarse-grained...)
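
For context on the trade-off, the block form of debug is only evaluated when the debug level is enabled, so the message costs little by default but is emitted on every rejected call once debug logging is turned on. A small sketch with Ruby's standard Logger, assuming Datadog.logger behaves like it in this respect; sketch_log_output is a hypothetical helper.

```ruby
require 'logger'
require 'stringio'

# Logs three "rate limit hit" messages at debug level and reports how many
# times the message block ran and how many lines were actually written.
def sketch_log_output(level)
  io = StringIO.new
  logger = Logger.new(io)
  logger.level = level

  evaluated = 0
  3.times do
    logger.debug do
      evaluated += 1 # runs only when the debug level is enabled
      'Rate limit hit'
    end
  end

  [evaluated, io.string.lines.size]
end

sketch_log_output(Logger::INFO)  # => [0, 0]
sketch_log_output(Logger::DEBUG) # => [3, 3]
```

At the info level nothing is built or written; at the debug level every rejected call produces a line, which is where the noise concern comes from.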

Contributor

@p-datadog p-datadog left a comment

Since there were types previously in the sig files changed by this PR, I would prefer to see at least those carried over to the new tree.

@@ -52,7 +52,7 @@ def record(span, *events)
# ensure rate limiter is called only when there are events to record
return if events.empty? || span.nil?

- Datadog::AppSec::RateLimiter.limit(:traces) do
+ Datadog::AppSec::RateLimiter.instance.limit do
Contributor

This line matches Singleton usage and I thought instance referred to the only global instance. I suggest another name for it (maybe local_instance if you don't have better ideas).

Contributor Author

Agreed, still working on the name. I don't want to have a pass-through limit method; I'd rather provide a builder/accessor method for the configured instance.

Contributor Author

I chose the name thread_local to reflect the limiter's locality to the client.

Datadog::AppSec::RateLimiter.thread_local.limit do
  record_via_span(span, *events)
end
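
One way such an accessor could look, as a rough sketch only. SketchRateLimiter and THREAD_KEY are hypothetical names, the limit method is a placeholder that always yields, and none of this is the code merged in the PR. The nil check is used instead of thread_variable? to sidestep the version difference mentioned earlier in the thread.

```ruby
# Hypothetical per-thread limiter accessor, for illustration only.
class SketchRateLimiter
  THREAD_KEY = :sketch_rate_limiter

  class << self
    # Returns the limiter owned by the calling thread, building it on first use.
    def thread_local
      limiter = Thread.current.thread_variable_get(THREAD_KEY)
      return limiter unless limiter.nil?

      limiter = new
      Thread.current.thread_variable_set(THREAD_KEY, limiter)
      limiter
    end
  end

  # Placeholder: a real limiter would consult its token bucket before yielding.
  def limit
    yield
  end
end

SketchRateLimiter.thread_local.limit do
  # record the events here
end
```

Each thread gets its own instance, and fibers running on that thread share it, so no locking is needed around the limiter itself.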

@Strech Strech marked this pull request as draft October 8, 2024 19:45
@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch 2 times, most recently from b600021 to d2a0586 Compare October 8, 2024 21:56
@Strech Strech marked this pull request as ready for review October 8, 2024 22:00
@Strech
Contributor Author

Strech commented Oct 9, 2024

@p-datadog @ivoanjo Thanks for the feedback 🙏🏼.

I've updated the sig files properly and also enabled Core::RateLimiter to be checked by Steep. I don't think it makes sense to move these changes into a separate PR because they definitely belong here (as I'm updating the code).

I don't know what to do with the debug message yet. I get the point and will definitely try to come up with an idea, but maybe later.

If there are no objections, I would like to merge it and 🚢 it!

@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from d2a0586 to a23ddb5 Compare October 9, 2024 07:18
This allows us to be more precise in throttling outgoing AppSec traces
and removes additional custom logic of rate limiting.

Also RBS definitions of rate limiters are updated
@Strech Strech force-pushed the appsec-10303-replace-rate-limiter-logic branch from a23ddb5 to cd06310 Compare October 9, 2024 07:19
@ivoanjo
Member

ivoanjo commented Oct 9, 2024

No concerns from my side! 🙇

@Strech Strech added the appsec Application Security monitoring product label Oct 9, 2024
@Strech Strech changed the title Replace AppSec rate limiter with core rate limiter [APPSEC-10303] Replace AppSec rate limiter with core rate limiter Oct 9, 2024
@Strech Strech merged commit c893523 into master Oct 9, 2024
248 checks passed
@Strech Strech deleted the appsec-10303-replace-rate-limiter-logic branch October 9, 2024 09:02
@github-actions github-actions bot added this to the 2.4.0 milestone Oct 9, 2024