Scale service destination based on available memory #11739
internal/beater/beater.go
Outdated
s.config.Aggregation.ServiceTransactions.MaxGroups, memLimitGB,
if s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 {
	// scale based on available memory considering 5K groups for 1GB
	s.config.Aggregation.ServiceDestinations.MaxGroups = linearScaledValue(5_000, memLimitGB)
[For reviewers] I'm not sure what a good value is here. The previous default was 10K for all, so we will now have much greater values on larger instances. However, service destination is not very costly, so I kept it at 5K per GB. Let me know if anyone has concerns.
Question: if it was a constant 10k for all, do we consider this a breaking change for 1GB users? Do we accept the risk?
Without indicators that current limits were too high, I'd keep the 10k for 1GB, and then afterwards start 5k steps per GB.
Service destination doesn't use histograms and only uses two float64 values, which makes it very light on memory. +1 on keeping 10k for 1GB.
Starting with 10K for 1GB sounds good to me. One potential drawback is that if we introduce summaries or histograms in the service destination metrics in the future, we might have to make a breaking change to reduce the limits, but it is probably better to make a breaking change then than now.
afterwards start 5k steps per GB.
Should we make it 10K per GB instead or will that be too high for bigger APM servers?
Should we make it 10K per GB instead or will that be too high for bigger APM servers?
Wouldn't be a problem now as there are no histograms. But as you've said, if we were to introduce histograms, it would be 2x the other limits and require us to make a breaking change to reduce the limits, but we would then have a pressing reason to do so. So I'm fine with 10k per GB.
I'd start with 10K for 1GB and then 5K for additional GB.
Updated the code to use 10k for 1GB and then 5k per additional GB for the service destination limit.
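For readers following the thread, the agreed rule (10k groups for the first 1GB, then 5k for each additional GB) could be sketched roughly as below. This is not the code from the PR: the function name is hypothetical, `memLimitGB` is assumed to be a float64, and clamping instances below 1GB to 10k is an assumption as well.

```go
package main

import "fmt"

// serviceDestinationMaxGroups is a hypothetical helper illustrating the
// rule agreed in this thread; it is not the apm-server implementation.
// Assumptions: memLimitGB is a float64, and anything at or below 1GB
// keeps the previous constant default of 10k groups.
func serviceDestinationMaxGroups(memLimitGB float64) int {
	const (
		baseGroups       = 10_000 // limit for the first 1GB
		groupsPerExtraGB = 5_000  // added for each GB beyond the first
	)
	if memLimitGB <= 1 {
		return baseGroups
	}
	return baseGroups + int((memLimitGB-1)*groupsPerExtraGB)
}

func main() {
	for _, gb := range []float64{0.5, 1, 2, 8} {
		fmt.Printf("%.1fGB -> %d groups\n", gb, serviceDestinationMaxGroups(gb))
	}
}
```

Under these assumptions a 1GB instance keeps the old 10k limit, so the change is not breaking for those users, while an 8GB instance gets 45k.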
This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
- this change may be breaking for 1GB
- please also update docs/data-model.asciidoc
Testing notes: ❌ test-plan-regression
Fix regression introduced in elastic#11739 - Fix bug where code is never executed - Fix wrong log message
* Fix service destination max group scaling based on memory Fix regression introduced in #11739 - Fix bug where code is never executed - Fix wrong log message - Fix failing test
…11906) * Fix service destination max group scaling based on memory Fix regression introduced in #11739 - Fix bug where code is never executed - Fix wrong log message - Fix failing test (cherry picked from commit 791f582) Co-authored-by: Carson Ip <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Tested via the regression fix PR: #11905 (comment). Marking as |
Closes #11721