Scale service destination based on available memory #11739

Merged
1 change: 1 addition & 0 deletions changelogs/head.asciidoc
@@ -15,6 +15,7 @@ https://github.com/elastic/apm-server/compare/8.10\...main[View commits]
 - Add back gzip support for grpc otlp endpoint {pull}11434[11434]
 - Correctly mark jvm.memory.non_heap.pool.* and jvm.fd.* metrics as internal {pull}11303[11303]
 - Fix tail-based sampling discarding low throughput and low sample rate traces {pull}11642[11642]
+- Add memory based autoscaling for service destination aggregation groups {pull}11739[11739]

 [float]
 ==== Intake API Changes
43 changes: 21 additions & 22 deletions internal/beater/beater.go
@@ -230,23 +230,34 @@ func (s *Runner) Run(ctx context.Context) error {
 	}

 	if s.config.Aggregation.MaxServices <= 0 {
-		s.config.Aggregation.MaxServices = maxGroupsForAggregation(memLimitGB)
+		// scale based on available memory considering 1K groups for 1GB
+		s.config.Aggregation.MaxServices = linearScaledValue(1_000, memLimitGB)
 		s.logger.Infof("Aggregation.MaxServices set to %d based on %0.1fgb of memory",
 			s.config.Aggregation.MaxServices, memLimitGB,
 		)
 	}

+	if s.config.Aggregation.ServiceTransactions.MaxGroups <= 0 {
+		// scale based on available memory considering 1K groups for 1GB
+		s.config.Aggregation.ServiceTransactions.MaxGroups = linearScaledValue(1_000, memLimitGB)
+		s.logger.Infof("Aggregation.ServiceTransactions.MaxGroups for service aggregation set to %d based on %0.1fgb of memory",
+			s.config.Aggregation.ServiceTransactions.MaxGroups, memLimitGB,
+		)
+	}
+
 	if s.config.Aggregation.Transactions.MaxGroups <= 0 {
-		s.config.Aggregation.Transactions.MaxGroups = maxTxGroupsForAggregation(memLimitGB)
+		// scale based on available memory considering 5K groups for 1GB
+		s.config.Aggregation.Transactions.MaxGroups = linearScaledValue(5_000, memLimitGB)
 		s.logger.Infof("Aggregation.Transactions.MaxGroups set to %d based on %0.1fgb of memory",
 			s.config.Aggregation.Transactions.MaxGroups, memLimitGB,
 		)
 	}

-	if s.config.Aggregation.ServiceTransactions.MaxGroups <= 0 {
-		s.config.Aggregation.ServiceTransactions.MaxGroups = maxGroupsForAggregation(memLimitGB)
-		s.logger.Infof("Aggregation.ServiceTransactions.MaxGroups for service aggregation set to %d based on %0.1fgb of memory",
-			s.config.Aggregation.ServiceTransactions.MaxGroups, memLimitGB,
+	if s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 {
+		// scale based on available memory considering 5K groups for 1GB
+		s.config.Aggregation.ServiceDestinations.MaxGroups = linearScaledValue(5_000, memLimitGB)

Contributor Author

[For reviewers] I'm not sure what a good value is here. The previous default was 10K for all; now we will have much greater values. However, service destination is not very costly, so I kept it at 5k/GB. Let me know if others have any concerns.

Member

question: if it was a constant 10k for all, do we consider this a breaking change for 1GB users? Do we accept the risk?

Contributor

Without indicators that current limits were too high, I'd keep the 10k for 1GB, and then afterwards start 5k steps per GB.

Member

Service destination doesn't use histograms and only uses two float64 values, which makes it very light on memory. +1 on keeping 10k for 1GB.

Contributor Author

Starting with 10K for 1GB sounds good to me. One potential drawback: if we introduce summaries or histograms in the service destination metrics in the future, we might need a breaking change to reduce the limits. It is probably better to make that breaking change then rather than now.

> afterwards start 5k steps per GB.

Should we make it 10K per GB instead or will that be too high for bigger APM servers?

Member

> Should we make it 10K per GB instead or will that be too high for bigger APM servers?

It wouldn't be a problem now, as there are no histograms. But as you've said, if we were to introduce histograms, it would be 2x the other limits and require a breaking change to reduce the limits, though we would then have a pressing reason to do so. So I'm fine with 10k per GB.

Contributor

I'd start with 10K for 1GB and then 5K for additional GB.

Contributor Author

Updated code to use 10k for 1GB and then 5k per GB for svc destination limit

+		s.logger.Infof("Aggregation.ServiceDestinations.MaxGroups set to %d based on %0.1fgb of memory",
+			s.config.Aggregation.ServiceDestinations.MaxGroups, memLimitGB,
 		)
 	}
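
The review thread settles on 10k groups for the first 1GB and 5k for each additional GB for the service destination limit, but the updated code is not shown in this view. The following is only a sketch of one way to read that agreement. The helper name `steppedScaledValue`, its parameters, and the choice to keep the 1GB value for limits below 1GB are assumptions made for illustration; the 64GB cap is borrowed from `linearScaledValue` in the diff.

```go
package main

import "fmt"

// steppedScaledValue is a hypothetical helper, not code from the PR.
// It returns firstGB for a 1GB limit and adds perExtraGB for every
// further GB of memory.
func steppedScaledValue(firstGB, perExtraGB, memLimitGB float64) int {
	const maxMemGB = 64 // same cap as linearScaledValue in the diff
	if memLimitGB > maxMemGB {
		memLimitGB = maxMemGB
	}
	if memLimitGB < 1 {
		// Assumption: limits below 1GB keep the 1GB value, so small
		// instances retain the previous 10k default.
		memLimitGB = 1
	}
	return int(firstGB + (memLimitGB-1)*perExtraGB)
}

func main() {
	// Sample memory limits are illustrative only.
	for _, memGB := range []float64{0.5, 1, 2, 8, 128} {
		fmt.Printf("%.1fGB -> %d groups\n", memGB, steppedScaledValue(10_000, 5_000, memGB))
	}
}
```

Under these assumptions a 1GB instance keeps the earlier constant of 10k groups, which addresses the breaking-change concern raised in the thread, while larger instances grow by 5k per GB up to the cap.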

@@ -568,26 +579,14 @@ func maxConcurrentDecoders(memLimitGB float64) uint {
 	return decoders
 }

-// maxGroupsForAggregation calculates the maximum service groups that a
-// particular memory limit can have. This will be scaled linearly for bigger
-// instances.
-func maxGroupsForAggregation(memLimitGB float64) int {
-	const maxMemGB = 64
-	if memLimitGB > maxMemGB {
-		memLimitGB = maxMemGB
-	}
-	return int(memLimitGB * 1_000)
-}
-
-// maxTxGroupsForAggregation calculates the maximum transaction groups that a
-// particular memory limit can have. This will be scaled linearly for bigger
-// instances.
-func maxTxGroupsForAggregation(memLimitGB float64) int {
+// linearScaledValue calculates linearly scaled value based on memory limit where
+// c denotes the value for 1GB.
+func linearScaledValue(c, memLimitGB float64) int {
 	const maxMemGB = 64
 	if memLimitGB > maxMemGB {
 		memLimitGB = maxMemGB
 	}
-	return int(memLimitGB * 5_000)
+	return int(memLimitGB * c)
 }

 // waitReady waits until the server is ready to index events.
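
To make the effect of the consolidated helper concrete, here is a small standalone program that copies `linearScaledValue` from the diff and prints the limits it would produce. The sample memory limits and the `main` function are assumptions added for illustration; only the helper body comes from the PR.

```go
package main

import "fmt"

// linearScaledValue is copied from the diff: c is the value for 1GB, and
// memory limits above 64GB are treated as 64GB.
func linearScaledValue(c, memLimitGB float64) int {
	const maxMemGB = 64
	if memLimitGB > maxMemGB {
		memLimitGB = maxMemGB
	}
	return int(memLimitGB * c)
}

func main() {
	// Sample memory limits are illustrative, not values from the PR.
	for _, memGB := range []float64{0.5, 1, 8, 128} {
		fmt.Printf("%.1fGB: 1K/GB -> %d, 5K/GB -> %d\n",
			memGB,
			linearScaledValue(1_000, memGB), // e.g. MaxServices
			linearScaledValue(5_000, memGB), // e.g. Transactions.MaxGroups
		)
	}
}
```

With the 64GB cap, a 128GB instance gets the same limits as a 64GB one (64,000 and 320,000 groups), and a 0.5GB instance gets half of the 1GB values.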