
Scale service destination based on available memory #11739

Merged

Conversation

lahsivjar (Contributor)

Closes #11721

@lahsivjar lahsivjar requested a review from a team as a code owner September 28, 2023 15:53
    s.config.Aggregation.ServiceTransactions.MaxGroups, memLimitGB,
    if s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 {
        // scale based on available memory, allowing 5K groups per 1GB
        s.config.Aggregation.ServiceDestinations.MaxGroups = linearScaledValue(5_000, memLimitGB)
    }
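The helper `linearScaledValue` appears in the diff but its body is not shown here. A minimal sketch of what such a linear memory-based scaling helper could look like, assuming it simply multiplies a per-GB base value by the memory limit (the real apm-server implementation may round or clamp differently):

```go
package main

import "fmt"

// linearScaledValue is a hypothetical sketch of the helper referenced in
// the diff: it scales a per-GB base value linearly with the memory limit.
// The actual implementation in apm-server may differ.
func linearScaledValue(perGB int, memLimitGB float64) int {
	return int(float64(perGB) * memLimitGB)
}

func main() {
	// With 5K groups per 1GB, a 4GB server would get 20K groups.
	fmt.Println(linearScaledValue(5_000, 4.0))
}
```

Under this sketch, a 1GB server keeps 5K groups and larger servers scale proportionally, which is what the thread below debates.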
lahsivjar (Contributor, Author)

[For reviewers] Not sure what a good value is here; the previous default was 10K for all limits, and now we will have much greater values. However, service destination aggregation is not very costly, so I kept it at 5K/GB. Let me know if anyone has concerns.

Member

Question: if it was a constant 10K for all, do we consider this a breaking change for 1GB users? Do we accept the risk?

Contributor

Without indicators that the current limits were too high, I'd keep the 10K for 1GB, and then add 5K steps per additional GB.

Member

Service destination doesn't use histograms and only stores two float64 values per group, which makes it very light on memory. +1 on keeping 10K for 1GB.

lahsivjar (Contributor, Author)

Starting with 10K for 1GB sounds good to me. One potential drawback: if we were to introduce summaries or histograms in the service destination metrics in the future, we might have to make a breaking change to reduce the limits, but it is probably better to make that breaking change then rather than now.

afterwards start 5k steps per GB.

Should we make it 10K per GB instead or will that be too high for bigger APM servers?

Member

Should we make it 10K per GB instead or will that be too high for bigger APM servers?

Wouldn't be a problem now, as there are no histograms. But as you've said, if we were to introduce histograms, it would be 2x the other limits and would require a breaking change to reduce the limits, though we would then have a pressing reason to do so. So I'm fine with 10K per GB.

Contributor

I'd start with 10K for 1GB and then 5K for each additional GB.

lahsivjar (Contributor, Author)

Updated the code to use 10K for 1GB and then 5K per additional GB for the service destination limit.
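The limit agreed in this thread can be sketched as a small piecewise function: 10K groups for the first 1GB of memory, then 5K for each additional GB. The function name and exact rounding below are assumptions for illustration, not the merged implementation:

```go
package main

import "fmt"

// serviceDestinationMaxGroups sketches the scaling agreed in the thread:
// 10K groups for the first 1GB, plus 5K per additional GB. The name and
// rounding behavior are assumptions, not apm-server's actual code.
func serviceDestinationMaxGroups(memLimitGB float64) int {
	if memLimitGB <= 1 {
		return 10_000
	}
	return 10_000 + int(5_000*(memLimitGB-1))
}

func main() {
	fmt.Println(serviceDestinationMaxGroups(1)) // 1GB keeps the previous default of 10K
	fmt.Println(serviceDestinationMaxGroups(4)) // 4GB: 10K + 3*5K = 25K
}
```

This keeps the behavior for 1GB users unchanged, which is what addresses the breaking-change concern raised above.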

mergify bot commented Sep 28, 2023

This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
To fix this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8.\d is the label to automatically backport to the 8.\d branch, where \d is the minor version digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Sep 28, 2023
@carsonip carsonip left a comment (Member)

  • this change may be breaking for 1GB
  • please also update docs/data-model.asciidoc

@lahsivjar lahsivjar requested review from simitt and carsonip October 2, 2023 09:15
carsonip
carsonip previously approved these changes Oct 2, 2023
internal/beater/beater.go Outdated Show resolved Hide resolved
internal/beater/beater.go Outdated Show resolved Hide resolved
@lahsivjar lahsivjar merged commit 60f6ac5 into elastic:main Oct 2, 2023
11 checks passed
@lahsivjar lahsivjar deleted the fix-span-destination-scaling-11721 branch October 3, 2023 01:56
@carsonip carsonip self-assigned this Oct 19, 2023
@carsonip (Member)

carsonip commented Oct 19, 2023

Testing notes:

❌ test-plan-regression

  • No "Aggregation.ServiceDestinations.MaxGroups set to %d based on %0.1fgb of memory" message found in the logs. Probably due to mixing a default value with the s.config.Aggregation.ServiceDestinations.MaxGroups <= 0 check.
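The suspected regression can be sketched as follows: if a non-zero default is applied before the `<= 0` check, the memory-scaling branch never executes for the common unset (zero) case, so the scaling log line is never emitted. Names and values here are assumptions for illustration:

```go
package main

import "fmt"

// defaultMaxGroups stands in for a hypothetical non-zero default.
const defaultMaxGroups = 10_000

// resolveMaxGroups sketches the suspected bug: the default is applied
// first, so for an unset (zero) config the `<= 0` scaling branch below
// is effectively dead code and its log line never appears.
func resolveMaxGroups(configured int, memLimitGB float64) int {
	if configured == 0 {
		configured = defaultMaxGroups // default applied first...
	}
	if configured <= 0 { // ...so this never fires for the unset case
		configured = int(5_000 * memLimitGB)
		fmt.Printf("MaxGroups set to %d based on %0.1fgb of memory\n", configured, memLimitGB)
	}
	return configured
}

func main() {
	// Even with 4GB available, the default wins and no scaling log appears.
	fmt.Println(resolveMaxGroups(0, 4.0))
}
```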

carsonip added a commit to carsonip/apm-server that referenced this pull request Oct 19, 2023
Fix regression introduced in elastic#11739

- Fix bug where code is never executed
- Fix wrong log message
simitt pushed a commit that referenced this pull request Oct 20, 2023
* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test
mergify bot pushed a commit that referenced this pull request Oct 20, 2023
* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test

(cherry picked from commit 791f582)
mergify bot added a commit that referenced this pull request Oct 23, 2023
…11906)

* Fix service destination max group scaling based on memory

Fix regression introduced in #11739

- Fix bug where code is never executed
- Fix wrong log message
- Fix failing test

(cherry picked from commit 791f582)

Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@lahsivjar (Contributor, Author)

Tested via the regression fix PR: #11905 (comment). Marking as test-plan-ok now.

Labels
backport-skip Skip notification from the automated backport with mergify test-plan test-plan-ok v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scale max service destination and max per service service destination aggregations based on available memory
3 participants