[Fleet] Add presets for performance tuning to ES output configuration #166870

Closed
8 of 10 tasks
jlind23 opened this issue Sep 20, 2023 · 32 comments · Fixed by #172359
Labels: QA:Validated (Issue has been validated by QA), Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

@jlind23
Contributor

jlind23 commented Sep 20, 2023

We have recently updated the Beats default settings (elastic/beats#36990), but we'd like to offer users even more options that optimize for different scenarios.

To power this, we want to introduce performance presets in the Elasticsearch output settings in Fleet. These presets are selectable by the user and optimize Agent outputs for particular scenarios (latency, throughput, scale). Users who want to tune these values themselves can use the advanced YAML box to modify these settings.

By creating presets, we can abstract away the underlying complexity of configuration from our users and target the specific use cases they need to optimize performance for. This abstraction also has the added benefit of allowing us to tweak individual settings, and even switch out the underlying implementation, without introducing a breaking change.

Design

[Image: design mockup]

Presets

| Configuration | Current Default | Balanced | Optimized for Throughput | Optimized for Scale | Optimized for Latency (?) |
|---|---|---|---|---|---|
| bulk_max_size | 50 | 1600 | 1600 | 1600 | 50 |
| workers | 1 | 1 | 4 | 1 | 1 |
| queue.mem.events | 4096 | 3200 | 12800 | 3200 | 4100 |
| flush.min_events | 2048 | 1600 | 1600 | 1600 | 2050 |
| flush.timeout (s) | 1 | 10 | 5 | 20 | 1 |
| compression | 0 | 1 | 1 | 1 | 1 |
| idle_timeout (s) | 60 | 3 | 15 | 1 | 60 |
| Performance | | | | | |
| Stateful Throughput | 1x | 3x | 5x | 3x | 1x |
| Serverless Throughput | 1x | 5-10x | 10-20x | 5-10x | 1x |
| Serverless Throughput (Relative to Stateful) | 0.1x | 0.2-0.3x | 0.3-0.5x | 0.2-0.3x | 0.1x |
| Connections | 1x | 0.3x | 4x | 0.04x | 1x |
| Network Traffic | 1x | 0.1x | 0.1x | 0.05x | 0.1x |
| High-throughput Queue Latency \* | 1x | 1x | 1x | 1x | 1x |
| Low-throughput Queue Latency \*\* | 1x | 10x | 5x | 20x | 1x |
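To make the table concrete, here is a minimal Python sketch that encodes the four named presets as a lookup table. The identifiers `balanced`, `throughput`, `scale` and `latency`, the `PRESETS` dict and the `settings_for` helper are all hypothetical (only `balanced` and `custom` are named in this issue); the flat dotted keys simply mirror the Configuration column rather than the exact Agent/Beats option paths.

```python
from typing import Dict, Optional

# Values copied from the presets table above. Hypothetical shape: flat dotted
# keys mirror the "Configuration" column; timeouts are in seconds.
PRESETS: Dict[str, Dict[str, int]] = {
    "balanced": {
        "bulk_max_size": 1600, "workers": 1, "queue.mem.events": 3200,
        "flush.min_events": 1600, "flush.timeout": 10, "compression": 1,
        "idle_timeout": 3,
    },
    "throughput": {
        "bulk_max_size": 1600, "workers": 4, "queue.mem.events": 12800,
        "flush.min_events": 1600, "flush.timeout": 5, "compression": 1,
        "idle_timeout": 15,
    },
    "scale": {
        "bulk_max_size": 1600, "workers": 1, "queue.mem.events": 3200,
        "flush.min_events": 1600, "flush.timeout": 20, "compression": 1,
        "idle_timeout": 1,
    },
    "latency": {
        "bulk_max_size": 50, "workers": 1, "queue.mem.events": 4100,
        "flush.min_events": 2050, "flush.timeout": 1, "compression": 1,
        "idle_timeout": 60,
    },
}


def settings_for(preset: str) -> Optional[Dict[str, int]]:
    """Return a copy of a preset's settings, or None for "custom".

    "custom" is a no-op: whatever is in the advanced YAML box is used instead.
    """
    if preset == "custom":
        return None
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # Copy so callers cannot mutate the shared table.
    return dict(PRESETS[preset])


print(settings_for("throughput")["workers"])  # 4
```

Keeping the values in one table is what lets the underlying settings be tweaked later without changing the user-facing preset names.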

Note about the custom preset:

  • In addition to the above settings, there will also be a custom preset (refer to the Figma).
  • When the user picks the custom preset, whatever setting is configured in the advanced YAML box will be applied to the agent. So essentially the custom preset is a no-op.
  • To avoid a breaking change during an upgrade, if the advanced YAML box already has a configuration applied (i.e., a user who has configured some output parameters), then the preset for this output should be set to custom. Whatever value is in the YAML box will be applied to the agent as is.
  • This will set us up for the future, when the YAML box is removed and the custom preset will have a UI for the user to enter relevant output parameters.
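The custom-preset rules can be sketched as a function that computes what ends up applied for an output. This is a minimal Python sketch under stated assumptions, not Fleet's or Agent's actual logic: the YAML box is assumed to be already parsed into a flat dict with dotted keys, `RESERVED_KEYS` is the set of tuning keys from the table, and for a non-custom preset the tuning keys in the YAML box are assumed to be ignored while other keys pass through unchanged. The names `effective_settings` and `preset_values` are hypothetical.

```python
from typing import Any, Dict, Optional

# Tuning keys from the presets table (flat dotted form is an assumption).
RESERVED_KEYS = frozenset({
    "bulk_max_size", "workers", "queue.mem.events", "flush.min_events",
    "flush.timeout", "compression", "idle_timeout",
})


def effective_settings(
    preset_values: Optional[Dict[str, Any]],
    yaml_config: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine a preset with the already-parsed advanced YAML box.

    preset_values is None for the custom preset (hypothetical convention).
    """
    if preset_values is None:
        # Custom preset: a no-op, the YAML box is applied as is.
        return dict(yaml_config)
    # Other presets (assumed behaviour): drop tuning keys from the YAML box,
    # keep every other key, then apply the preset's own values.
    merged = {k: v for k, v in yaml_config.items() if k not in RESERVED_KEYS}
    merged.update(preset_values)
    return merged


yaml_box = {"bulk_max_size": 1000, "ssl.verification_mode": "none"}
balanced_excerpt = {"bulk_max_size": 1600, "workers": 1}  # part of Balanced
print(effective_settings(None, yaml_box))
# {'bulk_max_size': 1000, 'ssl.verification_mode': 'none'}
print(effective_settings(balanced_excerpt, yaml_box))
# {'ssl.verification_mode': 'none', 'bulk_max_size': 1600, 'workers': 1}
```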

Outcome

  • Users can change their presets from Fleet
  • Elastic Agent receives the preset from Fleet and translates it to a local setting based on what is supported in that version. If Elastic Agent receives "Custom" as a preset, then it takes the settings from the policy and that's all it does.
  • Presets values should be documented
  • If users input additional values in the yaml box then the presets should be changed to "Custom"
  • There is a telemetry task that collects the presets values used
  • Check that Fleet Server can pass down the values without any code changes
  • If a new policy is being created, the balanced preset should be selected by default
  • If a preexisting policy is being edited, and one of the YML keys listed in the table above appears in the textbox input, then the custom preset is selected by default
    • If none of the YML keys listed above are present in the custom YML textbox, the policy should have preset: balanced
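The default-selection rules above can be sketched as a plain substring check on the YAML box contents, the "rudimentary string match" approach that comes up later in this thread. This is a minimal Python sketch; `default_preset` is a hypothetical name and the keys are taken from the presets table. A new policy starts with an empty YAML box, so it naturally gets `balanced`.

```python
# Tuning keys listed in the presets table.
RESERVED_KEYS = (
    "bulk_max_size", "workers", "queue.mem.events", "flush.min_events",
    "flush.timeout", "compression", "idle_timeout",
)


def default_preset(config_yaml: str) -> str:
    """Pick the preset to preselect for an output (hypothetical helper).

    Any reserved key mentioned anywhere in the YAML box forces "custom",
    so existing tuning survives an upgrade; otherwise "balanced".
    """
    if any(key in config_yaml for key in RESERVED_KEYS):
        return "custom"
    return "balanced"


print(default_preset(""))                                    # balanced
print(default_preset("bulk_max_size: 1000\n"))               # custom
print(default_preset("queue:\n  mem:\n    events: 3650\n"))  # balanced
```

As the last call shows, a substring check misses the nested YAML form of `queue.mem.events`, which is why parsing the YAML was raised as a follow-up.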

Docs issue

@jlind23 added the Team:Fleet label Sep 20, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jlind23
Contributor Author

jlind23 commented Sep 26, 2023

@nimarezainia @strawgate is this something you have already discussed with Kuldeep?

@strawgate
Contributor

We have discussed it with Kuldeep, and the planning doc includes (I think) the current UI intentions, but we can wait for him to chime in.

@nimarezainia
Contributor

@zombieFox, I tagged you in the definition document. Please have a look, and if it's accepted, we can move this to development. Thanks.

@nimarezainia
Contributor

@zombieFox, I'm wondering whether this UX has been finalized and is ready for development?

@jen-huang @kpollich this issue is in the current sprint. I assume that the design work, and not the implementation, is expected in this sprint. I'm trying to figure out whether we are targeting this for 8.12. Thanks.

@zombieFox
Contributor

zombieFox commented Oct 18, 2023

I've got some thoughts in development in this Figma (WIP).

Looking to address removing the YAML box in favour of explicit controls in the UI, as well as tuning controls. This exploration/form is looking very long so far; it might not be suitable for a flyout anymore.

@nimarezainia
Contributor

> I've got some thoughts in development in this Figma (WIP).
>
> Looking to address removing the YAML box in favour of explicit controls in the UI, as well as tuning controls. This exploration/form is looking very long so far; it might not be suitable for a flyout anymore.

I made some comments on the Figma itself. I am very much in favour of adding more dials to replace anything that can be placed in the YAML box today.

@nimarezainia
Contributor

@zombieFox, what's the latest on this? Are we ready for development?

@zombieFox
Contributor

zombieFox commented Nov 8, 2023

The designs are in a good enough state to move to development. A sync with Kyle suggests that the Advanced Settings work he is developing should support the changes defined in the design file. I will resolve edge cases as they become apparent during development.

@jlind23
Contributor Author

jlind23 commented Nov 9, 2023

@jen-huang @kpollich I changed the status of this issue to "Ready" and moved it to the next sprint by default. I'll let you decide whether this should be done now.

@jlind23
Contributor Author

jlind23 commented Nov 15, 2023

As elastic/beats#36990 landed in 8.12, we need to make this happen in 8.12 too; otherwise, users will be required to use the custom YAML box.

@nimarezainia
Contributor

What we are asking for in this issue is really just the UI work to expose the presets, and NOT necessarily exposing all the configuration options that are available in the advanced YAML box (which is what the Figma is showing).

With particular reference to the Figma, the ask is to implement the following dropdown with the presets for tuning:

[Image: performance preset dropdown from the Figma design]

We can develop the remainder of the changes in the Figma, which convert the advanced YAML box to UI configuration elements, at a later stage, at which point the YAML box will be removed completely.

@jen-huang changed the title from "[Fleet] UI rework for the Elasticsearch output settings" to "[Fleet] Add presets for performance tuning to ES output configuration" Nov 17, 2023
@nimarezainia
Contributor

Just wanted to clarify some aspects of the presets:

  • As mentioned above, a custom preset allows the user to set the tuning parameters explicitly. When the custom preset is chosen, whatever the user has configured for tuning in the YAML box should be copied to the policy.
  • Therefore, if the user already has a tuning parameter configured in the YAML box, the preset should be configured/reverted to custom.
  • By extension, if the user sets the preset to anything other than custom, then any tuning parameter in the YAML box won't be accepted. Question: would it be possible to alert the user if this happens?

@kpollich
Copy link
Member

> By extension, if the user sets the preset to anything other than custom, then any tuning parameter in the YAML box won't be accepted. Question: would it be possible to alert the user if this happens?

If we know the set of variables/YAML keys that would prompt an alert then yes. Based on the description it seems like we'd want to alert if one or more of the following settings appears in the YAML box?

  • bulk_max_size
  • workers
  • queue.mem.events
  • flush.min_events
  • flush.timeout
  • compression
  • idle_timeout

Fleet could do a rudimentary string match on the YML box contents to detect any of these keys and alert the user that their custom values will be ignored. I think we could also actually parse the YML and check for the keys more explicitly so things like this would still trigger the alert:

```yaml
queue:
  mem:
    events: 3650
```

But that would be slightly more involved and might be better suited for a follow-up improvement in a future release. The string match would be quickest here.
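The parsing approach can be sketched in Python as a walk over the parsed YAML that rebuilds dotted key paths, so the flat form (`queue.mem.events: 3650`), the fully nested form above, and mixed forms all match the same reserved key. This assumes the YAML box has already been parsed into nested dicts (for example with PyYAML's `yaml.safe_load`); `dotted_paths` and `reserved_keys_in` are hypothetical helper names.

```python
from typing import Any, Iterator, Mapping, Set

# Tuning keys from the presets table, in dotted form.
RESERVED_KEYS = frozenset({
    "bulk_max_size", "workers", "queue.mem.events", "flush.min_events",
    "flush.timeout", "compression", "idle_timeout",
})


def dotted_paths(node: Mapping[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield every key path in a nested mapping, e.g. "queue.mem.events"."""
    for key, value in node.items():
        path = f"{prefix}{key}"
        yield path
        if isinstance(value, Mapping):
            # Recurse so nested and dotted spellings produce the same path.
            yield from dotted_paths(value, path + ".")


def reserved_keys_in(config: Mapping[str, Any]) -> Set[str]:
    """Return the reserved tuning keys present in a parsed YAML config."""
    return {path for path in dotted_paths(config) if path in RESERVED_KEYS}


# The nested example above, as a YAML parser would return it:
print(reserved_keys_in({"queue": {"mem": {"events": 3650}}}))
# {'queue.mem.events'}
```

Unlike a substring match, this also avoids false positives from comments or unrelated values that happen to contain a reserved word.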

@juliaElastic
Copy link
Contributor

@kpollich Should we add the QA:Needs Validation label to this issue?

@kpollich added the QA:Needs Validation label Dec 6, 2023
kpollich added a commit that referenced this issue Dec 6, 2023
## Summary

Closes #166870
Closes #172525

- Adds a new `preset` field to output saved objects
- Updates REST spec payloads to allow `preset` field in `POST/PUT`
requests to the `/api/fleet/outputs` endpoint
- Adds logic to set default `preset` to `balanced` or `custom` based on
whether a reserved key exists in `output.config_yaml`
- Adds UI to the output settings flyout for providing a preset
- Adds backfill logic to Fleet `setup` that updates all existing outputs
+ redeploys their associated policies to ensure the proper `preset` is
provided on all policies
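The backfill step described above can be sketched as follows. This is a minimal Python illustration, not Kibana's TypeScript implementation: the `Output` dataclass is a hypothetical stand-in for the output saved object, `policy_ids` is an assumed field linking an output to the policies that use it, and the reserved-key check is the simple substring match.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set

RESERVED_KEYS = (
    "bulk_max_size", "workers", "queue.mem.events", "flush.min_events",
    "flush.timeout", "compression", "idle_timeout",
)


@dataclass
class Output:
    """Hypothetical stand-in for an output saved object."""
    id: str
    type: str
    config_yaml: str = ""
    preset: Optional[str] = None
    policy_ids: List[str] = field(default_factory=list)


def backfill_presets(outputs: Iterable[Output]) -> Set[str]:
    """Set a preset on every Elasticsearch output that lacks one.

    Returns the IDs of the policies that would need to be redeployed.
    """
    to_redeploy: Set[str] = set()
    for output in outputs:
        if output.type != "elasticsearch" or output.preset is not None:
            continue  # not an ES output, or already backfilled
        reserved = any(key in output.config_yaml for key in RESERVED_KEYS)
        output.preset = "custom" if reserved else "balanced"
        to_redeploy.update(output.policy_ids)
    return to_redeploy


outputs = [
    Output("es-1", "elasticsearch", "", policy_ids=["p1"]),
    Output("es-2", "elasticsearch", "workers: 4", policy_ids=["p2", "p3"]),
    Output("ls-1", "logstash", policy_ids=["p4"]),
]
print(sorted(backfill_presets(outputs)))  # ['p1', 'p2', 'p3']
print([o.preset for o in outputs])        # ['balanced', 'custom', None]
```

Skipping outputs that already have a preset keeps the step idempotent, so running setup again redeploys nothing.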

## To do

- [x] Fix failing tests
- [x] Add a lot of tests + testing instructions
- [x] Allow preconfigured outputs to specify a preset
- [x] Update OpenAPI spec for outputs API
- [x] Disable `EuiSelect` when output is managed
- [x] Add in-product link to performance preset docs once they exist
(might have to be a follow-up; follow up:
#172523)
- [x] Parse YML box contents instead of using basic string lookup for
forcing `custom` preset (Follow up:
#172525)

## How to test

1. Create a new Elasticsearch output
2. Observe the `Performance preset` dropdown defaults to `balanced`
3. Add a performance setting to the custom YAML box, e.g. `bulk_max_size: 1000`
4. Note the callout with the list of reserved keys
5. Note that the dropdown switches to `Custom` and is now disabled
6. Remove the offending key
7. Note the dropdown returns to its normal state
8. Save the output
9. Edit the output and observe the same behaviors

For the backfill
1. Create a local environment with multiple elasticsearch outputs on
`main`
2. Stop Kibana
3. Checkout this PR branch
4. Restart Kibana
5. Observe the ES outputs have been updated to include the appropriate
`preset` value

## Screenshots + Screen recordings


https://github.com/elastic/kibana/assets/6766512/0c25a15e-938d-4747-8846-d51a9ad01968

---------

Co-authored-by: kibanamachine <[email protected]>
@jlind23
Contributor Author

jlind23 commented Dec 7, 2023

@kpollich Part of the tasks here was to add a telemetry task that collects the preset values used. After looking at the PR you linked, I'm unsure whether this has been done. Could you please update me?

@kpollich
Member

kpollich commented Dec 7, 2023

No, I wasn't able to get to that, and the PR became quite large. I can follow up with another PR.

@jlind23
Contributor Author

jlind23 commented Dec 7, 2023

@kpollich That would be great, thanks.

@kpollich
Member

kpollich commented Dec 7, 2023

Telemetry issue: #172818

PR: #172838

@strawgate
Contributor

@amolnater-qasource, should we also verify the expected behavior when reserved keys already exist in the YAML box before the upgrade to 8.12, and check the end result after upgrading to 8.12?

For example: I add a reserved key to the YAML box in 8.11, then upgrade to 8.12 and verify that the preset is set to custom and my customization is still present.

@amolnater-qasource

Hi @strawgate

Thank you for suggesting the scenario.

We have tested this on an 8.11.3 to 8.12.0 BC2 upgrade and found it working fine.

Observations:

  • The Custom preset is set after a successful upgrade when reserved keys were added to the YAML box before the Kibana upgrade.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-12-15.11-10-51.mp4

Further, could you please confirm whether we should add a regression test case for this, to validate on Kibana 8.12.0+ (8.13.0 and so on)?

As per our understanding, the feature will already be available from 8.12.0 onwards, so we can keep this scenario as part of exploratory testing.

Please let us know if we are missing anything here.
Thanks!!

@jlind23
Contributor Author

jlind23 commented Dec 15, 2023

> Further, could you please confirm whether we should add a regression test case for this, to validate on Kibana 8.12.0+ (8.13.0 and so on)?

Yes, that would be great. The feature is indeed only available from 8.12.0 onwards.

@nicpenning

nicpenning commented Dec 15, 2023

👋

So I stumbled upon this because we firmly believe that not all of the Filebeat performance options are configurable. More specifically, we don't think queue.mem.events is taking effect. Does this PR actually expose all settings for Filebeat/Beats, or is it just the start?

@nimarezainia
Copy link
Contributor

> So I stumbled upon this because we firmly believe that not all of the Filebeat performance options are configurable. More specifically, we don't think queue.mem.events is taking effect. Does this PR actually expose all settings for Filebeat/Beats, or is it just the start?

@nicpenning your observation is correct. None of the queue.mem-related configs were available under Agent, so if they were configured in the YAML box, they weren't taking effect. All the configurable parameters are now available under the output config in Agent. We are seeing that the best throughput is achieved when `queue.mem.events = workers * 2 * bulk_max_size`.
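The sizing rule of thumb above can be checked against the presets table with a few lines of Python. `recommended_queue_events` is a hypothetical helper, and the numbers are copied from the table at the top of this issue.

```python
def recommended_queue_events(workers: int, bulk_max_size: int) -> int:
    """Hypothetical helper: queue.mem.events = workers * 2 * bulk_max_size."""
    return workers * 2 * bulk_max_size


# (workers, bulk_max_size, queue.mem.events) for each column of the table.
PRESETS = {
    "Current Default": (1, 50, 4096),
    "Balanced": (1, 1600, 3200),
    "Optimized for Throughput": (4, 1600, 12800),
    "Optimized for Scale": (1, 1600, 3200),
    "Optimized for Latency (?)": (1, 50, 4100),
}

for name, (workers, bulk, queue_events) in PRESETS.items():
    rule = recommended_queue_events(workers, bulk)
    print(f"{name}: table={queue_events}, rule={rule}, match={rule == queue_events}")
```

Balanced, Optimized for Throughput and Optimized for Scale follow the rule exactly (for example 4 × 2 × 1600 = 12800), while the current default and the latency column keep a much larger queue than the rule gives (4096 and 4100 versus 100).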

@nicpenning
Copy link

> @nicpenning your observation is correct. None of the queue.mem-related configs were available under Agent, so if they were configured in the YAML box, they weren't taking effect. All the configurable parameters are now available under the output config in Agent. We are seeing that the best throughput is achieved when `queue.mem.events = workers * 2 * bulk_max_size`.

Hey Nima! Are you saying this is available in 8.12 or now (8.11.3)? If it's available now, where is the location you are speaking of?

@nimarezainia
Contributor

> Hey Nima! Are you saying this is available in 8.12 or now (8.11.3)? If it's available now, where is the location you are speaking of?

This is an 8.12 feature, currently targeted for early in the new year. It will only apply to 8.12 agents. The docs are here if you want to have a look. The presets themselves will appear in the Output flyout. Hope that answers your question.

@nicpenning

It does, and that is what I figured. We can plan to try this out with 8.12.0. If successful, we may be able to move to fully Fleet-managed agents next year! Thank you!

@strawgate
Copy link
Contributor

> It does, and that is what I figured. We can plan to try this out with 8.12.0. If successful, we may be able to move to fully Fleet-managed agents next year! Thank you!

This PR adds "presets" for different use cases, like "Optimized for Throughput" and "Optimized for Latency", that set queue settings, bulk size, timeout settings, etc. to optimize for a particular scenario. We would love to hear from you whether the new "Optimized for Throughput" preset that will be available in 8.12 fills your need, or whether you still need to manually tune queue settings with 8.12.

@nicpenning

> This PR adds "presets" for different use cases, like "Optimized for Throughput" and "Optimized for Latency", that set queue settings, bulk size, timeout settings, etc. to optimize for a particular scenario. We would love to hear from you whether the new "Optimized for Throughput" preset that will be available in 8.12 fills your need, or whether you still need to manually tune queue settings with 8.12.

I am tracking.

FYI, the presets provided don't fit many of our use cases, because we use multiple integrations per policy, and it has never really been identified how each integration in the same policy uses the shared Fleet output. If an integration is heavy enough, it is in its own policy and requires much higher numbers than the highest values provided in the presets. I definitely admire the effort here, but the higher performance settings seem very low. Once we have 8.12, we can try the higher throughput options to confirm 👍🏻.

I really like the idea of selecting a scalable performance level from 1x through 20x and having the settings scale to the right workers/bulk max size/queue, etc. Just some thoughts.

@amolnater-qasource

Hi Team,

We have executed 7 test cases under the Feature test run for the 8.12.0 release at the link:

Status:

PASS: 7

Build details:
VERSION: 8.12.0 BC4
BUILD: 70016
COMMIT: c2fda47
Artifact Link: https://staging.elastic.co/8.12.0-e9640208/summary-8.12.0.html

As the testing is completed on this feature, we are marking this as QA:Validated.

Please let us know if anything else is required from our end.
Thanks

9 participants