[mget] Poll for tasks less frequently when the task load doesn't need it #196584

Open
mikecote opened this issue Oct 16, 2024 · 3 comments · May be fixed by #200260
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@mikecote
Contributor

When mget is enabled as the task claim strategy, the task poll interval changes from every 3s to every 500ms, which means 6x as many requests to Elasticsearch to claim tasks. The poll interval was lowered to increase per-node task throughput. However, the request load on Elasticsearch for small serverless projects that don't run many tasks shows that Task Manager makes many requests for little return.

It would be nice to poll less frequently, say back to every 3s, whenever the task load doesn't require the faster interval. We should take the following into consideration when designing an approach:

  • Backpressure mechanism: we'll need to continue applying backpressure whenever 429 errors are observed
  • Task distribution: we'll need to ensure tasks are still spread evenly across Kibana nodes, so that one node doesn't poll frequently and run all the tasks while another polls infrequently and finds few tasks to run
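One way to picture the first bullet is a rule that keeps the fast interval only while a node keeps filling its capacity, and lets backpressure stretch whichever interval is chosen. This is a minimal sketch, not Task Manager code: `choosePollInterval`, `ClaimCycleResult`, and the `backpressureFactor` multiplier are hypothetical names, and only the 500ms and 3s values come from this issue.

```typescript
// Values taken from the issue text; everything else here is hypothetical.
const FAST_POLL_INTERVAL_MS = 500;
const DEFAULT_POLL_INTERVAL_MS = 3_000;

interface ClaimCycleResult {
  tasksClaimed: number; // tasks claimed in the last poll cycle
  capacity: number; // how many tasks this node could have claimed
}

/**
 * Hypothetical helper: poll fast only when the last cycle filled the node's
 * capacity (a hint that more work is waiting); otherwise use the 3s interval.
 * `backpressureFactor` is an assumed multiplier (>= 1) maintained elsewhere
 * that grows on 429 responses, so backpressure still lengthens the interval.
 */
function choosePollInterval(
  last: ClaimCycleResult,
  backpressureFactor: number
): number {
  const saturated = last.capacity > 0 && last.tasksClaimed >= last.capacity;
  const base = saturated ? FAST_POLL_INTERVAL_MS : DEFAULT_POLL_INTERVAL_MS;
  return Math.round(base * Math.max(1, backpressureFactor));
}
```

For example, a node that claimed 2 of 10 possible tasks would wait 3000ms, while a node that claimed 10 of 10 would wait 500ms (600ms with a 1.2 backpressure factor). On its own this rule does not address the task-distribution concern in the second bullet.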

Definition of Done

  • Poll interval is 3s whenever the task load doesn't need a 500ms poll interval
  • Backpressure mechanism still works when encountering 429 errors
  • Tests
@mikecote mikecote added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Oct 16, 2024
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@pmuellr
Member

pmuellr commented Oct 18, 2024

It seems like it would be nice to have all the claiming-rate logic in one place: it gets fed various inputs (the 429s etc., and now the number of tasks claimed over the last cycle(s)) and generates a new rate. That way the logic isn't spread out all over the place.
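A single component that owns the claiming rate could look roughly like the sketch below: callers only report events, and the interval is derived in one getter. `ClaimRateController`, its methods, and the 1.2 growth/decay and `maxBackoff` values are illustrative assumptions, not existing Task Manager APIs or settings.

```typescript
// Hypothetical single owner of the claiming rate; names and factors are illustrative.
class ClaimRateController {
  private backoff = 1; // multiplier >= 1; grows on 429s, decays after successful cycles
  private lastClaimed = 0;
  private lastCapacity = 0;

  constructor(
    private readonly fastMs = 500,
    private readonly slowMs = 3_000,
    private readonly maxBackoff = 10
  ) {}

  // Input 1: the result of the most recent claim cycle.
  recordClaim(tasksClaimed: number, capacity: number): void {
    this.lastClaimed = tasksClaimed;
    this.lastCapacity = capacity;
    // A successful cycle gradually relaxes earlier backpressure.
    this.backoff = Math.max(1, this.backoff / 1.2);
  }

  // Input 2: an Elasticsearch error status observed while claiming.
  recordError(statusCode: number): void {
    if (statusCode === 429) {
      this.backoff = Math.min(this.maxBackoff, this.backoff * 1.2);
    }
  }

  // Output: the one place where the next poll interval is computed.
  get pollIntervalMs(): number {
    const saturated =
      this.lastCapacity > 0 && this.lastClaimed >= this.lastCapacity;
    const base = saturated ? this.fastMs : this.slowMs;
    return Math.round(base * this.backoff);
  }
}
```

Keeping the inputs as plain event methods means new signals (for example, a look-ahead for upcoming tasks) could be added without touching the callers that read `pollIntervalMs`.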

@pmuellr
Member

pmuellr commented Oct 18, 2024

Another thing we can do is check for upcoming tasks to run. Nothing? Then sleep a bit longer.

Maybe the claimer could also search a bit into the future and determine whether anything is coming up soon.
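The look-ahead idea could translate into an interval that tracks when the next task becomes due, bounded by the two existing intervals. This is a minimal sketch under the assumption that the claimer can obtain the earliest upcoming `runAt` (as epoch milliseconds) from some extra query; `intervalFromLookAhead` and its parameters are hypothetical.

```typescript
// Hypothetical helper: `nextRunAtMs` is assumed to come from a look-ahead query
// for the earliest task due to run (epoch ms), or null if none was found.
function intervalFromLookAhead(
  nextRunAtMs: number | null,
  nowMs: number,
  fastMs = 500,
  slowMs = 3_000
): number {
  if (nextRunAtMs === null) {
    return slowMs; // nothing coming up: sleep the longer interval
  }
  const untilDue = nextRunAtMs - nowMs;
  // Wake up roughly when the next task is due, but never poll faster than
  // fastMs (overdue tasks) nor slower than slowMs (tasks far in the future).
  return Math.min(slowMs, Math.max(fastMs, untilDue));
}
```

Capping at `slowMs` means a task scheduled after the look-ahead ran waits no longer than it would under the 3s interval. The trade-off is the cost of the look-ahead itself, which would ideally be folded into the existing claim request rather than issued as an extra one.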

4 participants