Task Manager health API - workload.value.average_interval_ms #96893

Open
kobelb opened this issue Apr 12, 2021 · 4 comments
Labels
enhancement New value added to drive a business result estimate:small Small Estimated Level of Effort Feature:Task Manager insight Issues related to user insight into platform operations and resilience Project:AutoscalingKibana Autoscaling Kibana in Cloud resilience Issues related to Platform resilience in terms of scale, performance & backwards compatibility response-ops-ec-backlog ResponseOps E&C backlog Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@kobelb
Contributor

kobelb commented Apr 12, 2021

Problem

Currently, the Task Manager health API returns statistics about Task Manager's configuration, workload, and runtime performance. The workload.value.schedule field returns the 10 most frequent intervals among the scheduled tasks, but not the intervals for all scheduled tasks, because returning a "bucket" for every single interval would be infeasible:

[Image: Screen Shot 2021-04-12 at 1 56 20 PM]

As part of the autoscaling Kibana project, we would like to scale Kibana based on the task capacity versus the scheduled task load. One of the missing data points for this calculation is the average interval across all scheduled tasks, which can't be inferred from the workload.value.schedule field.
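To illustrate why the average can't be recovered from a top-N list of buckets, here is a minimal sketch. The helper names (`averageMs`, `topBuckets`) and the sample workloads are made up for illustration and are not part of Task Manager. The two workloads produce identical top-2 buckets yet have very different average intervals, because the tail that gets cut off differs.

```typescript
// Hypothetical helpers and data, for illustration only.

// Mean of a list of intervals expressed in milliseconds.
function averageMs(intervalsMs: number[]): number {
  return intervalsMs.reduce((sum, ms) => sum + ms, 0) / intervalsMs.length;
}

// The n most frequent intervals as [intervalMs, count] pairs,
// loosely mimicking a "top N buckets" summary.
function topBuckets(intervalsMs: number[], n: number): Array<[number, number]> {
  const counts = new Map<number, number>();
  for (const ms of intervalsMs) {
    counts.set(ms, (counts.get(ms) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// Same head (three 1m tasks, two 5m tasks), different tail (1s vs 1d).
const workloadA = [60000, 60000, 60000, 300000, 300000, 1000];
const workloadB = [60000, 60000, 60000, 300000, 300000, 86400000];

const sameBuckets =
  JSON.stringify(topBuckets(workloadA, 2)) === JSON.stringify(topBuckets(workloadB, 2));
const sameAverage = averageMs(workloadA) === averageMs(workloadB);
```

Here `sameBuckets` is true while `sameAverage` is false, so the truncated bucket list alone does not determine the average.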

Solution

The Task Manager health API should be updated to return a workload.value.average_interval_ms field to support this autoscaling calculation.

Currently, each task document has a task.schedule.interval field; however, this is a keyword field and stores the intervals using Elasticsearch's date interval syntax: 10m for 10 minutes, 100ms for 100 milliseconds. As a result, it's not possible to use the Elasticsearch avg aggregation on the task.schedule.interval field. Instead, a numeric task.schedule.interval_ms field should be added so that the Elasticsearch avg aggregation can run efficiently.
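A rough sketch of what this could look like, under stated assumptions: `intervalToMs` is a hypothetical helper (not Task Manager's actual implementation) that converts an interval string into milliseconds, and it only handles the `ms`, `s`, `m`, `h`, and `d` units. The request body assumes the proposed task.schedule.interval_ms field exists and is mapped as a numeric type.

```typescript
// Hypothetical helper, for illustration only. The supported units are an
// assumption; the real set of allowed interval units may differ.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Converts strings such as "10m" or "100ms" into a number of milliseconds.
function intervalToMs(interval: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(interval);
  if (match === null) {
    throw new Error(`Unsupported interval: "${interval}"`);
  }
  const [, count, unit] = match;
  return Number(count) * UNIT_TO_MS[unit];
}

// Search request body using the avg metric aggregation on the proposed
// numeric field. The aggregation name is arbitrary.
const averageIntervalRequest = {
  size: 0, // only the aggregation result is needed, not the documents
  aggs: {
    average_interval_ms: {
      avg: { field: 'task.schedule.interval_ms' },
    },
  },
};
```

With a helper like this, the numeric value could be written alongside task.schedule.interval whenever a task is scheduled or rescheduled, so the aggregation never has to parse strings at query time.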

@kobelb kobelb added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label Apr 12, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@kobelb kobelb added enhancement New value added to drive a business result Project:AutoscalingKibana Autoscaling Kibana in Cloud labels Apr 12, 2021
@mikecote
Contributor

Adding to To-Do, we'll research what it will take to accomplish this to know what lead time we'll need. If the research shows it's easy to do the work, might as well.

@kobelb
Contributor Author

kobelb commented Apr 13, 2021

Thanks @mikecote <3 There will be a couple of other smaller issues that I'll be creating a bit later this week that I'd like to get some rough "guesstimates" on for the Autoscaling work. So, I'd recommend holding off until those are created as well because they're all related.

@mikecote
Contributor

Thanks, @kobelb! I've moved this into the backlog. Keep me posted when you're ready for us to provide guesstimates and we'll do them all at the same time 🙏

@gmmorris gmmorris added the resilience Issues related to Platform resilience in terms of scale, performance & backwards compatibility label Jul 15, 2021
@gmmorris gmmorris added loe:medium Medium Level of Effort insight Issues related to user insight into platform operations and resilience labels Aug 11, 2021
@gmmorris gmmorris added the estimate:small Small Estimated Level of Effort label Aug 18, 2021
@gmmorris gmmorris removed the loe:medium Medium Level of Effort label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@mikecote mikecote added the response-ops-ec-backlog ResponseOps E&C backlog label Nov 1, 2024
4 participants