Task Manager health API - `workload.value.average_interval_ms` #96893

kobelb · 2021-04-12T20:48:36Z

Problem

Currently, the Task Manager health API returns statistics about Task Manager's configuration, workload, and runtime performance. The `workload.value.schedule` field returns the 10 most frequent intervals among the scheduled tasks, but not the intervals for all scheduled tasks, because returning a "bucket" for every distinct interval would be infeasible.

As part of the Autoscaling Kibana project, we would like to scale Kibana based on task capacity versus the scheduled task load. One data point missing from this calculation is the average interval across all scheduled tasks, and it can't be inferred from the `workload.value.schedule` field.
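To make the load side of the calculation concrete, here is a minimal TypeScript sketch, assuming the health API exposed a count of recurring tasks alongside the proposed average interval. `ScheduledWorkload` and `estimatedRunsPerMinute` are hypothetical names, not part of Task Manager.

```typescript
// Hypothetical inputs: number of recurring tasks and their average interval.
interface ScheduledWorkload {
  count: number;
  averageIntervalMs: number;
}

// Approximates scheduled executions per minute as count / mean(interval).
// Because 1/x is convex, count / mean(interval) <= sum(1 / interval_i),
// so this estimate understates load when intervals vary widely.
function estimatedRunsPerMinute({ count, averageIntervalMs }: ScheduledWorkload): number {
  if (count === 0) {
    return 0;
  }
  if (averageIntervalMs <= 0) {
    throw new Error("averageIntervalMs must be positive");
  }
  return (count * 60000) / averageIntervalMs;
}
```

Because it uses the mean of the intervals rather than the mean of the per-task rates, this is only an approximation: two tasks at `1s` and `59s` average 30 seconds, giving an estimate of 4 runs per minute versus roughly 61 in reality.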

Solution

The Task Manager health API should be updated to return `workload.value.average_interval_ms` to support this autoscaling calculation.

Currently, each task document has a `task.schedule.interval` field; however, this is a keyword field that stores intervals in Elasticsearch's date interval syntax, e.g. `10m` for 10 minutes and `100ms` for 100 milliseconds. As a result, the Elasticsearch `avg` aggregation can't be used on `task.schedule.interval`. Instead, a numeric `task.schedule.interval_ms` field should be added so that the `avg` aggregation can run efficiently.
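A minimal sketch of both halves of the proposal, assuming intervals only use the `ms`, `s`, `m`, `h`, and `d` units: converting an interval string to milliseconds (as would be needed when populating the new field) and a search body that averages it. `intervalToMs` and `buildAverageIntervalSearchBody` are hypothetical helpers, and `task.schedule.interval_ms` is the field proposed above, not an existing mapping.

```typescript
// Milliseconds per unit for the interval units assumed to be in use.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Converts an interval string such as "10m" or "100ms" into milliseconds.
// "ms" is listed before "m" and "s" so "100ms" is not misread.
function intervalToMs(interval: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(interval.trim());
  if (match === null) {
    throw new Error(`Unsupported interval: "${interval}"`);
  }
  const [, amount, unit] = match;
  return Number(amount) * UNIT_TO_MS[unit];
}

// Search body that returns only the average of the hypothetical numeric field.
function buildAverageIntervalSearchBody() {
  return {
    size: 0, // no hits needed, only the aggregation result
    aggs: {
      average_interval_ms: { avg: { field: "task.schedule.interval_ms" } },
    },
  };
}
```

The `avg` aggregation ignores documents that have no value for the field, so tasks without a recurring schedule would not skew the result.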


elasticmachine · 2021-04-12T20:48:46Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

mikecote · 2021-04-13T16:22:20Z

Adding to To-Do; we'll research what it will take to accomplish this so we know what lead time we'll need. If the research shows the work is easy, we might as well do it.

kobelb · 2021-04-13T17:59:12Z

Thanks @mikecote <3 There will be a couple of other smaller issues that I'll be creating a bit later this week that I'd like to get some rough "guesstimates" on for the Autoscaling work. So, I'd recommend holding off until those are created as well because they're all related.

mikecote · 2021-04-13T18:36:40Z

Thanks, @kobelb! I've moved this into the backlog. Keep me posted when you're ready for us to provide guesstimates and we'll do them all at the same time 🙏

kobelb added the Team:ResponseOps label Apr 12, 2021

kobelb added the enhancement and Project:AutoscalingKibana labels Apr 12, 2021

gmmorris added the Feature:Task Manager label Jul 2, 2021

gmmorris added the resilience label Jul 15, 2021

gmmorris added the loe:medium and insight labels Aug 11, 2021

gmmorris added the estimate:small label Aug 18, 2021

gmmorris removed the loe:medium label Sep 2, 2021

mikecote added this to AppEx: ResponseOps - Execution & Connectors Jan 6, 2022

kobelb added the needs-team label Jan 31, 2022

botelastic bot removed the needs-team label Jan 31, 2022

mikecote moved this to Todo in AppEx: ResponseOps - Execution & Connectors Aug 11, 2022

mikecote added the response-ops-ec-backlog label Nov 1, 2024
