Task Manager health API - workload.value.average_interval_ms
#96893
Labels
enhancement
New value added to drive a business result
estimate:small
Small Estimated Level of Effort
Feature:Task Manager
insight
Issues related to user insight into platform operations and resilience
Project:AutoscalingKibana
Autoscaling Kibana in Cloud
resilience
Issues related to Platform resilience in terms of scale, performance & backwards compatibility
response-ops-ec-backlog
ResponseOps E&C backlog
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
Problem
Currently, the Task Manager health API returns statistics about Task Manager's configuration, workload, and runtime performance. The workload.value.schedule field returns only the 10 most frequent intervals for the scheduled tasks. It does not return the intervals for all scheduled tasks, because returning a "bucket" for every single interval would be infeasible.
As part of the autoscaling Kibana project, we would like to scale Kibana based on the task capacity versus the scheduled task load. One of the missing data points for this calculation is the average interval across all scheduled tasks, which can't be inferred from the workload.value.schedule field.
Solution
The Task Manager health API should be updated to return workload.value.average_interval_ms to support this autoscaling calculation.
Currently, each task document has a task.schedule.interval field; however, this is a keyword field that stores intervals using Elasticsearch's date interval syntax: 10m for 10 minutes, 100ms for 100 milliseconds. As a result, it's not possible to use the Elasticsearch avg aggregation on the task.schedule.interval field. Instead, a task.schedule.interval_ms field should be added so that the Elasticsearch avg aggregation can run efficiently.
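To make the proposal concrete, here is a minimal sketch of the two pieces it implies: deriving a numeric millisecond value from an interval string, and running an avg aggregation over the new numeric field. The helper name intervalToMs, the UNIT_TO_MS table, the set of supported units, and the aggregation name average_interval_ms are assumptions for illustration only; they are not taken from the Task Manager codebase.

```typescript
// Hypothetical unit table: the units Task Manager actually accepts may differ.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Hypothetical helper: converts an interval string such as "10m" or "100ms"
// into a number of milliseconds, suitable for a numeric interval_ms field.
function intervalToMs(interval: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  return Number(match[1]) * UNIT_TO_MS[match[2]];
}

// Search body using the Elasticsearch avg aggregation on the proposed
// numeric field. "size: 0" skips returning the matching documents.
// The aggregation name is an assumption chosen to mirror the proposed
// workload.value.average_interval_ms stat.
const averageIntervalSearchBody = {
  size: 0,
  aggs: {
    average_interval_ms: {
      avg: { field: 'task.schedule.interval_ms' },
    },
  },
};
```

With this sketch, intervalToMs('10m') yields 600000 and intervalToMs('100ms') yields 100. Storing the derived value at write time keeps the aggregation a plain numeric avg rather than requiring a script to parse keyword values at query time.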