Investigate timeout issue and use of time range in stack monitoring queries #189728
Comments
Regarding the
|
Circling back to this after discussing it with support: adding a time range to those queries made them run much faster, and they no longer hit the frozen tier. Given that they execute fast as long as they do not hit the frozen tier, we might not even need to add a time frame to these two queries, since it should be impossible for the |
Adding a constraint on the A quick summary of how much the query
If we would like to pursue this, we see two options:
|
I eventually figured out that we cannot leverage the

So if the user decides to go back in time (e.g. 3 weeks) to monitor the behavior of an index or a node at that time, we need to retrieve the cluster state/stats/shards from that time, hence using

What I'm trying next is to enhance the |
My last experiment described above yielded mixed results, depending on whether the cluster is monitored via internal monitoring or via Metricbeat (or Elastic Agent).

Internal monitoring

When the cluster is monitored via internal monitoring (i.e. data stored in the

We could argue that we could use only the start time and leave the time range open-ended, but that would only work when selecting "Last XX minutes/hours/days". Selecting any closed time range in the past (e.g. a specific day or week) wouldn't work the same way.

Metricbeat or Elastic Agent monitoring

When the cluster is monitored via either Metricbeat or the Elastic Agent (i.e. data stored in the

So what's next...

To summarize, the way internal monitoring currently works disqualifies the idea of using a time range. Given that internal monitoring's days are numbered (still being debated), I don't think we should or can ask them to introduce "put if absent" semantics. I'm on the lookout for further ideas... Stay tuned... |
Related to https://github.com/elastic/sdh-elasticsearch/issues/8151
There is a reported issue of timeouts while using Stack Monitoring. After some investigation, we saw that Stack Monitoring has queries without a date range filter:
getClustersState
getShardStats
Both run this query
getUnassignedShardData
The idea here is to investigate how to improve these queries, possibly by including a time range, while maintaining the same functionality.
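To make the time-range idea concrete, here is a minimal sketch of how a date range filter could be appended to a query's `bool` filter. It is not taken from the Kibana code base: the helper `withTimeRange`, the `TimeRange` type, the `timestamp` field name, and the example `cluster_uuid` value are all assumptions for illustration. It covers both variants discussed in the comments, a closed range and an open-ended one that sets only the start time.

```typescript
// Hypothetical helper, for illustration only (not Kibana's actual implementation).
interface TimeRange {
  start: number; // epoch millis, inclusive lower bound
  end?: number; // epoch millis, inclusive upper bound; omit for an open-ended range
}

type QueryClause = Record<string, unknown>;

function withTimeRange(
  filters: QueryClause[],
  range?: TimeRange,
  timestampField: string = 'timestamp' // assumed field name
): QueryClause {
  const clauses: QueryClause[] = [...filters];
  if (range) {
    // Elasticsearch range query bounds; `lte` is only set for closed ranges.
    const bounds: Record<string, unknown> = {
      format: 'epoch_millis',
      gte: range.start,
    };
    if (range.end !== undefined) {
      bounds.lte = range.end;
    }
    clauses.push({ range: { [timestampField]: bounds } });
  }
  // Filter context: clauses are not scored.
  return { bool: { filter: clauses } };
}

// Example filter on a made-up cluster UUID.
const clusterFilter: QueryClause = { term: { cluster_uuid: 'abc123' } };

// Closed range (e.g. a specific day or week in the past).
const closedQuery = withTimeRange([clusterFilter], { start: 1000, end: 2000 });

// Open-ended range (e.g. "Last XX minutes/hours/days").
const openEndedQuery = withTimeRange([clusterFilter], { start: 1000 });

// No range: the query is left as it is today.
const unboundedQuery = withTimeRange([clusterFilter]);
```

Whether such a filter preserves the current behavior depends on how the monitoring documents are written and timestamped, which is the open question of this issue.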