.ml indices that are closed prevent Kibana monitoring from displaying. #91893
Pinging @elastic/ml-core (Team:ML)
I think this is a bug in Kibana monitoring, not the ML stats endpoint. In this case the user has chosen to incapacitate ML by closing an internal index. It's reasonable that ML APIs should return errors under these circumstances. The only alternative would be to silently return incorrect/incomplete data, but that entirely goes against the Elasticsearch philosophy of making it clear when things aren't working.
For me that is the bug. Stack Monitoring must be assembling an overall response by combining the output of many APIs. But if one of those responses is an error it tells you that you need to enable monitoring, which is wrong. Instead it should tell you that some portion of the stats is not available due to an error, and show you what is available. If Stack Monitoring is not handling errors appropriately then the only alternative is that we change all the APIs it calls to never return an error. That would lead to Stack Monitoring silently missing out information in the event of some error, and would also affect other uses of those same APIs.

With get anomaly detector stats the current error message for a closed index makes it totally clear how to fix the problem. If we made it silently return an empty response in this case then we'd get bugs opened along the lines of, "Get anomaly detector stats is returning an empty response when I have lots of jobs and I cannot figure out why". And then someone would have to spend hours trawling through a support diag to work out that the root cause was a closed index.
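As a rough illustration of the behaviour being argued for here, the sketch below builds a monitoring view section by section and records a per-section error instead of discarding the whole page. It is a minimal sketch only: the real Stack Monitoring code lives in Kibana and Metricbeat, and every name here (MonitoringPageBuilder, the section suppliers) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical aggregator: collect each section independently so that one
// failing API (e.g. ML stats when an internal index is closed) surfaces as a
// per-section error rather than hiding the entire monitoring page.
final class MonitoringPageBuilder {

    static Map<String, String> buildPage(Map<String, Supplier<String>> sectionFetchers) {
        Map<String, String> page = new LinkedHashMap<>();
        sectionFetchers.forEach((name, fetcher) -> {
            try {
                page.put(name, fetcher.get());
            } catch (RuntimeException e) {
                // Keep the error visible, but keep rendering the other sections.
                page.put(name, "unavailable: " + e.getMessage());
            }
        });
        return page;
    }
}
```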
@elastic/stack-monitoring which repo should this issue be transferred to for further investigation? There must be some point on the journey the data takes from the underlying Elasticsearch APIs, through Metricbeat, and into the Stack Monitoring UI where the absence of this one part of the data causes the UI to display nothing and say that Stack Monitoring is not enabled when it is.
I chatted with @klacabane about this on Slack. Metricbeat is calling the ML stats API from: https://github.com/elastic/beats/blob/a106ad28c7c8f76d7bdfbb43ef88b077d6ef2327/metricbeat/module/elasticsearch/ml_job/ml_job.go#L33 An example response to
I think the general principle for stack monitoring should be that if any one API call it makes returns an error then the monitoring page should still display what it can.
We have an open issue to make Stack Monitoring more tolerant to missing metricsets: elastic/kibana#130577. As for the failure to collect metricsets:
I don't think ML being unable to provide stats should make the whole usage endpoint fail, so I'll change it so that it just omits the ML information if it cannot be obtained. Since you've already got an issue for being more tolerant to missing metricsets in general, I'll transfer this issue back to ML.
This is back from the
It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single-purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low-level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats API returns an error, the corresponding section of the ML usage information will be blank. Fixes elastic#91893
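The change described above can be pictured with a small sketch: any individual stats lookup that fails contributes an empty section to the usage response instead of failing it. This is only an illustration of the pattern, with hypothetical names (MlUsageSections, sectionOrEmpty); it is not the actual Elasticsearch implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: make an individual ML stats failure non-fatal by
// substituting an empty section, so the overall usage API still returns.
final class MlUsageSections {

    static Map<String, Object> sectionOrEmpty(Supplier<Map<String, Object>> statsCall) {
        try {
            return statsCall.get();
        } catch (RuntimeException e) {
            // A closed internal index (or other corrupted state) blanks this
            // section only; the rest of the usage information is unaffected.
            return Collections.emptyMap();
        }
    }
}
```

Returning an empty section preserves the trade-off discussed earlier in the thread: the dedicated stats APIs still report the error loudly, while the aggregate usage API degrades gracefully.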
I have confirmed that #91917 fixes this. Now if you close the ML results index the stack monitoring page still displays, albeit with an incorrect value for the number of ML jobs. I think that's the best that can be expected in the circumstances. Internal features cannot be expected to operate perfectly if their internal state has been changed in unexpected ways. But at least now meddling with the ML internals doesn't completely break stack monitoring.
Elasticsearch Version
8.3.3, 8.4.3
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
.ml indices that are closed prevent Kibana monitoring from displaying. The error can be reproduced with the GET _ml/anomaly_detectors/_stats endpoint (see the sketch under Steps to Reproduce below).
Steps to Reproduce
(Reproduced)
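A minimal sketch of the reproduction using the Elasticsearch low-level Java REST client, assuming a local test cluster and using .ml-anomalies-shared as a stand-in for the ML results index (the concrete backing index name varies by deployment):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class ClosedMlIndexRepro {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Close the ML results index (index name is a placeholder for this sketch).
            client.performRequest(new Request("POST", "/.ml-anomalies-shared/_close"));

            // The anomaly detector stats API now fails because the internal
            // index it reads from is closed.
            try {
                Response response = client.performRequest(new Request("GET", "/_ml/anomaly_detectors/_stats"));
                System.out.println(response.getStatusLine());
            } catch (ResponseException e) {
                System.out.println("stats call failed: " + e.getMessage());
            }
        }
    }
}
```

Before #91917 this failing stats call left the Stack Monitoring page empty; afterwards the page still renders, minus the ML job count.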
Logs (if relevant)
No response