Enhance "Cluster health" stack monitoring rule to allow user configuration for Yellow/Red/Both #113445
Comments
I'll also note for the public record that ILM Searchable Snapshots coming up on the Frozen tier can blip the cluster, which the stretch goal in the description would resolve.
This somewhat overlaps with #145843.
On a related note, all the built-in rules should be using the … I don't know for sure if it tracks ILM transitions yet.
Publicly documenting a workaround/alternative for lower stack versions: set up the rule manually. The example was taken on Elastic Cloud against version v8.9.2 for Logs & Metrics data:
As part of the "Cluster health" rule, allow users to configure whether they want to receive alerts for Yellow, Red, or Both yellow and red.
The default configuration value for the rule will stay as "Both yellow and red".
Combined with our changes in 7.15 that allow multiple rules of the same type, users can now configure different actions for Yellow (say, email) and Red (say, PagerDuty) if they want.
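A minimal sketch of the proposed setting, written in Python purely for illustration and not taken from the Kibana rule code. The names `AlertOn` and `should_alert` are hypothetical; the only assumptions carried over from this issue are the three options and the "Both yellow and red" default.

```python
from enum import Enum


class AlertOn(Enum):
    """Hypothetical rule parameter: which statuses should raise an alert."""
    YELLOW = "yellow"
    RED = "red"
    BOTH = "both"  # proposed default, matching the rule's current behavior


def should_alert(status: str, alert_on: AlertOn = AlertOn.BOTH) -> bool:
    """Return True if a cluster health status should fire under the setting."""
    status = status.lower()
    if alert_on is AlertOn.BOTH:
        # Current behavior: any non-green status fires.
        return status in ("yellow", "red")
    # Otherwise fire only on the single configured status.
    return status == alert_on.value
```

With two rules of the same type, one could be configured with `AlertOn.YELLOW` and an email action, and the other with `AlertOn.RED` and a PagerDuty action.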
Currently the "Cluster health" rule fires when the cluster health status changes from green to yellow OR red. There is no way for users to configure it to alert only when the cluster state changes to "red".
Yellow status can happen based on temporary processing in Elasticsearch.
Any action that creates a new index (rollover, shrink, mounting an index, close-and-reopen (through forcemerge w/codec change)) can cause the cluster to go briefly yellow.
Stretch goal
Besides adding the extra configuration (for Yellow, Red, or Both), we should look at the possibility of "look at the last X minutes of data and alert only when all of them show the same status" rather than relying only on the status of the last document.
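One way the stretch goal could work, sketched in Python under stated assumptions: health samples arrive as `(timestamp, status)` pairs, and the hypothetical helper `sustained_status` reports a status only when every sample inside the look-back window agrees on the same non-green value. This is an illustration of the idea, not the rule's actual implementation.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def sustained_status(
    samples: Iterable[Tuple[datetime, str]],
    now: datetime,
    window: timedelta,
) -> Optional[str]:
    """Return the status shared by all samples in the window, else None.

    Green and mixed windows return None, so a brief yellow blip between
    green samples does not produce an alert.
    """
    # Collect the distinct statuses seen within [now - window, now].
    recent = {status for ts, status in samples if now - window <= ts <= now}
    if len(recent) != 1:
        return None  # no samples, or the status changed inside the window
    status = next(iter(recent))
    return None if status == "green" else status
```

A rule built this way trades alert latency for fewer false positives: a sustained yellow or red is reported only once it has persisted for the whole window.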