[Rules and Alerting][Stack Monitoring] during a deployment change from ESS; ECE; ECK alerts should be muted. #130128

Open
philippkahr opened this issue Apr 13, 2022 · 9 comments

Comments

@philippkahr
Contributor

Describe the feature:

When we run a deployment change (e.g. adding nodes, upgrading, ....), the built-in stack rules go crazy and fire a ton of alerts: node changed, version mismatch, and so on.

Describe a specific use case for the feature:
I would expect ESS, ECE, or ECK to "talk" to the stack alerts and mute or reconfigure them, so that no alert is created during a deployment change. There is no need for one, since these changes are expected.

@botelastic bot added the needs-team label Apr 13, 2022
@jasonrhodes added the Team:Infra Monitoring UI - DEPRECATED label Apr 13, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@botelastic bot removed the needs-team label Apr 13, 2022
@jasonrhodes
Member

@ravikesarwani I know you've spent time thinking about this in the past, FYI.

@ravikesarwani
Contributor

Not something we have discussed and planned so far but feels like something important that we should think about.
I do feel that it will have to be driven from the platform/cloud team's side, as only they can know when a plan change started/completed. The Kibana/alerting framework should provide an API that the cloud team can use to enable and disable the rules.
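As a rough illustration of what such an enable/disable call could look like from the orchestrator side, here is a minimal sketch. It assumes the `_enable` and `_disable` endpoints under `/api/alerting/rule/{id}` in Kibana's alerting API and API-key authentication; the helper names, the Kibana URL, and the rule ids are made up for the example, and it has not been run against a real deployment.

```python
import urllib.request


def rule_toggle_request(kibana_url, rule_id, enable, api_key):
    """Build (but do not send) the request that enables or disables one rule."""
    action = "_enable" if enable else "_disable"
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/alerting/rule/{rule_id}/{action}",
        method="POST",
        headers={
            "kbn-xsrf": "true",  # Kibana requires this header on write requests
            "Authorization": f"ApiKey {api_key}",  # assumption: API-key auth
        },
    )


def set_rules_enabled(kibana_url, rule_ids, enable, api_key,
                      send=urllib.request.urlopen):
    """Toggle every rule in rule_ids.

    `send` is injectable so the sketch can be dry-run without a Kibana instance.
    """
    for rule_id in rule_ids:
        send(rule_toggle_request(kibana_url, rule_id, enable, api_key))
```

The orchestrator would call `set_rules_enabled(..., enable=False, ...)` when a plan change starts and `enable=True` once it completes. How it learns which rule ids to toggle is left open here.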

@philippkahr
Contributor Author

I think there are a few edge cases that also need to be taken care of, such as "missing monitoring data for node". If I downsize a deployment, or an instance gets replaced and therefore gets a new name, the old node stops sending data, so the "missing monitoring data for node" alert is triggered.
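To make the edge case concrete, here is a small sketch of the filtering it implies: a node that stopped reporting because the plan removed or replaced it should not count as missing. The function name and both inputs are hypothetical; nothing in the thread says how the orchestrator would expose the list of removed nodes.

```python
def unexpected_missing_nodes(missing_nodes, removed_by_plan):
    """Return nodes that stopped reporting and were NOT intentionally removed.

    missing_nodes:   node names with no recent monitoring data
    removed_by_plan: node names the deployment change removed or replaced
                     (hypothetical input from the orchestrator)
    """
    expected = set(removed_by_plan)
    return [node for node in missing_nodes if node not in expected]


# Example: instance-0000000001 was replaced by the plan, so only
# instance-0000000002 is a genuine "missing monitoring data" case.
still_alerting = unexpected_missing_nodes(
    ["instance-0000000001", "instance-0000000002"],
    ["instance-0000000001"],
)
```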

@miltonhultgren
Contributor

How do the future Health/Topology APIs play into this? Has there been any thought about how Cloud might "tell" those APIs "I'm changing things under you, chill out for a bit"?

@matschaffer
Contributor

If the alert API has a mute option, I'm thinking it might make sense for the orchestrator to call it before operations start. We could potentially tag the rules with metadata about which ones are expected to get noisy during certain workflows (like upgrading).
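A minimal sketch of that idea, assuming the `_mute_all` and `_unmute_all` endpoints under `/api/alerting/rule/{id}` in Kibana's alerting API. The tag format `noisy-during-plan-change:<workflow>` is invented for illustration, and `post` stands in for whatever HTTP client the orchestrator would use.

```python
from contextlib import contextmanager

# Hypothetical tag prefix marking rules expected to get noisy during a workflow.
NOISY_TAG = "noisy-during-plan-change"


def select_noisy_rules(rules, workflow):
    """Return ids of rules tagged as noisy for the given workflow (e.g. "upgrade")."""
    wanted = f"{NOISY_TAG}:{workflow}"
    return [rule["id"] for rule in rules if wanted in rule.get("tags", [])]


@contextmanager
def muted(rule_ids, post):
    """Mute the rules for the duration of the block, unmuting even on failure.

    `post` is any callable that takes a Kibana API path and issues a POST.
    """
    muted_ids = []
    try:
        for rule_id in rule_ids:
            post(f"/api/alerting/rule/{rule_id}/_mute_all")
            muted_ids.append(rule_id)
        yield
    finally:
        # Only unmute what was actually muted, so a partial failure is undone cleanly.
        for rule_id in muted_ids:
            post(f"/api/alerting/rule/{rule_id}/_unmute_all")
```

The orchestrator would wrap the plan change in `with muted(select_noisy_rules(rules, "upgrade"), post): ...`. Using a context manager means the rules are unmuted again even if the operation raises.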

@smith
Contributor

smith commented Apr 15, 2022

Moved to Refining on our board. We need to decide whether this is something that we could/would implement, or otherwise route this issue to the correct place.

@smith
Contributor

smith commented Apr 18, 2022

Closing this. It doesn't look like something @elastic/infra-monitoring-ui would be capable of implementing at this time.

@smith closed this as completed Apr 18, 2022
@philippkahr
Contributor Author

Kibana has a mute API. Maybe we should revisit this issue?

@philippkahr reopened this Aug 8, 2022
@smith added the Team:Monitoring label and removed the Team:Infra Monitoring UI - DEPRECATED label Nov 13, 2023

7 participants