Skip scheduling actions for the alerts without scheduledActions #195948

Open
wants to merge 5 commits into
base: main

ersin-erdal
Contributor

Resolves: #190258

While investigating #190258, we found that the odd behaviour occurs when a new alert pushes an existing alert above the max alerts limit.

Scenario:

  1. The rule type detects 4 alerts (alert-1, alert-2, alert-3, alert-4),
    but reports only the first 3 because the max alerts limit is 3.

  2. alert-2 recovers, so the rule type reports 3 active alerts (alert-1, alert-3, alert-4) and 1 recovered alert (alert-2).

  3. Alerts alert-1, alert-3, alert-4 are saved in the task state.

  4. alert-2 becomes active again (the others are still active)

  5. Rule type reports 3 active alerts (alert-1, alert-2, alert-3)

  6. As a result, the action scheduler tries to schedule actions for alert-1, alert-3, and alert-4, as they are the existing alerts.
    But since the rule type didn't report alert-4, it has no scheduled actions, so the action scheduler assumes it has recovered and tries to schedule a recovery action.
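The scenario can be reproduced with a small, simplified model. This is only a sketch: `MAX_ALERTS`, `reportActive`, and `naiveRecoveryCandidates` are hypothetical names chosen for illustration and are not taken from the Kibana code base.

```typescript
// Hypothetical, simplified model of the scenario; not the actual Kibana implementation.
type AlertId = string;

const MAX_ALERTS = 3;

// The rule type reports at most MAX_ALERTS of the alerts it detects.
function reportActive(detected: AlertId[]): AlertId[] {
  return detected.slice(0, MAX_ALERTS);
}

// Assumed old behaviour: every alert tracked in the task state is visited, and a
// tracked alert that was not reported in the current run is treated as recovered.
function naiveRecoveryCandidates(tracked: AlertId[], reported: AlertId[]): AlertId[] {
  return tracked.filter((id) => reported.indexOf(id) === -1);
}

// Step 3: alerts saved in the task state after the second run.
const trackedFromPreviousRun: AlertId[] = ['alert-1', 'alert-3', 'alert-4'];

// Steps 4 and 5: all four alerts are active, but only the first three are reported.
const reportedThisRun = reportActive(['alert-1', 'alert-2', 'alert-3', 'alert-4']);

// Step 6: alert-4 is still active, yet it is the only recovery candidate.
const wronglyRecovered = naiveRecoveryCandidates(trackedFromPreviousRun, reportedThisRun);
console.log(wronglyRecovered);
```

In this model alert-4 ends up as a recovery candidate only because it was cut off by the limit, not because it actually recovered.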

This PR changes the actionScheduler to handle active and recovered alerts separately.
With this change, no action is scheduled for an alert that exists in the task state from a previous run but isn't reported by the ruleTypeExecutor because of the max alerts limit. The alert is still kept in the task state.
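A minimal sketch of the intended behaviour, under the assumption that the scheduler receives the reported active and recovered alerts as two separate lists. The types `RunResult` and `ScheduleOutcome` and the function `scheduleSeparately` are hypothetical and do not mirror the real actionScheduler API.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the Kibana API.
type AlertId = string;

interface RunResult {
  active: AlertId[]; // alerts reported as active in this run
  recovered: AlertId[]; // alerts reported as recovered in this run
}

interface ScheduleOutcome {
  activeActions: AlertId[]; // alerts that get an active action
  recoveryActions: AlertId[]; // alerts that get a recovery action
  nextTaskState: AlertId[]; // alerts kept in the task state
}

function scheduleSeparately(tracked: AlertId[], run: RunResult): ScheduleOutcome {
  // Start from the alerts reported as active in this run.
  const nextTaskState: AlertId[] = run.active.slice();

  // Keep tracked alerts that were neither reported nor recovered,
  // for example alerts cut off by the max alerts limit.
  for (const id of tracked) {
    const isRecovered = run.recovered.indexOf(id) !== -1;
    const isAlreadyKept = nextTaskState.indexOf(id) !== -1;
    if (!isRecovered && !isAlreadyKept) {
      nextTaskState.push(id);
    }
  }

  return {
    // Actions are driven only by what was reported, never by the task state alone.
    activeActions: run.active.slice(),
    recoveryActions: run.recovered.slice(),
    nextTaskState,
  };
}

// Third run of the scenario: alert-4 is tracked but not reported.
const outcome = scheduleSeparately(['alert-1', 'alert-3', 'alert-4'], {
  active: ['alert-1', 'alert-2', 'alert-3'],
  recovered: [],
});
console.log(outcome);
```

With the scenario's inputs, alert-4 receives neither an active nor a recovery action, but it stays in `nextTaskState`, which matches the behaviour described above.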

@ersin-erdal ersin-erdal added release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) labels Oct 11, 2024
@ersin-erdal ersin-erdal self-assigned this Oct 11, 2024
@ersin-erdal ersin-erdal marked this pull request as ready for review October 11, 2024 15:28
@ersin-erdal ersin-erdal requested a review from a team as a code owner October 11, 2024 15:28
@elasticmachine
Contributor

⏳ Build in-progress, with failures

Failed CI Steps

History

cc @ersin-erdal

Development

Successfully merging this pull request may close these issues.

[ResponseOps][alerting] Investigate odd behaviors when max alerts limit reached
2 participants