-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip scheduling actions for the alerts without scheduledActions #195948
Skip scheduling actions for the alerts without scheduledActions #195948
Conversation
…rsin-erdal/kibana into 190259-alert-without-scheduled-actions
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]
History
cc @ersin-erdal |
I'm thinking this may resolve at least some of the alert doc conflicts we are seeing: #190376 - not sure about all of them though. I'll add a note in that issue that we should check after this is released, to see if the conflicts reduce or are gone for the release that shipped them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM
Tested the way I repro'd the behavior in the referenced issue. No longer seeing any of the odd behavior I saw with that.
): Record<string, LegacyAlert<State, Context, ActionGroupIds | RecoveryActionGroupId>>; | ||
type: 'new' | 'active' | 'activeCurrent' | ||
): Record<string, LegacyAlert<State, Context, ActionGroupIds>> | {}; | ||
getProcessedAlerts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Overloaded definition of getProcessed()
alerts, typed to return different things based on the input!
WIll have to remember this, wonder if it could help with some of our other complex type things ...
@@ -56,7 +57,8 @@ export const getSummarizedAlerts = async < | |||
* yet (the update call uses refresh: false). So we need to rely on the in | |||
* memory alerts to do this. | |||
*/ | |||
const newAlertsInMemory = Object.values(alertsClient.getProcessedAlerts('new') || {}) || []; | |||
const newAlertsInMemory: Array<Alert<State, Context, ActionGroupIds>> = | |||
Object.values(alertsClient.getProcessedAlerts('new') || {}) || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The || {}) || []
looks ... busy, and maybe unneeded? I tried removing it, and it seems to compile.
Then since all the rest is typed, you can remove the type declaration, and eventually the import for Alert
:
const newAlertsInMemory = Object.values(alertsClient.getProcessedAlerts('new'));
Seemed to type check ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these were for the tests as well, I will try to remove those fallback.
type: 'new' | 'active' | 'activeCurrent' | 'recovered' | 'recoveredCurrent' | ||
): Record<string, LegacyAlert<State, Context, ActionGroupIds | RecoveryActionGroupId>>; | ||
type: 'new' | 'active' | 'activeCurrent' | ||
): Record<string, LegacyAlert<State, Context, ActionGroupIds>> | {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised the {}
is needed here. {}
should conform to Record<string, **anything**>
I think, so seems like overkill. Tried removing them and running type_check
and it failed. Weird.
One of the type check failures was in this method:
kibana/x-pack/plugins/alerting/server/alerts_client/legacy_alerts_client.ts
Lines 234 to 242 in 424ffba
public getProcessedAlerts( | |
type: 'new' | 'active' | 'activeCurrent' | 'recovered' | 'recoveredCurrent' | |
) { | |
if (Object.hasOwn(this.processedAlerts, type)) { | |
return this.processedAlerts[type]; | |
} | |
return {}; | |
} |
Perhaps this is just overly complex TS needed to handle overridden methods or something. Doesn't seem like it should cause any problems, just ... weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because getProcessedAlerts
returns {}
if it cannot find the given type in the processedAlerts.
line 241 above.
I was also expecting it to conform with Record but.... :)
public async run( | ||
alerts: Record<string, Alert<State, Context, ActionGroupIds | RecoveryActionGroupId>> | ||
): Promise<RunResult> { | ||
public async run({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typing seems weird here, as I would expect in the tests for this, we could just not pass either of these properties (or neither!) and we'd get the defaults. But you can't. I think we'd need a ?
after the property names to make that would. Would cut down on a few lines in the test file (above) ... :-)
I guess I'm a little surprised TS or lint didn't complain about "the default value will never be used because the property is not optional", or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make them optional because we always pass the active and recovered alerts. getProcessAlerts returns {}
incase of no alerts.
The default values (activeCurrentAlerts = {}
) are for the unit tests actually. Some tests does not pass any alerts at all and Object.values(activeCurrentAlerts)
breaks the execution as activeCurrentAlerts
is undefined.
I can remove the default values and use Object.values(activeCurrentAlerts || {})
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe I'm more confused - we have tests passing undefined
for some of these properties - are they bypassing the typing somehow? Because it looks to me like they are required.
Seems like we should clean this up, but we can do that in a separate PR - can you create an issue?
const results = await scheduler.getActionsToSchedule({ alerts, throttledSummaryActions }); | ||
const results = await scheduler.getActionsToSchedule({ | ||
activeCurrentAlerts: alerts, | ||
recoveredCurrentAlerts: {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd that none of the tests in this file seem to use non-empty recovered alerts, do we need some more tests here? In a follow-on PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses the passed alerts just to calculate the number of alerts. Looks like adding it to a test is enough
const results = await scheduler.getActionsToSchedule({ alerts }); | ||
const results = await scheduler.getActionsToSchedule({ | ||
activeCurrentAlerts: alerts, | ||
recoveredCurrentAlerts: {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another file not using any recovered alerts ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
system action scheduler doesn't use the alerts, so i removed all of them.
return false; | ||
} | ||
|
||
private isValidActionGroup(actionGroup: ActionGroupIds | RecoveryActionGroupId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice cleanup to extract these to functions!
…rsin-erdal/kibana into 190259-alert-without-scheduled-actions
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]
History
cc @ersin-erdal |
Starting backport for target branches: 8.x |
…tic#195948) Resolves: elastic#190258 As a result of elastic#190258, we have found out that the odd behaviour happens when an existing alert is pushed above the max alerts limit by a new alert. Scenario: 1. The rule type detects 4 alerts (`alert-1`, `alert-2`, `alert-3`, `alert-4`), But reports only the first 3 as the max alerts limit is 3. 2. `alert-2` becomes recovered, therefore the rule type reports 3 active (`alert-1`, `alert-3`, `alert-4`), 1 recovered (`alert-2`) alert. 3. Alerts `alert-1`, `alert-3`, `alert-4` are saved in the task state. 4. `alert-2` becomes active again (the others are still active) 5. Rule type reports 3 active alerts (`alert-1`, `alert-2`, `alert-3`) 6. As a result, the action scheduler tries to schedule actions for `alert-1`, `alert-3`, `alert-4` as they are the existing alerts. But, since the rule type didn't report the `alert-4` it has no scheduled actions, therefore the action scheduler assumes it is recovered and tries to schedule a recovery action. This PR changes the actionScheduler to handle active and recovered alerts separately. With this change, no action would be scheduled for an alert from previous run (exists in the task state) and isn't reported by the ruleTypeExecutor due to max-alerts-limit but it would be kept in the task state. (cherry picked from commit dd25bf8)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…#195948) (#196458) # Backport This will backport the following commits from `main` to `8.x`: - [Skip scheduling actions for the alerts without scheduledActions (#195948)](#195948) <!--- Backport version: 9.4.3 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Ersin Erdal","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-10-15T23:40:19Z","message":"Skip scheduling actions for the alerts without scheduledActions (#195948)\n\nResolves: #190258\r\n\r\nAs a result of #190258, we have found out that the odd behaviour happens\r\nwhen an existing alert is pushed above the max alerts limit by a new\r\nalert.\r\n\r\nScenario:\r\n\r\n1. The rule type detects 4 alerts (`alert-1`, `alert-2`, `alert-3`,\r\n`alert-4`),\r\nBut reports only the first 3 as the max alerts limit is 3.\r\n\r\n2. `alert-2` becomes recovered, therefore the rule type reports 3 active\r\n(`alert-1`, `alert-3`, `alert-4`), 1 recovered (`alert-2`) alert.\r\n\r\n3. Alerts `alert-1`, `alert-3`, `alert-4` are saved in the task state.\r\n\r\n4. `alert-2` becomes active again (the others are still active)\r\n\r\n5. Rule type reports 3 active alerts (`alert-1`, `alert-2`, `alert-3`)\r\n\r\n6. As a result, the action scheduler tries to schedule actions for\r\n`alert-1`, `alert-3`, `alert-4` as they are the existing alerts.\r\nBut, since the rule type didn't report the `alert-4` it has no scheduled\r\nactions, therefore the action scheduler assumes it is recovered and\r\ntries to schedule a recovery action.\r\n\r\nThis PR changes the actionScheduler to handle active and recovered\r\nalerts separately.\r\nWith this change, no action would be scheduled for an alert from\r\nprevious run (exists in the task state) and isn't reported by the\r\nruleTypeExecutor due to max-alerts-limit but it would be kept in the\r\ntask state.","sha":"dd25bf8807c3ff3982d455f070b6e6c65233662d","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:ResponseOps","v9.0.0","backport:prev-minor"],"title":"Skip scheduling actions for the alerts without scheduledActions","number":195948,"url":"https://github.com/elastic/kibana/pull/195948","mergeCommit":{"message":"Skip scheduling actions for the alerts without scheduledActions (#195948)\n\nResolves: #190258\r\n\r\nAs a result of #190258, we have found out that the odd behaviour happens\r\nwhen an existing alert is pushed above the max alerts limit by a new\r\nalert.\r\n\r\nScenario:\r\n\r\n1. The rule type detects 4 alerts (`alert-1`, `alert-2`, `alert-3`,\r\n`alert-4`),\r\nBut reports only the first 3 as the max alerts limit is 3.\r\n\r\n2. `alert-2` becomes recovered, therefore the rule type reports 3 active\r\n(`alert-1`, `alert-3`, `alert-4`), 1 recovered (`alert-2`) alert.\r\n\r\n3. Alerts `alert-1`, `alert-3`, `alert-4` are saved in the task state.\r\n\r\n4. `alert-2` becomes active again (the others are still active)\r\n\r\n5. Rule type reports 3 active alerts (`alert-1`, `alert-2`, `alert-3`)\r\n\r\n6. As a result, the action scheduler tries to schedule actions for\r\n`alert-1`, `alert-3`, `alert-4` as they are the existing alerts.\r\nBut, since the rule type didn't report the `alert-4` it has no scheduled\r\nactions, therefore the action scheduler assumes it is recovered and\r\ntries to schedule a recovery action.\r\n\r\nThis PR changes the actionScheduler to handle active and recovered\r\nalerts separately.\r\nWith this change, no action would be scheduled for an alert from\r\nprevious run (exists in the task state) and isn't reported by the\r\nruleTypeExecutor due to max-alerts-limit but it would be kept in the\r\ntask state.","sha":"dd25bf8807c3ff3982d455f070b6e6c65233662d"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/195948","number":195948,"mergeCommit":{"message":"Skip scheduling actions for the alerts without scheduledActions (#195948)\n\nResolves: #190258\r\n\r\nAs a result of #190258, we have found out that the odd behaviour happens\r\nwhen an existing alert is pushed above the max alerts limit by a new\r\nalert.\r\n\r\nScenario:\r\n\r\n1. The rule type detects 4 alerts (`alert-1`, `alert-2`, `alert-3`,\r\n`alert-4`),\r\nBut reports only the first 3 as the max alerts limit is 3.\r\n\r\n2. `alert-2` becomes recovered, therefore the rule type reports 3 active\r\n(`alert-1`, `alert-3`, `alert-4`), 1 recovered (`alert-2`) alert.\r\n\r\n3. Alerts `alert-1`, `alert-3`, `alert-4` are saved in the task state.\r\n\r\n4. `alert-2` becomes active again (the others are still active)\r\n\r\n5. Rule type reports 3 active alerts (`alert-1`, `alert-2`, `alert-3`)\r\n\r\n6. As a result, the action scheduler tries to schedule actions for\r\n`alert-1`, `alert-3`, `alert-4` as they are the existing alerts.\r\nBut, since the rule type didn't report the `alert-4` it has no scheduled\r\nactions, therefore the action scheduler assumes it is recovered and\r\ntries to schedule a recovery action.\r\n\r\nThis PR changes the actionScheduler to handle active and recovered\r\nalerts separately.\r\nWith this change, no action would be scheduled for an alert from\r\nprevious run (exists in the task state) and isn't reported by the\r\nruleTypeExecutor due to max-alerts-limit but it would be kept in the\r\ntask state.","sha":"dd25bf8807c3ff3982d455f070b6e6c65233662d"}}]}] BACKPORT--> Co-authored-by: Ersin Erdal <[email protected]>
Resolves: #190258
As a result of #190258, we have found out that the odd behaviour happens when an existing alert is pushed above the max alerts limit by a new alert.
Scenario:
The rule type detects 4 alerts (
alert-1
,alert-2
,alert-3
,alert-4
),But reports only the first 3 as the max alerts limit is 3.
alert-2
becomes recovered, therefore the rule type reports 3 active (alert-1
,alert-3
,alert-4
), 1 recovered (alert-2
) alert.Alerts
alert-1
,alert-3
,alert-4
are saved in the task state.alert-2
becomes active again (the others are still active)Rule type reports 3 active alerts (
alert-1
,alert-2
,alert-3
)As a result, the action scheduler tries to schedule actions for
alert-1
,alert-3
,alert-4
as they are the existing alerts.But, since the rule type didn't report the
alert-4
it has no scheduled actions, therefore the action scheduler assumes it is recovered and tries to schedule a recovery action.This PR changes the actionScheduler to handle active and recovered alerts separately.
With this change, no action would be scheduled for an alert from previous run (exists in the task state) and isn't reported by the ruleTypeExecutor due to max-alerts-limit but it would be kept in the task state.