[Security Solution][Alerts] Alert suppression time window #148868

Merged
merged 31 commits into from
Jan 30, 2023

Conversation

marshallmain
Contributor

@marshallmain marshallmain commented Jan 13, 2023

Summary

Adds the ability to specify a time window for alert suppression on Query rules. If subsequent rule executions detect more alerts with the same value in the "group by" field, the existing alert is updated to reflect the new doc count and suppression end time instead of a new alert being created.
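
As a rough illustration of the behavior described in the summary, here is a minimal in-memory sketch. It is not the PR's implementation: the `SuppressedAlert` and `Detection` shapes and the `runExecution` helper are hypothetical, and measuring the window from the alert's start time is an assumption.

```typescript
// Hypothetical alert shape for illustration; not the actual Kibana alert schema.
interface SuppressedAlert {
  groupByValue: string; // value of the "group by" field, e.g. a host name
  start: Date; // when the alert was first created
  docCount: number; // total matching documents folded into this alert
  suppressionEnd: Date; // time of the most recent detection merged into it
}

interface Detection {
  groupByValue: string;
  docCount: number;
}

// One rule execution: update alerts that are still inside the window,
// create new alerts for everything else.
function runExecution(
  existing: SuppressedAlert[],
  detections: Detection[],
  windowMs: number,
  now: Date
): SuppressedAlert[] {
  const alerts = existing.map((a) => ({ ...a })); // copy so the input is not mutated
  for (const d of detections) {
    const match = alerts.find(
      (a) =>
        a.groupByValue === d.groupByValue &&
        now.getTime() - a.start.getTime() <= windowMs
    );
    if (match) {
      match.docCount += d.docCount;
      match.suppressionEnd = now;
    } else {
      alerts.push({
        groupByValue: d.groupByValue,
        start: now,
        docCount: d.docCount,
        suppressionEnd: now,
      });
    }
  }
  return alerts;
}

const minute = 60 * 1000;
const t0 = new Date('2023-01-13T00:00:00Z');
const hostA = (docCount: number): Detection[] => [{ groupByValue: 'host-a', docCount }];

// Three executions with a 5 minute window: at t0, t0 + 2 min, and t0 + 10 min.
const first = runExecution([], hostA(3), 5 * minute, t0);
const second = runExecution(first, hostA(2), 5 * minute, new Date(t0.getTime() + 2 * minute));
const third = runExecution(second, hostA(1), 5 * minute, new Date(t0.getTime() + 10 * minute));
```

In this sketch the second execution updates the existing alert, while the third falls outside the window and creates a new one.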

Create Rule

image

Rule Details

image

@marshallmain marshallmain added the Team: SecuritySolution, Feature:Detection Alerts, Team:Detection Alerts, v8.7.0, and release_note:enhancement labels Jan 13, 2023
@marshallmain marshallmain marked this pull request as ready for review January 17, 2023 19:47
@marshallmain marshallmain requested review from a team as code owners January 17, 2023 19:47
@marshallmain marshallmain requested a review from spong January 17, 2023 19:47
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Contributor

@mikecote mikecote left a comment

Thanks for putting up the PR for rule registry changes! I left a question before continuing my review.

Comment on lines +202 to +205
const suppressionDuration = runOpts.completeRule.ruleParams.alertSuppression?.duration;

if (suppressionDuration) {
const suppressionWindow = `now-${suppressionDuration.value}${suppressionDuration.unit}`;
Member

Curious about the case where a rule that is not configured with suppression runs and creates some alerts, is then disabled, edited to have a suppression time period covering the previous execution, and re-enabled. In this situation, the next execution (triggered by the re-enable) will run with suppression enabled and will suppress any new alerts that match the alerts from the previous execution, correct?

Contributor Author

The suppression across rule runs depends on existing alerts having the kibana.alert.instance.id field populated. If the first rule execution has no suppression configured at all, then the initial alerts won't populate an instance ID. When suppression is configured, with or without a duration, the instance ID does get populated (the instance ID calculation depends on the field chosen for suppression and the value in that field, so we'd have to do a runtime computation of each alert's instance ID if the suppression field is allowed to be chosen after the alert is created).

In the scenario you described, after suppression is configured for the first time, the next rule execution will still create new alerts, even if the previous execution created alerts within the suppression duration with the same host name (or whatever field is chosen as the suppression field).

Alternatively, if "per rule execution" is chosen initially and the duration is added later (without changing the suppression field), then the existing alerts from the per-rule-execution run would be updated, assuming they're still within the suppression duration.
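
To make the three cases above concrete, here is a small sketch. The `suppressionInstanceId` function is a hypothetical stand-in: the thread only says the real instance ID depends on the suppression field and the value in that field, so the exact calculation (and the inclusion of a rule ID) is an assumption.

```typescript
// Hypothetical stand-in for the real instance ID calculation. The thread only
// says the ID depends on the suppression field and the value in that field.
function suppressionInstanceId(ruleId: string, field: string, value: string): string {
  return JSON.stringify([ruleId, field, value]);
}

interface StoredAlert {
  id: string;
  instanceId?: string; // absent when the alert was created without suppression
}

// Look up an existing alert that a new detection could be folded into.
function findAlertToUpdate(stored: StoredAlert[], instanceId: string): StoredAlert | undefined {
  return stored.find((a) => a.instanceId === instanceId);
}

// Alerts written before suppression was configured carry no instance ID.
const withoutSuppression: StoredAlert[] = [{ id: 'alert-1' }];
// Alerts written with suppression on host.name (with or without a duration).
const withSuppression: StoredAlert[] = [
  { id: 'alert-2', instanceId: suppressionInstanceId('rule-1', 'host.name', 'host-a') },
];

const sameField = suppressionInstanceId('rule-1', 'host.name', 'host-a');
const otherField = suppressionInstanceId('rule-1', 'user.name', 'host-a');

// No instance ID on the old alert, so a new alert would be created.
const noMatchBecauseNoId = findAlertToUpdate(withoutSuppression, sameField);
// Same field and value, so the existing alert would be updated.
const matchSameField = findAlertToUpdate(withSuppression, sameField);
// The suppression field changed, so the IDs differ and nothing matches.
const noMatchOtherField = findAlertToUpdate(withSuppression, otherField);
```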

Member

Perfect, thank you for the clarification here!

Comment on lines 524 to 526
isDisabled={
groupByFields?.length === 0 ||
groupByRadioSelection.value !== GroupByOptions.PerTimePeriod
Member

Whoops -- one more corner case here: being on a basic license and then duplicating a prebuilt rule that has alert_suppression:

Suggested change
- isDisabled={
-   groupByFields?.length === 0 ||
-   groupByRadioSelection.value !== GroupByOptions.PerTimePeriod
+ isDisabled={
+   !license.isAtLeast(minimumLicenseForSuppression) ||
+   groupByFields?.length === 0 ||
+   groupByRadioSelection.value !== GroupByOptions.PerTimePeriod
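
For readers following along, the suggested condition can be sketched as a standalone predicate. Everything here is illustrative: the `License` interface, the tier names and their order, the `GroupByOption` values, and `'platinum'` as the minimum license are assumptions, not taken from the PR.

```typescript
// Hypothetical license model; the tier names and their order are assumptions.
const LICENSE_ORDER = ['basic', 'gold', 'platinum', 'enterprise'] as const;
type LicenseType = (typeof LICENSE_ORDER)[number];

interface License {
  isAtLeast(minimum: LicenseType): boolean;
}

function makeLicense(current: LicenseType): License {
  return {
    isAtLeast: (minimum) =>
      LICENSE_ORDER.indexOf(current) >= LICENSE_ORDER.indexOf(minimum),
  };
}

// Hypothetical values mirroring the two radio options mentioned in the thread.
type GroupByOption = 'per-rule-execution' | 'per-time-period';

const minimumLicenseForSuppression: LicenseType = 'platinum'; // assumed value

// The duration input is disabled when the license is too low, when no
// "group by" fields are selected, or when "per time period" is not chosen.
function isDurationInputDisabled(
  license: License,
  groupByFields: string[] | undefined,
  selection: GroupByOption
): boolean {
  return (
    !license.isAtLeast(minimumLicenseForSuppression) ||
    groupByFields?.length === 0 ||
    selection !== 'per-time-period'
  );
}

// A duplicated prebuilt rule on a basic license: fields and a time period are
// already set, but the input should still be disabled.
const duplicatedOnBasic = isDurationInputDisabled(
  makeLicense('basic'),
  ['host.name'],
  'per-time-period'
);
```

Without the license check, the duplicated rule in this example would leave the input enabled, which is the corner case the suggestion closes.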

Member

@spong spong left a comment

Checked out, tested locally, and code reviewed -- rules area changes LGTM! 👍

Performed a bunch of different tests, from testing locally hosted prebuilt rules with alert_suppression configured, license changes, and general functionality of the time window feature during rule execution.

Thank you @marshallmain for replying to all my comments and providing further context/test scenarios -- really appreciate it! 🙂 🚀

@spong
Member

spong commented Jan 25, 2023

One last UX thought I had with regards to Alert Details -- would be nice if we included the suppression terms and duration under Insights for quickly grokking why these alerts were suppressed:

Contributor

@e40pud e40pud left a comment

LGTM

Contributor

@mikecote mikecote left a comment

I reviewed the changes in the rule registry, looking good! A few questions and comments but after this we should be good to go! 🚀

@@ -48,6 +48,7 @@ const ALERT_BUILDING_BLOCK_TYPE = `${ALERT_NAMESPACE}.building_block_type` as co
const ALERT_EVALUATION_THRESHOLD = `${ALERT_NAMESPACE}.evaluation.threshold` as const;
const ALERT_EVALUATION_VALUE = `${ALERT_NAMESPACE}.evaluation.value` as const;
const ALERT_INSTANCE_ID = `${ALERT_NAMESPACE}.instance.id` as const;
const ALERT_LAST_DETECTED = `${ALERT_NAMESPACE}.last_detected_at` as const;
Contributor

Can we move this field to the default_alerts_as_data.ts file to make it available for all alerts and have it renamed to last_detected for consistency with start and end?

return {
...alert,
_source: {
[ALERT_LAST_DETECTED]: includeLastDetected ? currentTimeOverride ?? new Date() : undefined,
Contributor

Can we always set ALERT_LAST_DETECTED? The framework plans to rely on this field to know when an alert was last detected, even if it was just once (persistent alerts).
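
A minimal sketch of what "always set" could look like, assuming `ALERT_NAMESPACE` is `kibana.alert` and using the field name from the diff above. The `AlertDoc` shape and the `augmentAlert` helper are hypothetical simplifications, not the rule registry's actual function.

```typescript
const ALERT_NAMESPACE = 'kibana.alert'; // assumed namespace value
const ALERT_LAST_DETECTED = `${ALERT_NAMESPACE}.last_detected_at` as const;

// Hypothetical, simplified document shape for illustration.
interface AlertDoc {
  _id: string;
  _source: Record<string, unknown>;
}

// Always stamp the last detected time, with no includeLastDetected flag.
function augmentAlert(alert: AlertDoc, currentTimeOverride?: Date): AlertDoc {
  return {
    ...alert,
    _source: {
      ...alert._source,
      [ALERT_LAST_DETECTED]: currentTimeOverride ?? new Date(),
    },
  };
}

const fixedTime = new Date('2023-01-25T12:00:00Z');
const augmented = augmentAlert({ _id: 'alert-1', _source: { 'host.name': 'host-a' } }, fixedTime);
```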

throw new Error('Failed to parse suppression window');
}

const suppressionAlertSearchRequest = {
Contributor

Can we make this query work with ALERT_START instead of TIMESTAMP? That will make it easier for the framework to enable suppression for O11y and stack alerts too. Because timestamp behaves differently in O11y / stack than in Security Solution, we need to look at the first detected time.

We will need to set ALERT_START for newly detected alerts within augmentAlerts to make this work.
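
A sketch of a search request filtered on the alert start time, under stated assumptions: the field names `kibana.alert.start`, `kibana.alert.rule.uuid`, and `kibana.alert.instance.id`, the set of duration units, and both helper functions are illustrative and may not match the merged code.

```typescript
// Assumed field names; the thread refers to them as ALERT_START and ALERT_INSTANCE_ID.
const ALERT_START = 'kibana.alert.start';
const ALERT_RULE_UUID = 'kibana.alert.rule.uuid';
const ALERT_INSTANCE_ID = 'kibana.alert.instance.id';

type DurationUnit = 's' | 'm' | 'h'; // assumed set of units
interface SuppressionDuration {
  value: number;
  unit: DurationUnit;
}

// Turn a duration into an Elasticsearch date math expression such as "now-5m".
function buildSuppressionWindow(duration: SuppressionDuration): string {
  if (!Number.isInteger(duration.value) || duration.value <= 0) {
    throw new Error('Failed to parse suppression window');
  }
  return `now-${duration.value}${duration.unit}`;
}

// Find this rule's existing alerts that started inside the suppression
// window and share an instance ID with a newly detected alert.
function buildSuppressionSearch(
  ruleUuid: string,
  instanceIds: string[],
  duration: SuppressionDuration
) {
  return {
    size: instanceIds.length,
    query: {
      bool: {
        filter: [
          { range: { [ALERT_START]: { gte: buildSuppressionWindow(duration) } } },
          { term: { [ALERT_RULE_UUID]: ruleUuid } },
          { terms: { [ALERT_INSTANCE_ID]: instanceIds } },
        ],
      },
    },
  };
}

const request = buildSuppressionSearch('rule-1', ['id-a', 'id-b'], { value: 5, unit: 'm' });
```

Filtering on the start time rather than the document timestamp keeps the window anchored to when the alert was first detected, which is the property the comment above asks for.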

Contributor

@mikecote mikecote left a comment

Changes LGTM! Just one place left to use ALERT_START 🚀

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Security Solution Tests #3 / Inspect Network stats and tables "after each" hook for "inspects the Top DNS Domains Table"
  • [job] [logs] Security Solution Tests #3 / Inspect Network stats and tables inspects the Top DNS Domains Table
  • [job] [logs] Security Solution Tests #3 / Timelines Creates a timeline by clicking untitled timeline from bottom bar can be added notes

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 3563 3565 +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/rule-data-utils 96 97 +1
ruleRegistry 213 223 +10
total +11

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 12.8MB 12.9MB +20.0KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
ruleRegistry 10 11 +1

Unknown metric groups

API count

id before after diff
@kbn/rule-data-utils 99 100 +1
ruleRegistry 241 251 +10
total +11

ESLint disabled in files

id before after diff
securitySolution 76 77 +1

Total ESLint disabled count

id before after diff
securitySolution 502 503 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@marshallmain marshallmain merged commit 4d353f0 into elastic:main Jan 30, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 30, 2023
kqualters-elastic pushed a commit to kqualters-elastic/kibana that referenced this pull request Feb 6, 2023
…8868)

## Summary

Adds ability to specify a time window with alert suppression on Query
rules. If more alerts are detected with the same value in the "group by"
field in subsequent rule executions, the existing alert will be updated
to reflect the new doc count and suppression end time rather than
creating a new alert.

### Create Rule

![image](https://user-images.githubusercontent.com/55718608/212997145-cee96a7d-fc3b-4b08-8845-5a9c7876fa0a.png)

### Rule Details

![image](https://user-images.githubusercontent.com/55718608/212997293-69d93392-f74e-4e4e-925a-befbee531659.png)

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Mike Côté <[email protected]>
Labels
backport:skip This commit does not require backporting Feature:Detection Alerts Security Solution Detection Alerts Feature release_note:enhancement Team:Detection Alerts Security Detection Alerts Area Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.7.0

8 participants