[ResponseOps] Add support for the "running" flag to the rule object #147759

Closed
banderror opened this issue Dec 19, 2022 · 4 comments
Assignees
Labels
8.7 candidate, Feature:Alerting/RulesFramework, Feature:Alerting/RulesManagement, Team:Detection Rule Management, Team:Detections and Resp, Team:ResponseOps, Team: SecuritySolution, v8.7.0

Comments

@banderror
Contributor

banderror commented Dec 19, 2022

Based on: RFC: Consolidating rule statuses for RAC Rule Management / Monitoring (internal)
Depends on: #135127
Related to: #118511

Summary

In Security Solution, we have a dedicated running rule status and show it on the Rule Management and Rule Details pages. This lets our users see which rules are currently running (or whether a given rule is), which is especially important for long-running rules. We'd like to preserve this feature.

The RFC: Consolidating rule statuses for RAC Rule Management / Monitoring proposes adding a new running field to the rule object which the Alerting Framework would update to true once the rule starts and to false once it finishes execution. This field hasn't been implemented yet and we need to do that.

Details

There is a performance-related caveat that needs to be taken into account. Updating a saved object can take a long time, so updating the running field might take longer than actually running the rule's executor function. See #118511 for more details.

Some ideas for handling that efficiently:

  • Only update the running field if the rule runs longer than X (e.g. 2 seconds). Thus, it won't be updated for most of the rules.
  • Don't block the rule's task by updating the running field. Instead, push this update to a stream and process it concurrently (e.g. using rxjs). Debounce running: true additions to the stream by X.
  • A running: false should cancel the previous running: true if the latter has not been handled yet.
  • If the code has started handling a running: true and has already called the saved objects client, running: false should wait for that call to complete and only then proceed.
  • running: false should end (close) the stream.
  • The rule's task should wait for the stream to be closed before it returns.
  • Use refresh: false when updating the running field. See also this ticket.
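
The ideas above can be sketched roughly as follows. This is a minimal illustration using plain timers and promises instead of rxjs, not the implementation that was merged. The `RunningFlagClient` interface, the `RunningFlagTracker` class, and the `runRule` helper are hypothetical names introduced only for this example; the 2-second default and the `refresh: false` option come from the list above.

```typescript
// Hypothetical minimal client, standing in for whatever persists the rule object.
// Only the `refresh: false` option is taken from the issue text.
interface RunningFlagClient {
  update(
    ruleId: string,
    attrs: { running: boolean },
    opts: { refresh: false }
  ): Promise<void>;
}

// Hypothetical helper that delays `running: true` and cancels it for short runs.
class RunningFlagTracker {
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private inFlight: Promise<void> = Promise.resolve();
  private markedRunning = false;
  private readonly client: RunningFlagClient;
  private readonly ruleId: string;
  private readonly delayMs: number;

  constructor(client: RunningFlagClient, ruleId: string, delayMs = 2000) {
    this.client = client;
    this.ruleId = ruleId;
    this.delayMs = delayMs;
  }

  // Call when the executor starts: schedules `running: true` after delayMs.
  start(): void {
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.markedRunning = true;
      // Not awaited here, so the rule's task is not blocked by the update.
      // Errors are swallowed so a failed status update cannot fail the rule.
      this.inFlight = this.client
        .update(this.ruleId, { running: true }, { refresh: false })
        .catch(() => undefined);
    }, this.delayMs);
  }

  // Call when the executor finishes: resolves once the flag is consistent.
  async finish(): Promise<void> {
    if (this.timer !== undefined) {
      // `running: true` was never sent, so there is nothing to undo.
      clearTimeout(this.timer);
      this.timer = undefined;
      return;
    }
    // Let an in-progress `running: true` update land before resetting it.
    await this.inFlight;
    if (this.markedRunning) {
      this.markedRunning = false;
      await this.client.update(this.ruleId, { running: false }, { refresh: false });
    }
  }
}

// Wraps an executor so the task only returns after the flag has settled.
async function runRule<T>(
  tracker: RunningFlagTracker,
  executor: () => Promise<T>
): Promise<T> {
  tracker.start();
  try {
    return await executor();
  } finally {
    await tracker.finish();
  }
}
```

With this shape, a rule that finishes before the delay elapses triggers no update at all, while a longer run produces exactly one running: true followed by one running: false.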
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@XavierM XavierM mentioned this issue Dec 21, 2022
1 task
@banderror banderror assigned maximpn and unassigned jpdjere Dec 29, 2022
XavierM added a commit that referenced this issue Jan 10, 2023
## Summary

#147759


### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: kibanamachine <[email protected]>
jennypavlova pushed a commit to jennypavlova/kibana that referenced this issue Jan 13, 2023
@maximpn
Contributor

maximpn commented Jan 23, 2023

I've validated #147896 locally. The implementation covers the task's goal, and local testing hasn't revealed any problems. The running status is updated only after a 2-second delay, and the update is skipped entirely if the rule's execution takes less than 2 seconds.

I got the following results:

  • Long rule execution with a 10s artificial delay (longer than 2s)

image

Setting running = true takes around 35ms.

  • Short rule execution (shorter than 2s)

image

Setting running = true is skipped altogether.

It's worth noting that the rule's status is updated without waiting for the change to propagate, so the updated value takes some time to become visible to reads. According to the ES docs, the default index refresh interval is one second:

By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.

What I can confirm locally
image

Overall, this means the rule's updated status should become visible to reads shortly after it is written.
