[Security Solution][Detections] Fetch rule actions in chunks #121110

Merged
merged 1 commit into from
Dec 15, 2021

@xcrzx xcrzx commented Dec 13, 2021

Addresses: #119853

Summary

Adds batching to the requests that fetch rule statuses and rule actions. This fixes an issue that occurs when many rules (~10,000) are requested through the rules/_find API.

Without batching, several rules/_find requests running in parallel could occasionally cause garbage collector overhead and Elasticsearch OOM errors:

Elasticsearch log
   │ info [o.e.t.LoggingTaskListener] [Dmitriis-MacBook-Pro.local] 253378 finished with response BulkByScrollResponse[took=106ms,timed_out=false,sliceId=null,updated=18,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
   │ info [o.e.t.LoggingTaskListener] [Dmitriis-MacBook-Pro.local] 253368 finished with response BulkByScrollResponse[took=2.9s,timed_out=false,sliceId=null,updated=11349,created=0,deleted=0,batches=12,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
   │ info [o.e.x.i.a.TransportPutLifecycleAction] [Dmitriis-MacBook-Pro.local] updating index lifecycle policy [.alerts-ilm-policy]
   │ info [o.e.x.i.a.TransportPutLifecycleAction] [Dmitriis-MacBook-Pro.local] updating index lifecycle policy [.preview.alerts-security.alerts-policy]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9049] overhead, spent [379ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9050] overhead, spent [305ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9133] overhead, spent [253ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9137] overhead, spent [513ms] collecting in the last [1s]
   │ info [o.e.i.b.HierarchyCircuitBreakerService] [Dmitriis-MacBook-Pro.local] attempting to trigger G1GC due to high heap usage [1554831360]
   │ info [o.e.i.b.HierarchyCircuitBreakerService] [Dmitriis-MacBook-Pro.local] GC did not bring memory usage down, before [1554831360], after [1559109104], allocations [3], duration [12]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9139] overhead, spent [1s] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9140] overhead, spent [786ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9141] overhead, spent [273ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9142] overhead, spent [727ms] collecting in the last [1.1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9143] overhead, spent [399ms] collecting in the last [1s]
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973807616 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973807784 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.i.b.request] [Dmitriis-MacBook-Pro.local] [request] New used memory 973758464 [928.6mb] for data of [preallocate[aggregations]] would be larger than configured breaker: 966367641 [921.5mb], breaking
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9249] overhead, spent [409ms] collecting in the last [1s]
   │ info [o.e.i.b.HierarchyCircuitBreakerService] [Dmitriis-MacBook-Pro.local] attempting to trigger G1GC due to high heap usage [1563096240]
   │ info [o.e.i.b.HierarchyCircuitBreakerService] [Dmitriis-MacBook-Pro.local] GC did not bring memory usage down, before [1563096240], after [1569245184], allocations [1], duration [120]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9250] overhead, spent [849ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9251] overhead, spent [972ms] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9252] overhead, spent [1s] collecting in the last [1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9253] overhead, spent [1s] collecting in the last [1.1s]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9254] overhead, spent [975ms] collecting in the last [1s]
   │ info java.lang.OutOfMemoryError: Java heap space
   │ info Dumping heap to data ...
   │ info [o.e.i.b.HierarchyCircuitBreakerService] [Dmitriis-MacBook-Pro.local] attempting to trigger G1GC due to high heap usage [1600588472]
   │ info [o.e.m.j.JvmGcMonitorService] [Dmitriis-MacBook-Pro.local] [gc][9255] overhead, spent [1.1s] collecting in the last [1.1s]
   │ info Heap dump file created [1694074946 bytes in 2.144 secs]
   │ info Terminating due to java.lang.OutOfMemoryError: Java heap space
   │ERROR ES exited with code 3
error Command failed with exit code 1.
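
The batching described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `RuleAction`, `FetchActions`, `chunkItems`, `fetchActionsInChunks`, and the default chunk size of 1000 are all hypothetical names and values chosen for the example. The idea is only that each request covers a bounded slice of rule IDs, so no single request has to aggregate over every rule at once.

```typescript
// Hypothetical shape of a rule action record; not Kibana's actual type.
interface RuleAction {
  ruleId: string;
  actionTypeId: string;
}

// Hypothetical fetcher: takes a batch of rule IDs and resolves with their actions.
type FetchActions = (ruleIds: string[]) => Promise<RuleAction[]>;

// Split an array into consecutive chunks of at most `size` items.
function chunkItems<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error(`chunk size must be a positive integer, got ${size}`);
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Fetch actions one chunk at a time so that no single request covers all rules.
async function fetchActionsInChunks(
  ruleIds: readonly string[],
  fetchActions: FetchActions,
  chunkSize: number = 1000 // assumed value, for illustration only
): Promise<RuleAction[]> {
  const all: RuleAction[] = [];
  for (const batch of chunkItems(ruleIds, chunkSize)) {
    const actions = await fetchActions(batch);
    for (const action of actions) {
      all.push(action);
    }
  }
  return all;
}
```

Processing chunks sequentially trades some latency for a predictable upper bound on the size of each request; a bounded level of parallelism is another option.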

@xcrzx xcrzx added the v8.0.0, release_note:skip, Team:Detections and Resp, Team: SecuritySolution, auto-backport, v8.1.0, and Team:Detection Rule Management labels Dec 13, 2021
@xcrzx xcrzx self-assigned this Dec 13, 2021
@xcrzx xcrzx force-pushed the fix-rule-actions-fetch branch from 42d5c97 to 1fb4412 Compare December 14, 2021 18:50
@xcrzx xcrzx marked this pull request as ready for review December 14, 2021 19:08
@xcrzx xcrzx requested a review from a team as a code owner December 14, 2021 19:08
@elasticmachine

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror left a comment

Left a few comments, thank you @xcrzx!

@xcrzx xcrzx force-pushed the fix-rule-actions-fetch branch 3 times, most recently from e0dd259 to 9502e6d Compare December 15, 2021 12:56
@banderror banderror added the bug and Feature:Rule Management labels Dec 15, 2021
@banderror banderror left a comment

LGTM 👍 🚀

Thank you for finding the bug, working with the Core folks, making it more resilient on our side, introducing initPromisePool and covering it with tests. That was a lot of work to do!

I noticed a small typo; other than that, I'm fine with merging it. Thank you once again.
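
For readers unfamiliar with the promise-pool idea mentioned in the review, here is a minimal sketch of a bounded-concurrency runner. It is an assumption-laden illustration and not the `initPromisePool` helper introduced in this PR: the name `runWithConcurrency`, its parameters, and the `PoolOutcome` shape are hypothetical. It shows how a fixed number of workers can pull items from a shared queue so that at most `concurrency` requests are in flight at any time.

```typescript
// Outcome of running every item through the executor.
interface PoolOutcome<Item, Result> {
  results: Result[]; // in completion order, not input order
  errors: Array<{ item: Item; error: unknown }>;
}

// Hypothetical helper; not the signature of initPromisePool from the PR.
async function runWithConcurrency<Item, Result>(
  items: readonly Item[],
  concurrency: number,
  executor: (item: Item) => Promise<Result>
): Promise<PoolOutcome<Item, Result>> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`concurrency must be a positive integer, got ${concurrency}`);
  }
  const outcome: PoolOutcome<Item, Result> = { results: [], errors: [] };
  let next = 0; // index of the next item to hand out

  // Each worker keeps taking the next unclaimed item until none are left.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++];
      try {
        outcome.results.push(await executor(item));
      } catch (error) {
        // Collect failures instead of aborting the whole pool.
        outcome.errors.push({ item, error });
      }
    }
  };

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcome;
}
```

Collecting errors per item, rather than rejecting on the first failure, lets the caller decide whether a partial result is acceptable.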

@xcrzx xcrzx force-pushed the fix-rule-actions-fetch branch from 9502e6d to c8e9036 Compare December 15, 2021 16:58
@xcrzx xcrzx enabled auto-merge (squash) December 15, 2021 17:01
@xcrzx xcrzx merged commit 04cd3af into elastic:main Dec 15, 2021
@kibana-ci

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

  • 💚 Build #13367 succeeded 9502e6d8360f124bce8503299331be6573554c7f
  • 💔 Build #13356 failed e05a193272cdf13b6d1f6329fcc5116e612bd5e8
  • 💚 Build #13227 succeeded 1fb44120ff6efe68621ba4b923d1eff1b2221c73
  • 💚 Build #12933 succeeded 42d5c97e2517b69957ccd993921275ab6af65c48

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @xcrzx

@kibanamachine

💚 Backport successful

Status Branch Result
8.0

This backport PR will be merged automatically after passing CI.

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Dec 15, 2021
kibanamachine added a commit that referenced this pull request Dec 15, 2021
@xcrzx xcrzx deleted the fix-rule-actions-fetch branch December 16, 2021 09:45
TinLe pushed a commit to TinLe/kibana that referenced this pull request Dec 22, 2021
Labels
auto-backport, bug, Feature:Rule Management, release_note:skip, Team:Detection Rule Management, Team:Detections and Resp, Team: SecuritySolution, v8.0.0, v8.1.0
Development

Successfully merging this pull request may close these issues.

[Security Solution] detection_engine/rules/_find returns 500 when query matches many rules
6 participants