[Security Solution] Prevent same rule from being imported concurrently #180519

thejwuscript · 2024-04-10T18:10:26Z

Closes issue #177283

Summary

This PR addresses the issue of creating multiple rules with the same rule_id when they are imported concurrently.

The proposed solution does not prevent rule creation per se, but it checks for duplicate rule_id immediately after rule creation and deletes them.

Checklist

Unit or functional tests were updated or added to match the most common scenarios
Stability of new and changed tests is verified using the Flaky Test Runner in both ESS and Serverless. By default, use 200 runs for ESS and 200 runs for Serverless.
Comprehensive manual testing is done by the PR reviewers. Changes are tested in both ESS and Serverless.

For maintainers

This was checked for breaking API changes and was labeled appropriately

elasticmachine · 2024-04-11T11:07:38Z

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine · 2024-04-11T11:07:38Z

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

elasticmachine · 2024-04-11T11:07:44Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

banderror · 2024-04-11T11:21:20Z

@thejwuscript Thank you for contributing to Kibana! Our team will review the PR when we get a chance and help push it through the finish line.

@maximpn adding you as reviewer because you're the ticket author and @jpdjere because you're the RFC author that describes upcoming changes to the import endpoint. When you get a chance after completing your current work, please review the PR for correctness and test coverage, and help @thejwuscript finalize it.

banderror · 2024-04-11T11:21:34Z

/ci

thejwuscript · 2024-04-11T18:53:39Z

@elasticmachine merge upstream

banderror · 2024-04-13T11:32:02Z

/ci

kibana-ci · 2024-04-13T12:13:03Z

💔 Build Failed

Failed CI Steps

Linting

History

💔 Build #202850 failed 3158cd6

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @thejwuscript

thejwuscript · 2024-04-13T13:19:32Z

@elasticmachine merge upstream

maximpn

@thejwuscript thank you for contributing to Kibana 🙏

Sorry for delayed review. The topic is quite complex despite the PR is small and I needed some time to deep dive to check my concerns.

I'm afraid the solution you suggest has pitfalls and doesn't fix the problem entirely while it works in a simple case. It's important to take into account Elasticsearch may work differently under heavy load.

For example two or more rules with the same ruleId might be created concurrently. While each createRule() function uses refresh=wait_for we don't know exact moment when ES returns a response. It may have more than one rule saved when it returns control. After findRules() is invoked and it will contain more than one rule in its response. It will lead that each execution flow will remove its created rule.

On top of that findRules() or deleteRules() may fail with an error.

Even if the suggested solution works it introduces 2 additional round trips to ES cluster which will slows down rules import process.

Generally speaking existing approach and any attempts to modify it won't solve the original problem of creating multiple rules with the same ruleId due to concurrency. The problem could be solved reliably to allowing only sequential rule import. In this case rule import operations wait until the current one is finished. It will allow to use bulk operation to optimise speed and reduce a number of round trips to ES cluster. For better UX it requires creating tasks and processing them under the hood while users will see status of the process. And obviously it requires a substantial refactoring.

maximpn · 2024-04-16T09:12:23Z

...rity_solution/server/lib/detection_engine/rule_management/logic/import/import_rules_utils.ts

                });
+                // OK if it is a unique rule or the earlier rule created
+                if (rules.total === 1 || createdRule.id === rules.data[0].id) {


The second condition createdRule.id === rules.data[0].id doesn't protect from multiple rules with the same ruleId created. It just checks the first found rule is the same which just has been created.

AND condition would work though

if (rules.total === 1 && createdRule.id === rules.data[0].id) {

maximpn · 2024-04-16T09:20:13Z

...rity_solution/server/lib/detection_engine/rule_management/logic/import/import_rules_utils.ts

-                  rule_id: parsedRule.rule_id,
-                  status_code: 200,
+                // find rules with the same rule_id, in case of concurrent imports
+                const rules = await findRules({


In rare cases two (or more) rules with the same ruleId can be created in parallel so findRules() would return two rules which will lead to both of them deleted.

maximpn · 2024-04-16T09:21:01Z

...rity_solution/server/lib/detection_engine/rule_management/logic/import/import_rules_utils.ts

+                  });
+                  // else delete the duplicate rule and return status 409
+                } else {
+                  deleteRules({ ruleId: createdRule.id, rulesClient });


deleteRules() may fail which will result to multiple rules with the same ruleId.

thejwuscript · 2024-04-17T03:00:09Z

@maximpn thank you for the detailed review and explanation.

If my understanding is correct, sequential rule import is the reliable way to solve the concurrency issue. I’m wondering if the bulk operation is used, does Elasticsearch process each individual index operation sequentially, completing one before moving on to another? Also, if two or more bulk import requests from different users are sent to ES concurrently, would we run into concurrency issue as well?

maximpn · 2024-04-17T11:08:17Z

Hi @thejwuscript

If my understanding is correct, sequential rule import is the reliable way to solve the concurrency issue. I’m wondering if the bulk operation is used, does Elasticsearch process each individual index operation sequentially, completing one before moving on to another? Also, if two or more bulk import requests from different users are sent to ES concurrently, would we run into concurrency issue as well?

An answer is multifaceted. I'll try to explain what happens under the hood and what the limitations we face are.

At first Security Rule Management functionality don't and can't interact with ES directly. It goes through clients chain so functions call looks like (method names aren't precisely reflected)

updateRule() -> rulesClient->updateRule() -> savedObjectClient->updateSO() -> esClient->update()

In fact it represents a layered architecture. It's possible to only interact with rulesClient exposed by alerting plugin.

These clients perform some checks under the hood and expose limited APIs. For example rulesClient only allows bulk enable or disable rules but bulk updates aren't allowed at the moment.

When we are talking about rule importing functionality it relies on the rulesClient. The only option it to create or update each rule individually. Performance isn't the best in this approach but we can't simply use ES bulk API here.

We we are talking about concurrency. The best way to solve the duplicated rules issue is to allow only import one ndjson from one user at a time. Rule import API endpoint should create a task and store it in ES. Some process should pick rule import tasks and execute them. It's possible to achieve it via creating tasks for Task Manager. On top of that it'll require adding an API endpoint to control rules import status.

It's also important to take into account that users may create or modify existing rules via UI or API while rules being imported from a ndjson.

Did I answer your question? Let me know if something is unclear or you have any follow up questions.

xcrzx · 2024-04-26T08:49:19Z

Hey @thejwuscript, thank you for opening the PR 🎉 It seems to solve the rules duplication problem in many cases. However, we need to assess its robustness and performance more thoroughly before moving forward. We'll have an internal discussion and will get back to you shortly after that with a recommendation on how to move forward with this work.

xcrzx

Hey @thejwuscript, we discussed this fix internally, and the consensus was that it might have some performance implications due to additional database roundtrips performed in the loop through the rules being imported. I think we should consider alternative approaches. One of them, mentioned by Maxim, involves implementing a rule creation queue, but that seems like a huge endeavor requiring significant refactoring.

Another option might be to implement rule deduplication using a single aggregation request once the rule import has completed the creation of new rules. It should be possible to run a single term aggregation with min_doc_count to find duplicated rule IDs in one go and then remove the duplicates. This should be a more performant solution and doesn't require significant refactoring.

Please let me know if you are interested in trying to implement this approach or if you have more questions.

thejwuscript · 2024-05-16T15:42:03Z

Hi @xcrzx, thank you for the feedback. I will try implementing the terms aggregation approach.

Prevent same rule_id from concurrent import

3158cd6

thejwuscript requested a review from a team as a code owner April 10, 2024 18:10

thejwuscript requested a review from xcrzx April 10, 2024 18:10

kibanamachine added the 💝community label Apr 10, 2024

banderror changed the title ~~Prevent same rule from being imported concurrently~~ [Security Solution] Prevent same rule from being imported concurrently Apr 11, 2024

banderror requested review from maximpn and jpdjere and removed request for xcrzx April 11, 2024 11:06

banderror assigned thejwuscript Apr 11, 2024

banderror added the impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. label Apr 11, 2024

Fix failing tests

4dd06cc

Merge branch 'main' into concurrent-rule-imports

ec172f7

Delete redundant let declaration

c12bfbd

Merge branch 'main' into concurrent-rule-imports

fe925cd

maximpn reviewed Apr 16, 2024

View reviewed changes

xcrzx self-requested a review April 25, 2024 08:18

xcrzx requested changes May 16, 2024

View reviewed changes

banderror marked this pull request as draft May 22, 2024 09:28

banderror removed the v8.14.0 label May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security Solution] Prevent same rule from being imported concurrently #180519

[Security Solution] Prevent same rule from being imported concurrently #180519

thejwuscript commented Apr 10, 2024 •

edited by banderror

Loading

elasticmachine commented Apr 11, 2024

elasticmachine commented Apr 11, 2024

elasticmachine commented Apr 11, 2024

banderror commented Apr 11, 2024

banderror commented Apr 11, 2024

thejwuscript commented Apr 11, 2024

banderror commented Apr 13, 2024

kibana-ci commented Apr 13, 2024 •

edited

Loading

thejwuscript commented Apr 13, 2024

maximpn left a comment

maximpn Apr 16, 2024

maximpn Apr 16, 2024

maximpn Apr 16, 2024

thejwuscript commented Apr 17, 2024

maximpn commented Apr 17, 2024

xcrzx commented Apr 26, 2024

xcrzx left a comment

thejwuscript commented May 16, 2024

[Security Solution] Prevent same rule from being imported concurrently #180519

Are you sure you want to change the base?

[Security Solution] Prevent same rule from being imported concurrently #180519

Conversation

thejwuscript commented Apr 10, 2024 • edited by banderror Loading

Summary

Checklist

For maintainers

elasticmachine commented Apr 11, 2024

elasticmachine commented Apr 11, 2024

elasticmachine commented Apr 11, 2024

banderror commented Apr 11, 2024

banderror commented Apr 11, 2024

thejwuscript commented Apr 11, 2024

banderror commented Apr 13, 2024

kibana-ci commented Apr 13, 2024 • edited Loading

💔 Build Failed

Failed CI Steps

History

thejwuscript commented Apr 13, 2024

maximpn left a comment

Choose a reason for hiding this comment

maximpn Apr 16, 2024

Choose a reason for hiding this comment

maximpn Apr 16, 2024

Choose a reason for hiding this comment

maximpn Apr 16, 2024

Choose a reason for hiding this comment

thejwuscript commented Apr 17, 2024

maximpn commented Apr 17, 2024

xcrzx commented Apr 26, 2024

xcrzx left a comment

Choose a reason for hiding this comment

thejwuscript commented May 16, 2024

thejwuscript commented Apr 10, 2024 •

edited by banderror

Loading

kibana-ci commented Apr 13, 2024 •

edited

Loading