Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Prevent same rule from being imported concurrently #180519

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

thejwuscript
Copy link
Contributor

@thejwuscript thejwuscript commented Apr 10, 2024

Closes issue #177283

Summary

This PR addresses the issue of creating multiple rules with the same rule_id when they are imported concurrently.

The proposed solution does not prevent rule creation per se, but it checks for duplicate rule_id immediately after rule creation and deletes them.

Checklist

  • Unit or functional tests were updated or added to match the most common scenarios
  • Stability of new and changed tests is verified using the Flaky Test Runner in both ESS and Serverless. By default, use 200 runs for ESS and 200 runs for Serverless.
  • Comprehensive manual testing is done by the PR reviewers. Changes are tested in both ESS and Serverless.

For maintainers

@thejwuscript thejwuscript requested a review from a team as a code owner April 10, 2024 18:10
@thejwuscript thejwuscript requested a review from xcrzx April 10, 2024 18:10
@banderror banderror changed the title Prevent same rule from being imported concurrently [Security Solution] Prevent same rule from being imported concurrently Apr 11, 2024
@banderror banderror requested review from maximpn and jpdjere and removed request for xcrzx April 11, 2024 11:06
@banderror banderror added bug Fixes for quality problems that affect the customer experience release_note:fix Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Rule Import/Export Security Solution Detection Rule Import & Export workflow v8.14.0 labels Apr 11, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@banderror banderror added the impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. label Apr 11, 2024
@banderror
Copy link
Contributor

@thejwuscript Thank you for contributing to Kibana! Our team will review the PR when we get a chance and help push it through the finish line.

@maximpn adding you as reviewer because you're the ticket author and @jpdjere because you're the RFC author that describes upcoming changes to the import endpoint. When you get a chance after completing your current work, please review the PR for correctness and test coverage, and help @thejwuscript finalize it.

@banderror
Copy link
Contributor

/ci

@thejwuscript
Copy link
Contributor Author

@elasticmachine merge upstream

@banderror
Copy link
Contributor

/ci

@kibana-ci
Copy link
Collaborator

kibana-ci commented Apr 13, 2024

💔 Build Failed

Failed CI Steps

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @thejwuscript

@thejwuscript
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@maximpn maximpn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thejwuscript thank you for contributing to Kibana 🙏

Sorry for delayed review. The topic is quite complex despite the PR is small and I needed some time to deep dive to check my concerns.

I'm afraid the solution you suggest has pitfalls and doesn't fix the problem entirely while it works in a simple case. It's important to take into account Elasticsearch may work differently under heavy load.

For example two or more rules with the same ruleId might be created concurrently. While each createRule() function uses refresh=wait_for we don't know exact moment when ES returns a response. It may have more than one rule saved when it returns control. After findRules() is invoked and it will contain more than one rule in its response. It will lead that each execution flow will remove its created rule.

On top of that findRules() or deleteRules() may fail with an error.

Even if the suggested solution works it introduces 2 additional round trips to ES cluster which will slows down rules import process.

Generally speaking existing approach and any attempts to modify it won't solve the original problem of creating multiple rules with the same ruleId due to concurrency. The problem could be solved reliably to allowing only sequential rule import. In this case rule import operations wait until the current one is finished. It will allow to use bulk operation to optimise speed and reduce a number of round trips to ES cluster. For better UX it requires creating tasks and processing them under the hood while users will see status of the process. And obviously it requires a substantial refactoring.

});
// OK if it is a unique rule or the earlier rule created
if (rules.total === 1 || createdRule.id === rules.data[0].id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second condition createdRule.id === rules.data[0].id doesn't protect from multiple rules with the same ruleId created. It just checks the first found rule is the same which just has been created.

AND condition would work though

if (rules.total === 1 && createdRule.id === rules.data[0].id) {

rule_id: parsedRule.rule_id,
status_code: 200,
// find rules with the same rule_id, in case of concurrent imports
const rules = await findRules({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rare cases two (or more) rules with the same ruleId can be created in parallel so findRules() would return two rules which will lead to both of them deleted.

});
// else delete the duplicate rule and return status 409
} else {
deleteRules({ ruleId: createdRule.id, rulesClient });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteRules() may fail which will result to multiple rules with the same ruleId.

@thejwuscript
Copy link
Contributor Author

@maximpn thank you for the detailed review and explanation.

If my understanding is correct, sequential rule import is the reliable way to solve the concurrency issue. I’m wondering if the bulk operation is used, does Elasticsearch process each individual index operation sequentially, completing one before moving on to another? Also, if two or more bulk import requests from different users are sent to ES concurrently, would we run into concurrency issue as well?

@maximpn
Copy link
Contributor

maximpn commented Apr 17, 2024

Hi @thejwuscript

If my understanding is correct, sequential rule import is the reliable way to solve the concurrency issue. I’m wondering if the bulk operation is used, does Elasticsearch process each individual index operation sequentially, completing one before moving on to another? Also, if two or more bulk import requests from different users are sent to ES concurrently, would we run into concurrency issue as well?

An answer is multifaceted. I'll try to explain what happens under the hood and what the limitations we face are.

At first Security Rule Management functionality don't and can't interact with ES directly. It goes through clients chain so functions call looks like (method names aren't precisely reflected)

updateRule() -> rulesClient->updateRule() -> savedObjectClient->updateSO() -> esClient->update()

In fact it represents a layered architecture. It's possible to only interact with rulesClient exposed by alerting plugin.

These clients perform some checks under the hood and expose limited APIs. For example rulesClient only allows bulk enable or disable rules but bulk updates aren't allowed at the moment.

When we are talking about rule importing functionality it relies on the rulesClient. The only option it to create or update each rule individually. Performance isn't the best in this approach but we can't simply use ES bulk API here.

We we are talking about concurrency. The best way to solve the duplicated rules issue is to allow only import one ndjson from one user at a time. Rule import API endpoint should create a task and store it in ES. Some process should pick rule import tasks and execute them. It's possible to achieve it via creating tasks for Task Manager. On top of that it'll require adding an API endpoint to control rules import status.

It's also important to take into account that users may create or modify existing rules via UI or API while rules being imported from a ndjson.

Did I answer your question? Let me know if something is unclear or you have any follow up questions.

@xcrzx xcrzx self-requested a review April 25, 2024 08:18
@xcrzx
Copy link
Contributor

xcrzx commented Apr 26, 2024

Hey @thejwuscript, thank you for opening the PR 🎉 It seems to solve the rules duplication problem in many cases. However, we need to assess its robustness and performance more thoroughly before moving forward. We'll have an internal discussion and will get back to you shortly after that with a recommendation on how to move forward with this work.

Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @thejwuscript, we discussed this fix internally, and the consensus was that it might have some performance implications due to additional database roundtrips performed in the loop through the rules being imported. I think we should consider alternative approaches. One of them, mentioned by Maxim, involves implementing a rule creation queue, but that seems like a huge endeavor requiring significant refactoring.

Another option might be to implement rule deduplication using a single aggregation request once the rule import has completed the creation of new rules. It should be possible to run a single term aggregation with min_doc_count to find duplicated rule IDs in one go and then remove the duplicates. This should be a more performant solution and doesn't require significant refactoring.

Please let me know if you are interested in trying to implement this approach or if you have more questions.

@thejwuscript
Copy link
Contributor Author

Hi @xcrzx, thank you for the feedback. I will try implementing the terms aggregation approach.

@banderror banderror marked this pull request as draft May 22, 2024 09:28
@banderror banderror removed the v8.14.0 label May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience 💝community Feature:Rule Import/Export Security Solution Detection Rule Import & Export workflow impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. release_note:fix Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants