[Security Solution] Troubleshooting and Diagnostics of the Detection Engine (Draft) #124947
Labels:
- epic
- Feature:Rule Monitoring (Security Solution Detection Rule Monitoring area)
- Team:Detection Rule Management (Security Detection Rule Management Team)
- Team:Detections and Resp (Security Detection Response Team)
- Team: SecuritySolution (Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.)
banderror changed the title from "[Epic][Security Solution][Detections] Troubleshooting and diagnostics" to "[Epic][Security Solution][Detections] Troubleshooting and diagnostics (Draft)" on Feb 8, 2022
banderror added the Team: SecuritySolution, Team:Detection Rule Management, and Team:Detections and Resp labels on Feb 8, 2022
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
banderror added the epic and Feature:Rule Monitoring labels on Feb 8, 2022
This was referenced Feb 15, 2022
banderror added a commit that referenced this issue on Jul 25, 2022:
…ent Log (#126063)

**Epics:** #124947, #118324

**Fixes:** #131352

## Summary

Console logs written by rule executors can now be "routed" to the Event Log in addition to the console. A new table UI for viewing plain rule execution logs allows the user to look at all status changes, errors, warnings, info and debug messages on the Rule Details page.

<img width="1502" alt="Screenshot 2022-07-20 at 15 31 54" src="https://user-images.githubusercontent.com/7359339/179995075-24440224-daf9-4e73-bc62-b6ce211052b3.png">

**This feature is hidden under a feature flag and disabled by default**, as it might not be production-ready yet. We will need to work on product and UX/UI design in the subsequent development cycles to make it ready for release. Until then, we can start using it in development: it should make it easier to troubleshoot issues with rule execution.

Add this flag to your Kibana config to enable this feature:

```yaml
xpack.securitySolution.enableExperimental: ['extendedRuleExecutionLoggingEnabled']
```

If the flag is enabled:

- Rules will start writing console logs to the Event Log as events of a new type `message`, in addition to the existing `status-change` and `execution-metrics` events.
- The Rule Details page will show a new tab called `Execution events`. This tab will contain a table with plain execution logs.
- In Stack Management, you will find two new Kibana Advanced Settings for controlling this extended logging.

As for the new Kibana Advanced Settings, by default:

- Extended logging is enabled.
- The minimum console log level to be written to the Event Log is `error`. This only affects the new `message` events.

<img width="774" alt="Screenshot 2022-07-20 at 15 41 29" src="https://user-images.githubusercontent.com/7359339/179997070-d86dfc6b-3862-49ff-879d-ecc30bc128d7.png">

## Implementation details

**Important change**: refactored the folder structure as our first step towards a **domain-driven architecture** and **splitting the Detection Engine into subdomains**.
- Extracted most of the code related to Rule Execution Log and Rule Monitoring in general into a subdomain called `rule_monitoring`. This subdomain now lives in three folders:
  - `security_solution/common/detection_engine/rule_monitoring`
  - `security_solution/public/detection_engine/rule_monitoring`
  - `security_solution/server/lib/detection_engine/rule_monitoring`
- Tried to create a developer-friendly and clear folder structure within the subdomain.

Other changes:

- Changed all rule executors to write console logs via an instance of `IRuleExecutionLogForExecutors` instead of the console `Logger`.
  - `IRuleExecutionLogForExecutors` is passed to rule executors and the downstream functions they call.
  - `Logger` is not passed anymore.
  - `buildRuleMessage` and `buildRuleMessageFactory` are deleted.
- Added support for writing console logs to the Event Log.
  - Added a new rule execution event type `message` for writing console logs to the Event Log.
  - Every rule execution event now has a `log.level` and `event.severity`.
- Improved the format of console logs written by rules.
  - Created a child logger for console logs of rule executors: `plugins.securitySolution.ruleExecution`.
  - Added the rule's static “signature” ID (`rule.rule_id`) as a correlation id to the logs.
  - Cleaned up the formatting of console logs.
- Fixed `ExtMeta` to use interfaces instead of a type intersection, due to a [bug](microsoft/TypeScript#47935) we found in TypeScript that affected this type.
- Made changes in the domain model.
  - Renamed `AggregateRuleExecutionEvent` to `RuleExecutionResult`.
  - The new plain event is called `RuleExecutionEvent`.
- Finalized the API endpoint for fetching plain execution logs.
- Built a Rule Execution Events Table UI for showing and filtering plain execution logs.
- Did some refactoring to extract reusable components/hooks to make development of tables easier in the future.
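A minimal TypeScript sketch of the routing described above may help picture the behavior. It assumes a simplified interface: `RuleExecutionLogger`, `ConsoleSink`, `EventLogSink`, `RuleMessageEvent`, and the numeric severity values are hypothetical and are not Kibana's actual `IRuleExecutionLogForExecutors` API. Every message is written to the console; when extended logging is enabled, messages at or above the minimum level (`error` by default) are also written as `message` events carrying `log.level` and `event.severity`.

```typescript
// Hypothetical sketch: these names are illustrative, not Kibana's actual API.
type LogLevel = 'trace' | 'debug' | 'info' | 'warn' | 'error';

// Assumed numeric ordering, used both for filtering and as event.severity.
const SEVERITY: Record<LogLevel, number> = {
  trace: 0,
  debug: 10,
  info: 20,
  warn: 30,
  error: 40,
};

// Shape of a plain `message` event as sketched here (assumed fields).
interface RuleMessageEvent {
  type: 'message';
  message: string;
  'log.level': LogLevel;
  'event.severity': number;
}

interface ConsoleSink {
  log(level: LogLevel, message: string): void;
}

interface EventLogSink {
  write(event: RuleMessageEvent): void;
}

class RuleExecutionLogger {
  constructor(
    private readonly consoleSink: ConsoleSink,
    private readonly eventLogSink: EventLogSink,
    // Stand-ins for the two Advanced Settings described above (assumed semantics).
    private readonly extendedLoggingEnabled = true,
    private readonly minEventLogLevel: LogLevel = 'error'
  ) {}

  trace(message: string): void {
    this.log('trace', message);
  }
  debug(message: string): void {
    this.log('debug', message);
  }
  info(message: string): void {
    this.log('info', message);
  }
  warn(message: string): void {
    this.log('warn', message);
  }
  error(message: string): void {
    this.log('error', message);
  }

  private log(level: LogLevel, message: string): void {
    // Every message still goes to the console.
    this.consoleSink.log(level, message);

    // Only messages at or above the minimum level are routed to the Event Log.
    if (this.extendedLoggingEnabled && SEVERITY[level] >= SEVERITY[this.minEventLogLevel]) {
      this.eventLogSink.write({
        type: 'message',
        message,
        'log.level': level,
        'event.severity': SEVERITY[level],
      });
    }
  }
}

// Usage: collect Event Log writes in memory to see what gets routed.
const written: RuleMessageEvent[] = [];
const ruleLogger = new RuleExecutionLogger(
  { log: (level, message) => console.log(`[${level.toUpperCase()}] ${message}`) },
  { write: (event) => written.push(event) }
);
ruleLogger.debug('totalHits: 0');
ruleLogger.error('Bulk indexing failed');
// Both lines were printed to the console; only the error reached `written`.
```

Keeping the level threshold inside the logger means rule executors only call `debug`/`info`/`warn`/`error`; whether a message also lands in the Event Log is decided by configuration rather than by the calling code.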
## Execution events table UI

For context, this is what the existing `Execution logs` table looks like when the flag is off (note that it has been renamed to `Execution results`):

<img width="1506" alt="Screenshot 2022-07-20 at 15 29 04" src="https://user-images.githubusercontent.com/7359339/179994450-45121035-ebb0-4e6f-83c0-9cbbbbd0b598.png">

This is the new `Execution events` table when the flag is on:

<img width="1502" alt="Screenshot 2022-07-20 at 15 31 54" src="https://user-images.githubusercontent.com/7359339/179995075-24440224-daf9-4e73-bc62-b6ce211052b3.png">

Showing only trace and debug events:

<img width="1505" alt="Screenshot 2022-07-20 at 15 33 48" src="https://user-images.githubusercontent.com/7359339/179995484-d97ff7e3-2756-42db-802f-41f11bd37507.png">

Showing only status changes:

<img width="1507" alt="Screenshot 2022-07-20 at 15 35 04" src="https://user-images.githubusercontent.com/7359339/179995804-ca6808b7-3b47-411b-a74e-d141b3fd74e0.png">

Showing only warning and error `message`s:

<img width="1508" alt="Screenshot 2022-07-20 at 15 37 11" src="https://user-images.githubusercontent.com/7359339/179996258-c154b95d-642d-45a6-b19a-7185cd71f295.png">

Expanded rows showing details of the corresponding events:

<img width="1452" alt="Screenshot 2022-07-20 at 15 39 16" src="https://user-images.githubusercontent.com/7359339/179996771-3954ceea-24e9-4760-9103-2daf6cb7b528.png">
<img width="1449" alt="Screenshot 2022-07-20 at 15 39 56" src="https://user-images.githubusercontent.com/7359339/179996805-c866674d-09a1-42ec-b954-58c6829ef19b.png">

## Console logs

Example:

```
[2022-02-23T17:05:09.901+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Starting Signal Rule execution [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.907+03:00][DEBUG][plugins.securitySolution.ruleExecution] interval: 5m [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.908+03:00][INFO ][plugins.securitySolution.ruleExecution] Changing rule status to "running" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] This rule is attempting to query data from Elasticsearch indices listed in the "Index pattern" section of the rule definition, however no index matching: ["logs-endpoint.alerts-*"] was found. This warning will continue to appear until a matching index is created or this rule is de-activated. If you have recently enrolled agents enabled with Endpoint Security through Fleet, this warning should stop once an alert is sent from an agent. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] Changing rule status to "partial failure" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.630+03:00][DEBUG][plugins.securitySolution.ruleExecution] sortIds: undefined [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] searchResult.hit.hits.length: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.635+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits was 0, exiting early [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] completed bulk index of 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Signal Rule execution completed. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.638+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals into .alerts-security.alerts [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.639+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals searched between date ranges [ { "to": "2022-02-23T14:05:09.775Z", "from": "2022-02-23T13:55:09.775Z", "maxSignals": 10000 } ] [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
```

Note that:

- The logger name is now `plugins.securitySolution.ruleExecution`, which makes it possible to turn on _only_ rule execution logs in the config (this can be useful when debugging).
- Every log message has a suffix with correlation ids: `[siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]`

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  - [x] `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/README.md`
  - [x] Various JSDoc comments
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
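The correlation-id suffix shown in the console log example follows a fixed pattern. Here is a minimal TypeScript sketch of a formatter that reproduces it, assuming the six values are available to the rule executor; `CorrelationIds`, `formatCorrelationSuffix`, and `withCorrelationSuffix` are hypothetical names, and the field names simply mirror the labels printed in the log.

```typescript
// Hypothetical helper: field names mirror the labels in the example log output.
interface CorrelationIds {
  ruleTypeId: string;
  ruleName: string;
  ruleId: string;
  ruleUuid: string;
  executionId: string;
  spaceId: string;
}

// Builds the "[type][name][rule id ...][rule uuid ...][exec id ...][space ...]" suffix.
function formatCorrelationSuffix(ids: CorrelationIds): string {
  const parts = [
    ids.ruleTypeId,
    ids.ruleName,
    `rule id ${ids.ruleId}`,
    `rule uuid ${ids.ruleUuid}`,
    `exec id ${ids.executionId}`,
    `space ${ids.spaceId}`,
  ];
  return parts.map((part) => `[${part}]`).join('');
}

// Appends the suffix to a log message, separated by a single space.
function withCorrelationSuffix(message: string, ids: CorrelationIds): string {
  return `${message} ${formatCorrelationSuffix(ids)}`;
}

// Usage with the values from the example log.
const exampleIds: CorrelationIds = {
  ruleTypeId: 'siem.queryRule',
  ruleName: 'Endpoint Security',
  ruleId: '825b2fab-8b3e-11ec-a4a0-cf820453283c',
  ruleUuid: '9a1a2dae-0b5f-4c3d-8305-a268d404c306',
  executionId: 'ebb7f713-b216-4c90-a456-6c1a6815a065',
  spaceId: 'default',
};
const line = withCorrelationSuffix('totalHits: 0', exampleIds);
console.log(line);
```

Because the suffix is plain text, a rule name containing `]` would make it ambiguous to parse; passing the same ids as structured log metadata avoids that when Kibana logs are ingested into Elasticsearch.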
banderror changed the title from "[Epic][Security Solution][Detections] Troubleshooting and diagnostics (Draft)" to "[Security Solution] Troubleshooting and diagnostics (Draft)" on Nov 24, 2022
banderror changed the title from "[Security Solution] Troubleshooting and diagnostics (Draft)" to "[Security Solution] Troubleshooting and Diagnostics of the Detection Engine (Draft)" on Nov 24, 2022
Summary
While working on recent SDHs, it became evident that, in contrast to Elasticsearch, Kibana, and Task Manager, we don't have much diagnostic data for Security Solution and the Detection Engine. There are not many console logs or rule execution logs stored in `.kibana-event-log-*`, there are not enough correlation ids in those logs, and the support-diagnostics tool does not support dumping anything related to the Detection Engine.

Plan
- Improve logging from rule executors. Write more and better logs with more correlation ids:
  - … `siem-detection-engine-rule-execution-info` saved objects. See [RAC][Rule Registry] Add unique ID field for identifying all alerts created during a specific Rule execution #110135
- Improve logging from route handlers. Write logs with correlation ids from Security Solution's API endpoints:
  - (… `Referer` HTTP header)

  NOTE: Correlation ids can be attached to any console log record via an additional `LogMeta` object (example) and are available for slicing and dicing if Kibana logs are ingested into ES. We could potentially leverage this in Cloud.
- Include correlation ids in outgoing requests to Elasticsearch.
  Since we need to analyze the `tasks.json` file (generated by the support-diagnostics tool), and it's not clear which rule sent a particular search request (or whether it was a rule at all), it would be great if we could attach some correlation ids to the requests we send to Elasticsearch. Maybe this could be done via custom HTTP headers, similar to `X-elastic-product-origin` and others that we can see in `tasks.json`.
- Measure more rule execution metrics:
  NOTE: Detection Engine performance benchmarking could read the generic and rule type-specific metrics written to the Event Log during benchmarking and calculate statistics from them (median and percentiles across all rules, per rule type, per rule, etc.).
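As a rough illustration of the statistics step in the NOTE above, the following TypeScript sketch summarizes execution durations across all rules, per rule type, and per rule. It assumes the metrics have already been read from the Event Log into a flat array; `ExecutionMetric` and its fields are hypothetical, and percentiles use linear interpolation between closest ranks, which is one common definition among several.

```typescript
// Hypothetical metric record read back from the Event Log; field names are assumptions.
interface ExecutionMetric {
  ruleId: string;
  ruleType: string;
  durationMs: number;
}

interface Summary {
  count: number;
  median: number;
  p95: number;
  p99: number;
}

// Percentile with linear interpolation between closest ranks (the "inclusive" definition).
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error('percentile of an empty list is undefined');
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

function summarize(values: number[]): Summary {
  return {
    count: values.length,
    median: percentile(values, 50),
    p95: percentile(values, 95),
    p99: percentile(values, 99),
  };
}

// Groups metrics by an arbitrary key (rule type, rule id, ...) and summarizes each group.
function summarizeBy(
  metrics: ExecutionMetric[],
  key: (metric: ExecutionMetric) => string
): Map<string, Summary> {
  const groups = new Map<string, number[]>();
  for (const metric of metrics) {
    const k = key(metric);
    const durations = groups.get(k) ?? [];
    durations.push(metric.durationMs);
    groups.set(k, durations);
  }
  const result = new Map<string, Summary>();
  groups.forEach((durations, k) => {
    result.set(k, summarize(durations));
  });
  return result;
}

// Usage with made-up durations.
const metrics: ExecutionMetric[] = [
  { ruleId: 'rule-a', ruleType: 'siem.queryRule', durationMs: 100 },
  { ruleId: 'rule-a', ruleType: 'siem.queryRule', durationMs: 300 },
  { ruleId: 'rule-b', ruleType: 'siem.eqlRule', durationMs: 200 },
];
const acrossAllRules = summarize(metrics.map((m) => m.durationMs));
const perRuleType = summarizeBy(metrics, (m) => m.ruleType);
const perRule = summarizeBy(metrics, (m) => m.ruleId);
```

Passing the grouping key as a function keeps a single aggregation path for "across all rules", "per rule type", and "per rule"; a rule type-specific metric would only need a different field in place of `durationMs`.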