Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Extended rule execution logging to Event Log #126063

Merged

Conversation

banderror
Copy link
Contributor

@banderror banderror commented Feb 21, 2022

Epics: #124947, #118324
Fixes: #131352

Summary

Console logs written by rule executors can now be "routed" to the Event Log in addition to the console. A new table UI for viewing plain rule execution logs allows the user to look at all status changes, errors, warnings, info and debug messages on the Rule Details page.

Screenshot 2022-07-20 at 15 31 54

This feature is hidden under a feature flag and disabled by default -- it might not be production-ready yet. We will need to work on product and UX/UI design in the subsequent development cycles to make it ready for release. Until then, we can start using it in development: it should make it easier to troubleshoot issues with rule execution. Add this flag to your Kibana config to enable this feature:

xpack.securitySolution.enableExperimental: ['extendedRuleExecutionLoggingEnabled']

If the flag is enabled:

  • Rules will start writing console logs to Event Log as events of a new type message, in addition to the existing status-change and execution-metrics events.
  • Rule Details page will show a new tab called Execution events. This tab will contain a table with plain execution logs.
  • In Stack Management, you will find two new Kibana Advanced Settings for controlling this extended logging.

As for the new Kibana Advanced Settings, by default:

  • Extended logging is enabled.
  • The minimum console log level to be written to Event Log is error. This only affects the new message events.

Screenshot 2022-07-20 at 15 41 29

Implementation details

Important change: refactored the folder structure as our first step to domain-driven architecture and splitting the Detection Engine into subdomains.

  • Extracted most of the code related to Rule Execution Log and Rule Monitoring in general into a subdomain called rule_monitoring. This subdomain now lives in three folders:
    • security_solution/common/detection_engine/rule_monitoring
    • security_solution/public/detection_engine/rule_monitoring
    • security_solution/server/lib/detection_engine/rule_monitoring
  • Tried to create a developer-friendly and clear folder structure within the subdomain.

Other changes:

  • Changed all rule executors to write console logs via an instance of IRuleExecutionLogForExecutors instead of the console Logger.
    • IRuleExecutionLogForExecutors is passed to rule executors and downstream functions they call.
    • Logger is not passed anymore.
    • buildRuleMessage and buildRuleMessageFactory are deleted.
  • Added support for writing console logs to Event Log.
    • Added a new rule execution event type message for writing console logs to Event Log.
    • Every rule execution event now has a log.level and event.severity.
  • Improved the format of console logs written by rules.
    • Created a child logger for console logs of rule executors: plugins.securitySolution.ruleExecution.
    • Added rule static “signature” ID (rule.rule_id) as a correlation id to the logs.
    • Cleaned up the formatting of console logs.
    • Fixed ExtMeta to use interfaces instead of type intersection due to a found bug in TypeScript that affected this type.
  • Made changes in the domain model.
    • Renamed the AggregateRuleExecutionEvent into the RuleExecutionResult.
    • The new plain event is called RuleExecutionEvent.
  • Finalized the API endpoint for fetching plain execution logs.
  • Built a Rule Execution Events Table UI for showing and filtering plain execution logs.
    • Did some refactoring to extract reusable components/hooks to make development of tables easier in the future.

Execution events table UI

For context, this is how the existing Execution logs table looks like when the flag is off (notice the renaming to Execution results:

Screenshot 2022-07-20 at 15 29 04

This is the new Execution events table when the flag is on:

Screenshot 2022-07-20 at 15 31 54

Showing only trace and debug events:

Screenshot 2022-07-20 at 15 33 48

Showing only status changes:

Screenshot 2022-07-20 at 15 35 04

Showing only warning and error messages:

Screenshot 2022-07-20 at 15 37 11

Expanded rows showing details of the corresponding events:

Screenshot 2022-07-20 at 15 39 16

Screenshot 2022-07-20 at 15 39 56

Console logs

Example:

[2022-02-23T17:05:09.901+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Starting Signal Rule execution [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.907+03:00][DEBUG][plugins.securitySolution.ruleExecution] interval: 5m [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.908+03:00][INFO ][plugins.securitySolution.ruleExecution] Changing rule status to "running" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] This rule is attempting to query data from Elasticsearch indices listed in the "Index pattern" section of the rule definition, however no index matching: ["logs-endpoint.alerts-*"] was found. This warning will continue to appear until a matching index is created or this rule is de-activated. If you have recently enrolled agents enabled with Endpoint Security through Fleet, this warning should stop once an alert is sent from an agent. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] Changing rule status to "partial failure" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.630+03:00][DEBUG][plugins.securitySolution.ruleExecution] sortIds: undefined [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] searchResult.hit.hits.length: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.635+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits was 0, exiting early [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] completed bulk index of 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Signal Rule execution completed. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.638+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals into .alerts-security.alerts [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.639+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals searched between date ranges [
  {
    "to": "2022-02-23T14:05:09.775Z",
    "from": "2022-02-23T13:55:09.775Z",
    "maxSignals": 10000
  }
] [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]

Note that:

  • The logger name is now plugins.securitySolution.ruleExecution, which allows to turn on only rule execution logs in the config (could be useful when debugging).
  • Every log message has a suffix with correlation ids: [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]

Checklist

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
    • x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/README.md
    • Various JSDoc comments
  • Unit or functional tests were updated or added to match the most common scenarios
  • Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
  • Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
  • This was checked for cross-browser compatibility

For maintainers

@banderror banderror self-assigned this Feb 21, 2022
@banderror banderror added Feature:Rule Monitoring Security Solution Detection Rule Monitoring area Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team v8.2.0 labels Feb 21, 2022
@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch 11 times, most recently from 2c9905f to b618228 Compare February 25, 2022 22:23
@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch 3 times, most recently from 5238efd to f7c2b00 Compare March 10, 2022 12:06
@banderror banderror added the 8.2 candidate considered, but not committed, for 8.2 release label Mar 10, 2022
@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch 6 times, most recently from d7ccc0a to 0a0b14f Compare March 22, 2022 09:25
@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch 2 times, most recently from 92a7283 to 1278e9a Compare March 24, 2022 13:15
@banderror banderror removed the v8.2.0 label Apr 11, 2022
@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch 2 times, most recently from 13283b8 to 597c07d Compare July 25, 2022 13:44
Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments, @banderror 👍

@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch from 597c07d to 9c92cb5 Compare July 25, 2022 17:26
@banderror banderror requested a review from jbudz July 25, 2022 17:27
Copy link
Contributor

@yctercero yctercero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really awesome. Thanks so much for providing the cloud env for testing. Focused on the CTI/platform changes in review which looked like mostly refactor/cleanup.

LGTM 🔥

@ymao1
Copy link
Contributor

ymao1 commented Jul 25, 2022

@banderror The single event log plugin code change looks fine. Tested it out just to see what's being logged and I'm wondering if there is a followup issue to consolidate some of the log messages to take into consideration the fact that they are being written as ES documents now? Understand this is behind a feature flag and it sounds like there are improvements coming. I'm just seeing things like

ruleExecutionLogger.debug('[+] Starting Signal Rule execution');
ruleExecutionLogger.debug(`interval: ${interval}`);

which will show up as 2 event log docs and would like to make sure there are followup efforts to combine the messages (and make them individually more meaningful?)

@banderror
Copy link
Contributor Author

banderror commented Jul 25, 2022

@banderror The single event log plugin code change looks fine. Tested it out just to see what's being logged and I'm wondering if there is a followup issue to consolidate some of the log messages to take into consideration the fact that they are being written as ES documents now? Understand this is behind a feature flag and it sounds like there are improvements coming. I'm just seeing things like

ruleExecutionLogger.debug('[+] Starting Signal Rule execution');
ruleExecutionLogger.debug(`interval: ${interval}`);

which will show up as 2 event log docs and would like to make sure there are followup efforts to combine the messages (and make them individually more meaningful?)

@ymao1 we'll make sure to revisit this before making this feature available for the users. There are no follow-up tickets for this at this point, but I'll write up some before the next planning. I'll make a note to do that so I don't forget.

@banderror banderror force-pushed the onweek-troubleshooting-and-diagnostics branch from 9c92cb5 to 767a8c6 Compare July 25, 2022 19:04
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response ops changes LGTM!

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Security Solution Tests #3 / Inspect Network stats and tables "before all" hook for "inspects the Source IPs Table"

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
lists 304 306 +2
securitySolution 3182 3229 +47
total +49

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 5.4MB 5.5MB +11.7KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 252.3KB 252.8KB +541.0B
Unknown metric groups

API count

id before after diff
@kbn/securitysolution-io-ts-types 54 61 +7

History

  • 💛 Build #60500 was flaky 9c92cb51105048a3ffafa8ab2376296402bbe917
  • 💚 Build #60357 succeeded 597c07dc8916a0aa66095d7644f0640941d58237
  • 💔 Build #60310 failed 2b0e6097b63a93486ee50060838f6d3dc58af59b
  • 💔 Build #60295 failed 133b29f1bdf31d852aa8f1bf7c908b62615b819b
  • 💔 Build #60259 failed ead08438cb2b3e8b70b8f1e17d6fc17b750b256d
  • 💔 Build #60229 failed 8f79cbbf2b49cf86f0ad3a6090c4ce6d76ae71f5

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror banderror merged commit becaec8 into elastic:main Jul 25, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 25, 2022
@dhurley14 dhurley14 deleted the onweek-troubleshooting-and-diagnostics branch July 25, 2022 20:09
@tylersmalley tylersmalley added ci:cloud-deploy Create or update a Cloud deployment and removed ci:deploy-cloud labels Aug 17, 2022
maximpn added a commit that referenced this pull request Sep 19, 2023
**Relates to:** #126063

## Summary

This PR extends rule event log's filter and improves log messages.

## Details

We have Rule execution log feature hidden by a feature flag and disabled, it's shown on a rule details page when enabled.

<img width="1782" alt="image" src="https://github.com/elastic/kibana/assets/3775283/71565d96-13aa-4275-b870-22118ac90335">

The feature is close to a releasable state but some tasks had to be addressed to make it usable. This PR addresses the following tasks to make rule execution log feature releasable

- Adds search bar to search for specific messages

<img width="1529" alt="image" src="https://github.com/elastic/kibana/assets/3775283/4bd198de-60e8-4511-a96d-4d68ec53a7f2">

- Adds a date range filter by default set to show logs for last 24 hours

<img width="1529" alt="image" src="https://github.com/elastic/kibana/assets/3775283/b9d7e658-a19a-402a-a039-28d225000952">

- Improves error, warning and debug messages
- Returns rule metrics in a message as a serialized JSON

<img width="1526" alt="image" src="https://github.com/elastic/kibana/assets/3775283/7d9501b9-4a12-4d31-be99-6ce3c04b2b97">

- Adds `execution_id` to the response

<img width="1522" alt="image" src="https://github.com/elastic/kibana/assets/3775283/92d1291e-0605-456c-abca-8c6fd329ade2">

### Tasks to address later

- [ ] Further improve logging messages. We have either error, warning or debug messages. In fact info or trace levels aren't used but it can give useful info.
- [ ] Add an OpenAPI spec for the rule execution log endpoint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.4 candidate backport:skip This commit does not require backporting ci:cloud-deploy Create or update a Cloud deployment Feature:Rule Monitoring Security Solution Detection Rule Monitoring area release_note:skip Skip the PR/issue when compiling release notes Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.4.0
Projects
No open projects
Development

Successfully merging this pull request may close these issues.