Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Rule Execution Log - technical debt #118324

Open
14 of 19 tasks
banderror opened this issue Nov 11, 2021 · 2 comments
Open
14 of 19 tasks

[Security Solution] Rule Execution Log - technical debt #118324

banderror opened this issue Nov 11, 2021 · 2 comments
Labels
epic Feature:Rule Monitoring Security Solution Detection Rule Monitoring area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture v8.1.0 v8.2.0 v8.4.0

Comments

@banderror
Copy link
Contributor

banderror commented Nov 11, 2021

Related to: #101013

Summary

As part of #101013 we implemented simple rule execution logging to .kibana-event-log-* indices via the event_log plugin. There is still a lot to do from the tech standpoint, and this ticket tries to capture all the outstanding technical issues.

To do

New data model

Stability and performance

Features

Misc

  • Encapsulate the current space id in the instance of IRuleExecutionLogClient. Remove it from parameters of its methods. (PR)
  • Implement a Rule Execution Logger scoped to a particular rule instance. (comment)
    • It would scope common parameters like rule itself, space id etc, and would provide a cleaner and simpler API for rule executors (PR)
    • It would be writing Kibana logs to the system logger in addition to logging execution events to Event Log. Rule executors would be using it instead of the system logger directly. (PR)
  • Add test coverage.
  • Improve Event Log's index bootstrapping logic to simplify development. Currently, in dev mode, if you update mappings of .kibana-event-log* indices, you won't easily get these updates in ES. If we look at x-pack/plugins/event_log/server/es/init.ts, seems like in order to propagate changes in the event log mappings to ES, we need to at least delete the index template, and then either delete the concrete indices or do a manual rollover after KIbana startup. Probably this could be solved via a "force upgrade index" setting of the event_log plugin that would be set to true locally.
@banderror banderror added technical debt Improvement of the software architecture and operational architecture Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. epic Feature:Rule Monitoring Security Solution Detection Rule Monitoring area Team:Detection Rule Management Security Detection Rule Management Team labels Nov 11, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror changed the title [Meta][Security Solution][Detections] Rule Execution Log - technical debt [Epic][Security Solution][Detections] Rule Execution Log - technical debt Nov 24, 2021
banderror added a commit that referenced this issue Jan 20, 2022
)

**Epic:** #118324
**Tickets:** #119603, #119597, #91265, #118511

## Summary

The legacy rule execution logging implementation is replaced by a new one that introduces a new model for execution-related data, a new saved object and a new, cleaner interface and implementation.

- [x] The legacy data model is deleted (`IRuleStatusResponseAttributes`, `IRuleStatusSOAttributes`)
- [x] The legacy `siem-detection-engine-rule-status` saved object type is deleted and marked as deleted in `src/core`
- [x] A new data model is introduced (`x-pack/plugins/security_solution/common/detection_engine/schemas/common/rule_monitoring.ts`). This data model doesn't contain a mixture of successful and failed statuses, which should simplify client-side code (e.g. the code of Rule Management and Monitoring tables, as well as Rule Details page).
- [x] A new `siem-detection-engine-rule-execution-info` saved object is introduced (`x-pack/plugins/security_solution/server/lib/detection_engine/rule_execution_log/rule_execution_info/saved_object.ts`).
  - [x] This SO has 1:1 association with the rule SO, so every rule can have 0 or 1 execution info associated with it. This SO is used in order to 1) update the last execution status and metrics and 2) fetch execution data for N rules more efficiently comparing to the legacy SO.
  - [x] The logic of creating or updating this SOs is based on the "upsert" approach (planned in #118511). It does not fetch the SO by rule id before updating it anymore.
- [x] Rule execution logging logic is rewritten (see `x-pack/plugins/security_solution/server/lib/detection_engine/rule_execution_log`). The previous rule execution log client is split into two objects: `IRuleExecutionLogClient` for using it from route handlers, and `IRuleExecutionLogger` for writing logs from rule executors.
  - [x] `IRuleExecutionLogger` instance is scoped to the currently executing rule and space id. There's no need to pass rule id, name, type etc to `.logStatusChange()` every time.
- [x] Rule executors and related functions are updated.
- [x] API routes are updated, including the rule preview route which uses a special "spy" implementation of `IRuleExecutionLogger`. A rule returned from an API endpoint now has optional `execution_summary` field of type `RuleExecutionSummary`.
- [x] UI is updated to use the new data model of `RuleExecutionSummary`:
  - [x] Rule Management and Monitoring tables
  - [x] Rule Details page
- [x] A new API route is introduced for fetching rule execution events: `/internal/detection_engine/rules/{ruleId}/execution/events`. It is used for rendering the Failure History tab (last 5 failures) and is intended to be used in the coming UI of Rule Execution Log on the Details page.
- [x] Rule Details page and Failure History tab are updated to use the new data models and API routes.
- [x] I used `react-query` for fetching execution events
  - [x] See `x-pack/plugins/security_solution/public/detections/containers/detection_engine/rules/use_rule_execution_events.tsx`
  - [x] The lib is updated to the latest version
- [x] Tests and fixed and updated according to all the changes
- [x] Components related to rule execution statuses are all moved to `x-pack/plugins/security_solution/public/detections/components/rules/rule_execution_status`.
- [x] I left a lot of `// TODO: https://github.com/elastic/kibana/pull/121644` comments in the code which I'm planning to address and remove in a follow-up PR. Lots of clean up work is needed, but I'd like to unblock the work on Rule Execution Log UI.

## In the next episodes

- Address and remove `// TODO: https://github.com/elastic/kibana/pull/121644` comments in the code
- Make sure that SO id generation for `siem-detection-engine-rule-execution-info` is safe and future-proof. Sync with the Core team. If there are risks, we will need to choose between risks and performance (reading the SO before updating it). It would be easy to submit a fix if needed.
- Add APM integration. Use `withSecuritySpan` in methods of `rule_execution_log` citizens.
- Add comments to the code and README.
- Add test coverage.
- Etc...

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
@banderror banderror added v8.2.0 8.2 candidate considered, but not committed, for 8.2 release labels Feb 14, 2022
@banderror banderror removed the 8.2 candidate considered, but not committed, for 8.2 release label Apr 25, 2022
banderror added a commit that referenced this issue Jul 25, 2022
…ent Log (#126063)

**Epics:** #124947, #118324
**Fixes:** #131352

## Summary

Console logs written by rule executors can now be "routed" to the Event Log in addition to the console. A new table UI for viewing plain rule execution logs allows the user to look at all status changes, errors, warnings, info and debug messages on the Rule Details page.

<img width="1502" alt="Screenshot 2022-07-20 at 15 31 54" src="https://user-images.githubusercontent.com/7359339/179995075-24440224-daf9-4e73-bc62-b6ce211052b3.png">

**This feature is hidden under a feature flag and disabled by default** -- it might not be production-ready yet. We will need to work on product and UX/UI design in the subsequent development cycles to make it ready for release. Until then, we can start using it in development: it should make it easier to troubleshoot issues with rule execution. Add this flag to your Kibana config to enable this feature:

```yaml
xpack.securitySolution.enableExperimental: ['extendedRuleExecutionLoggingEnabled']
```

If the flag is enabled:

- Rules will start writing console logs to Event Log as events of a new type `message`, in addition to the existing `status-change` and `execution-metrics` events.
- Rule Details page will show a new tab called `Execution events`. This tab will contain a table with plain execution logs.
- In Stack Management, you will find two new Kibana Advanced Settings for controlling this extended logging.

As for the new Kibana Advanced Settings, by default:

- Extended logging is enabled.
- The minimum console log level to be written to Event Log is `error`. This only affects the new `message` events.

<img width="774" alt="Screenshot 2022-07-20 at 15 41 29" src="https://user-images.githubusercontent.com/7359339/179997070-d86dfc6b-3862-49ff-879d-ecc30bc128d7.png">

## Implementation details

**Important change**: refactored the folder structure as our first step to **domain-driven architecture** and **splitting the Detection Engine into subdomains**.

- Extracted most of the code related to Rule Execution Log and Rule Monitoring in general into a subdomain called `rule_monitoring`. This subdomain now lives in three folders:
  - `security_solution/common/detection_engine/rule_monitoring`
  - `security_solution/public/detection_engine/rule_monitoring`
  - `security_solution/server/lib/detection_engine/rule_monitoring`
- Tried to create a developer-friendly and clear folder structure within the subdomain.

Other changes:

- Changed all rule executors to write console logs via an instance of `IRuleExecutionLogForExecutors` instead of the console `Logger`.
  - `IRuleExecutionLogForExecutors` is passed to rule executors and downstream functions they call.
  - `Logger` is not passed anymore.
  - `buildRuleMessage` and `buildRuleMessageFactory` are deleted.
- Added support for writing console logs to Event Log.
  - Added a new rule execution event type `message` for writing console logs to Event Log.
  - Every rule execution event now has a `log.level` and `event.severity`.
- Improved the format of console logs written by rules.
  - Created a child logger for console logs of rule executors: `plugins.securitySolution.ruleExecution`.
  - Added rule static “signature” ID (`rule.rule_id`) as a correlation id to the logs.
  - Cleaned up the formatting of console logs.
  - Fixed `ExtMeta` to use interfaces instead of type intersection due to a found [bug](microsoft/TypeScript#47935) in TypeScript that affected this type.
- Made changes in the domain model.
  - Renamed the `AggregateRuleExecutionEvent` into the `RuleExecutionResult`.
  - The new plain event is called `RuleExecutionEvent`.
- Finalized the API endpoint for fetching plain execution logs.
- Built a Rule Execution Events Table UI for showing and filtering plain execution logs.
  - Did some refactoring to extract reusable components/hooks to make development of tables easier in the future.

## Execution events table UI

For context, this is how the existing `Execution logs` table looks like when the flag is off (notice the renaming to `Execution results`:

<img width="1506" alt="Screenshot 2022-07-20 at 15 29 04" src="https://user-images.githubusercontent.com/7359339/179994450-45121035-ebb0-4e6f-83c0-9cbbbbd0b598.png">

This is the new `Execution events` table when the flag is on:

<img width="1502" alt="Screenshot 2022-07-20 at 15 31 54" src="https://user-images.githubusercontent.com/7359339/179995075-24440224-daf9-4e73-bc62-b6ce211052b3.png">

Showing only trace and debug events:

<img width="1505" alt="Screenshot 2022-07-20 at 15 33 48" src="https://user-images.githubusercontent.com/7359339/179995484-d97ff7e3-2756-42db-802f-41f11bd37507.png">

Showing only status changes:

<img width="1507" alt="Screenshot 2022-07-20 at 15 35 04" src="https://user-images.githubusercontent.com/7359339/179995804-ca6808b7-3b47-411b-a74e-d141b3fd74e0.png">

Showing only warning and error `message`s:

<img width="1508" alt="Screenshot 2022-07-20 at 15 37 11" src="https://user-images.githubusercontent.com/7359339/179996258-c154b95d-642d-45a6-b19a-7185cd71f295.png">

Expanded rows showing details of the corresponding events:

<img width="1452" alt="Screenshot 2022-07-20 at 15 39 16" src="https://user-images.githubusercontent.com/7359339/179996771-3954ceea-24e9-4760-9103-2daf6cb7b528.png">

<img width="1449" alt="Screenshot 2022-07-20 at 15 39 56" src="https://user-images.githubusercontent.com/7359339/179996805-c866674d-09a1-42ec-b954-58c6829ef19b.png">

## Console logs

Example:

```
[2022-02-23T17:05:09.901+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Starting Signal Rule execution [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.907+03:00][DEBUG][plugins.securitySolution.ruleExecution] interval: 5m [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:09.908+03:00][INFO ][plugins.securitySolution.ruleExecution] Changing rule status to "running" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] This rule is attempting to query data from Elasticsearch indices listed in the "Index pattern" section of the rule definition, however no index matching: ["logs-endpoint.alerts-*"] was found. This warning will continue to appear until a matching index is created or this rule is de-activated. If you have recently enrolled agents enabled with Endpoint Security through Fleet, this warning should stop once an alert is sent from an agent. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:10.595+03:00][WARN ][plugins.securitySolution.ruleExecution] Changing rule status to "partial failure" [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.630+03:00][DEBUG][plugins.securitySolution.ruleExecution] sortIds: undefined [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.634+03:00][DEBUG][plugins.securitySolution.ruleExecution] searchResult.hit.hits.length: 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.635+03:00][DEBUG][plugins.securitySolution.ruleExecution] totalHits was 0, exiting early [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] completed bulk index of 0 [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.636+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Signal Rule execution completed. [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.638+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals into .alerts-security.alerts [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
[2022-02-23T17:05:11.639+03:00][DEBUG][plugins.securitySolution.ruleExecution] [+] Finished indexing 0 signals searched between date ranges [
  {
    "to": "2022-02-23T14:05:09.775Z",
    "from": "2022-02-23T13:55:09.775Z",
    "maxSignals": 10000
  }
] [siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]
```

Note that:

- The logger name is now `plugins.securitySolution.ruleExecution`, which allows to turn on _only_ rule execution logs in the config (could be useful when debugging).
- Every log message has a suffix with correlation ids: `[siem.queryRule][Endpoint Security][rule id 825b2fab-8b3e-11ec-a4a0-cf820453283c][rule uuid 9a1a2dae-0b5f-4c3d-8305-a268d404c306][exec id ebb7f713-b216-4c90-a456-6c1a6815a065][space default]`


### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  - [x] `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/README.md`
  - [x] Various JSDoc comments
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
@banderror banderror changed the title [Epic][Security Solution][Detections] Rule Execution Log - technical debt [Security Solution] Rule Execution Log - technical debt Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Feature:Rule Monitoring Security Solution Detection Rule Monitoring area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture v8.1.0 v8.2.0 v8.4.0
Projects
None yet
Development

No branches or pull requests

3 participants