-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ResponseOps][Alerting] Decouple feature IDs from consumers #183756
[ResponseOps][Alerting] Decouple feature IDs from consumers #183756
Conversation
/ci |
06d8499
to
3b7c89f
Compare
f65ca2c
to
a2b73f9
Compare
cefbb02
to
1ed4c79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGT Stack Monitoring
]; | ||
|
||
export const apmAlertingRuleTypeIds: string[] = [...OBSERVABILITY_RULE_TYPE_IDS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export const apmAlertingRuleTypeIds: string[] = [...OBSERVABILITY_RULE_TYPE_IDS]; | |
export const apmAlertingRuleTypeIds: string[] = OBSERVABILITY_RULE_TYPE_IDS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OBSERVABILITY_RULE_TYPE_IDS
is used in a lot of places and I wanted to avoid someone mutating the original array. const
protect us with assignments but it does not with push
.
...lity_solution/infra/public/pages/metrics/hosts/components/tabs/alerts/alerts_tab_content.tsx
Outdated
Show resolved
Hide resolved
...observability_solution/infra/public/pages/metrics/hosts/components/tabs/alerts_tab_badge.tsx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did code review only and LGTM. Thanks for taking on the massive changeset. 🙇🏼
💚 Build Succeeded
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Public APIs missing exports
Page load bundle
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
cc @cnasikas |
Starting backport for target branches: 8.x |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…83756) (#202667) # Backport This will backport the following commits from `main` to `8.x`: - [[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)](#183756) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Christos Nasikas","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-12-03T10:21:53Z","message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955","branchLabelMapping":{"^v9.0.0$":"main","^v8.18.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","v9.0.0","backport:prev-minor","ci:cloud-deploy","ci:cloud-persist-deployment","ci:build-serverless-image","Team:Obs AI Assistant","ci:project-deploy-observability","Team:obs-ux-infra_services","Team:obs-ux-management","apm:review","v8.18.0"],"number":183756,"url":"https://github.com/elastic/kibana/pull/183756","mergeCommit":{"message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","labelRegex":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/183756","number":183756,"mergeCommit":{"message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955"}},{"branch":"8.x","label":"v8.18.0","labelRegex":"^v8.18.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
…183756) ## Summary This PR aims to decouple the feature IDs from the `consumer` attribute of rules and alerts. Towards: elastic#187202 Fixes: elastic#181559 Fixes: elastic#182435 > [!NOTE] > Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs. ## Architecture ### Alerting RBAC model The Kibana security uses Elasticsearch's [application privileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges). This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the `rule_type_id` and `consumer` attributes to define who owns the rule and the alerts procured by the rule. To connect the `rule_type_id` and `consumer` with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as `alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the `consumer` should be a valid feature ID. For example, the `alerting:siem.esqlRule/siem/rule/get` action, means that a user with a role that grants this action can get a rule of type `siem.esqlRule` with consumer `siem`. ### Problem statement At the moment the `consumer` attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically: - Rule types cannot support more than one consumer. - To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents. - The API calls are feature ID-oriented and not rule-type-oriented. - The framework has to be aware of the values of the `consumer` attribute. - Feature IDs are tightly coupled with the alerting indices leading to [bugs](elastic#179082). - Legacy consumers that are not a valid feature anymore can cause [bugs](elastic#184595). - The framework has to be aware of legacy consumers to handle edge cases. - The framework has to be aware of specific consumers to handle edge cases. ### Proposed solution This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as: ``` // Old formatting id: 'siem', <--- feature ID alerting:['siem.queryRule'] // New formatting id: 'siem', <--- feature ID alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting ``` The new actions are constructed as `alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For example `alerting:rule-type-id/my-consumer/rule/get`. The new action means that a user with a role that grants this action can get a rule of type `rule-type` with consumer `my-consumer`. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example: **Old formatting** Schema: ``` id: 'logs', <--- feature ID alerting:['.es-query'] <-- rule type ID ``` Generated action: ``` alerting:.es-query/logs/rule/get ``` **New formatting** Schema: ``` id: 'siem', <--- feature ID alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting ``` Generated action: ``` alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before ``` In both formating the actions are the same thus breaking changes are avoided. ### Alerting authorization class The alerting plugin uses and exports the alerting authorization class (`AlertingAuthorization`). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the `_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the cornerstone of the alerting RBAC. Please review carefully. ### Instantiation of the alerting authorization class The `AlertingAuthorizationClientFactory` is used to create instances of the `AlertingAuthorization` class. The `AlertingAuthorization` class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the `AlertingAuthorization` class was assigning `Promise` objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the `AlertingAuthorization` class or any client that depends on it (`getRulesClient` for example) is an async operation. ### Filtering A lot of routes use the authorization class to get the authorization filter (`getFindAuthorizationFilter`), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them. ### Legacy consumers & producer A lot of rules and alerts have been created and are still being created from observability with the `alerts` consumer. When the Alerting RBAC encounters a rule or alert with `alerts` as a consumer it falls back to the `producer` of the rule type ID to construct the actions. For example if a rule with `ruleTypeId: .es-query` and `consumer: alerts` the alerting action will be constructed as `alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the producer of the `.es-query` rule type. The `producer` is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the `alerts` consumer. To avoid breaking changes all feature privileges that specify access to rule types add the `alerts` consumer when configuring their alerting privileges. By moving the `alerts` consumer to the registration of the feature we can stop relying on the `producer`. The `producer` is not used anymore in the authorization class. In the next PRs the `producer` will removed entirely. ### Routes The following changes were introduced to the alerting routes: - All related routes changed to be rule-type oriented and not feature ID oriented. - All related routes support the `ruleTypeIds` and the `consumers` parameters for filtering. In all routes, the filters are constructed as `ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the `stackAlerts` consumer even if the user has access to them. - The `/internal/rac/alerts/_feature_ids` route got deleted as it was not used anywhere in the codebase and it was internal. All the changes in the routes are related to internal routes and no breaking changes are introduced. ### Constants I moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and exported all security solution rule type IDs from `kbn-securitysolution-rules`. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like `observablility` (category or subcategory) or even `apm.*` and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue elastic#187202. ### Other notable code changes - Change all instances of feature IDs to rule type IDs. - `isSiemRuleType`: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations. - Rename alerting `PluginSetupContract` and `PluginStartContract` to `AlertingServerSetup` and `AlertingServerStart`. This made me touch a lot of files but I could not resist. - `filter_consumers` was mistakenly exposed to a public API. It was undocumented. - Files or functions that were not used anywhere in the codebase got deleted. - Change the returned type of the `list` method of the `RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string, RegistryRuleType>`. - Assertion of `KueryNode` in tests changed to an assertion of KQL using `toKqlExpression`. - Removal of `useRuleAADFields` as it is not used anywhere. ## Testing > [!CAUTION] > It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider: > - The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser. > - The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features. > - The changes in this PR are backward compatible with the previous users' permissions. ### Solutions Please test and verify that: - All the rule types you own with all possible combinations of permissions both in ESS and in Serverless. - The consumers and rule types make sense when registering the features. - The consumers and rule types that are passed to the components are the intended ones. ### ResponseOps The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test: - The rules we own with all possible combinations of permissions. - The stack alerts page and its solution filtering. - The categories filtering in the maintenance window UI. ## Risks > [!WARNING] > The risks involved in this PR are related to privileges. Specifically: > - Users with no privileges can access rules and alerts they do not have access to. > - Users with privileges cannot access rules and alerts they have access to. > > An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks. ## FQA - I noticed that a lot of routes support the `filter` parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters? The `filter` parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be `alerting.attributes.rule_type_id`. In this filter the consumer should know a) the name of the SO b) the keyword `attributes` (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it. - I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I noticed a lot of hacks like checking if the rule type is `siem`. Should not remove the hacks? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I hate the "Role visibility" dropdown. Can we remove it? I also do not like it. The goal is to remove it. Follow elastic#189997. --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Paula Borgonovi <[email protected]>
…183756) ## Summary This PR aims to decouple the feature IDs from the `consumer` attribute of rules and alerts. Towards: elastic#187202 Fixes: elastic#181559 Fixes: elastic#182435 > [!NOTE] > Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs. ## Architecture ### Alerting RBAC model The Kibana security uses Elasticsearch's [application privileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges). This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the `rule_type_id` and `consumer` attributes to define who owns the rule and the alerts procured by the rule. To connect the `rule_type_id` and `consumer` with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as `alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the `consumer` should be a valid feature ID. For example, the `alerting:siem.esqlRule/siem/rule/get` action, means that a user with a role that grants this action can get a rule of type `siem.esqlRule` with consumer `siem`. ### Problem statement At the moment the `consumer` attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically: - Rule types cannot support more than one consumer. - To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents. - The API calls are feature ID-oriented and not rule-type-oriented. - The framework has to be aware of the values of the `consumer` attribute. - Feature IDs are tightly coupled with the alerting indices leading to [bugs](elastic#179082). - Legacy consumers that are not a valid feature anymore can cause [bugs](elastic#184595). - The framework has to be aware of legacy consumers to handle edge cases. - The framework has to be aware of specific consumers to handle edge cases. ### Proposed solution This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as: ``` // Old formatting id: 'siem', <--- feature ID alerting:['siem.queryRule'] // New formatting id: 'siem', <--- feature ID alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting ``` The new actions are constructed as `alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For example `alerting:rule-type-id/my-consumer/rule/get`. The new action means that a user with a role that grants this action can get a rule of type `rule-type` with consumer `my-consumer`. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example: **Old formatting** Schema: ``` id: 'logs', <--- feature ID alerting:['.es-query'] <-- rule type ID ``` Generated action: ``` alerting:.es-query/logs/rule/get ``` **New formatting** Schema: ``` id: 'siem', <--- feature ID alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting ``` Generated action: ``` alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before ``` In both formating the actions are the same thus breaking changes are avoided. ### Alerting authorization class The alerting plugin uses and exports the alerting authorization class (`AlertingAuthorization`). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the `_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the cornerstone of the alerting RBAC. Please review carefully. ### Instantiation of the alerting authorization class The `AlertingAuthorizationClientFactory` is used to create instances of the `AlertingAuthorization` class. The `AlertingAuthorization` class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the `AlertingAuthorization` class was assigning `Promise` objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the `AlertingAuthorization` class or any client that depends on it (`getRulesClient` for example) is an async operation. ### Filtering A lot of routes use the authorization class to get the authorization filter (`getFindAuthorizationFilter`), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them. ### Legacy consumers & producer A lot of rules and alerts have been created and are still being created from observability with the `alerts` consumer. When the Alerting RBAC encounters a rule or alert with `alerts` as a consumer it falls back to the `producer` of the rule type ID to construct the actions. For example if a rule with `ruleTypeId: .es-query` and `consumer: alerts` the alerting action will be constructed as `alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the producer of the `.es-query` rule type. The `producer` is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the `alerts` consumer. To avoid breaking changes all feature privileges that specify access to rule types add the `alerts` consumer when configuring their alerting privileges. By moving the `alerts` consumer to the registration of the feature we can stop relying on the `producer`. The `producer` is not used anymore in the authorization class. In the next PRs the `producer` will removed entirely. ### Routes The following changes were introduced to the alerting routes: - All related routes changed to be rule-type oriented and not feature ID oriented. - All related routes support the `ruleTypeIds` and the `consumers` parameters for filtering. In all routes, the filters are constructed as `ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the `stackAlerts` consumer even if the user has access to them. - The `/internal/rac/alerts/_feature_ids` route got deleted as it was not used anywhere in the codebase and it was internal. All the changes in the routes are related to internal routes and no breaking changes are introduced. ### Constants I moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and exported all security solution rule type IDs from `kbn-securitysolution-rules`. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like `observablility` (category or subcategory) or even `apm.*` and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue elastic#187202. ### Other notable code changes - Change all instances of feature IDs to rule type IDs. - `isSiemRuleType`: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations. - Rename alerting `PluginSetupContract` and `PluginStartContract` to `AlertingServerSetup` and `AlertingServerStart`. This made me touch a lot of files but I could not resist. - `filter_consumers` was mistakenly exposed to a public API. It was undocumented. - Files or functions that were not used anywhere in the codebase got deleted. - Change the returned type of the `list` method of the `RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string, RegistryRuleType>`. - Assertion of `KueryNode` in tests changed to an assertion of KQL using `toKqlExpression`. - Removal of `useRuleAADFields` as it is not used anywhere. ## Testing > [!CAUTION] > It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider: > - The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser. > - The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features. > - The changes in this PR are backward compatible with the previous users' permissions. ### Solutions Please test and verify that: - All the rule types you own with all possible combinations of permissions both in ESS and in Serverless. - The consumers and rule types make sense when registering the features. - The consumers and rule types that are passed to the components are the intended ones. ### ResponseOps The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test: - The rules we own with all possible combinations of permissions. - The stack alerts page and its solution filtering. - The categories filtering in the maintenance window UI. ## Risks > [!WARNING] > The risks involved in this PR are related to privileges. Specifically: > - Users with no privileges can access rules and alerts they do not have access to. > - Users with privileges cannot access rules and alerts they have access to. > > An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks. ## FQA - I noticed that a lot of routes support the `filter` parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters? The `filter` parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be `alerting.attributes.rule_type_id`. In this filter the consumer should know a) the name of the SO b) the keyword `attributes` (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it. - I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I noticed a lot of hacks like checking if the rule type is `siem`. Should not remove the hacks? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I hate the "Role visibility" dropdown. Can we remove it? I also do not like it. The goal is to remove it. Follow elastic#189997. --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Paula Borgonovi <[email protected]>
## Summary ### Problem With the merge of the PR #183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
## Summary ### Problem With the merge of the PR elastic#183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) (cherry picked from commit d9c1cd3)
…183756) ## Summary This PR aims to decouple the feature IDs from the `consumer` attribute of rules and alerts. Towards: elastic#187202 Fixes: elastic#181559 Fixes: elastic#182435 > [!NOTE] > Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs. ## Architecture ### Alerting RBAC model The Kibana security uses Elasticsearch's [application privileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges). This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the `rule_type_id` and `consumer` attributes to define who owns the rule and the alerts procured by the rule. To connect the `rule_type_id` and `consumer` with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as `alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the `consumer` should be a valid feature ID. For example, the `alerting:siem.esqlRule/siem/rule/get` action, means that a user with a role that grants this action can get a rule of type `siem.esqlRule` with consumer `siem`. ### Problem statement At the moment the `consumer` attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically: - Rule types cannot support more than one consumer. - To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents. - The API calls are feature ID-oriented and not rule-type-oriented. - The framework has to be aware of the values of the `consumer` attribute. - Feature IDs are tightly coupled with the alerting indices leading to [bugs](elastic#179082). - Legacy consumers that are not a valid feature anymore can cause [bugs](elastic#184595). - The framework has to be aware of legacy consumers to handle edge cases. - The framework has to be aware of specific consumers to handle edge cases. ### Proposed solution This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as: ``` // Old formatting id: 'siem', <--- feature ID alerting:['siem.queryRule'] // New formatting id: 'siem', <--- feature ID alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting ``` The new actions are constructed as `alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For example `alerting:rule-type-id/my-consumer/rule/get`. The new action means that a user with a role that grants this action can get a rule of type `rule-type` with consumer `my-consumer`. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example: **Old formatting** Schema: ``` id: 'logs', <--- feature ID alerting:['.es-query'] <-- rule type ID ``` Generated action: ``` alerting:.es-query/logs/rule/get ``` **New formatting** Schema: ``` id: 'siem', <--- feature ID alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting ``` Generated action: ``` alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before ``` In both formating the actions are the same thus breaking changes are avoided. ### Alerting authorization class The alerting plugin uses and exports the alerting authorization class (`AlertingAuthorization`). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the `_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the cornerstone of the alerting RBAC. Please review carefully. ### Instantiation of the alerting authorization class The `AlertingAuthorizationClientFactory` is used to create instances of the `AlertingAuthorization` class. The `AlertingAuthorization` class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the `AlertingAuthorization` class was assigning `Promise` objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the `AlertingAuthorization` class or any client that depends on it (`getRulesClient` for example) is an async operation. ### Filtering A lot of routes use the authorization class to get the authorization filter (`getFindAuthorizationFilter`), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them. ### Legacy consumers & producer A lot of rules and alerts have been created and are still being created from observability with the `alerts` consumer. When the Alerting RBAC encounters a rule or alert with `alerts` as a consumer it falls back to the `producer` of the rule type ID to construct the actions. For example if a rule with `ruleTypeId: .es-query` and `consumer: alerts` the alerting action will be constructed as `alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the producer of the `.es-query` rule type. The `producer` is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the `alerts` consumer. To avoid breaking changes all feature privileges that specify access to rule types add the `alerts` consumer when configuring their alerting privileges. By moving the `alerts` consumer to the registration of the feature we can stop relying on the `producer`. The `producer` is not used anymore in the authorization class. In the next PRs the `producer` will removed entirely. ### Routes The following changes were introduced to the alerting routes: - All related routes changed to be rule-type oriented and not feature ID oriented. - All related routes support the `ruleTypeIds` and the `consumers` parameters for filtering. In all routes, the filters are constructed as `ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the `stackAlerts` consumer even if the user has access to them. - The `/internal/rac/alerts/_feature_ids` route got deleted as it was not used anywhere in the codebase and it was internal. All the changes in the routes are related to internal routes and no breaking changes are introduced. ### Constants I moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and exported all security solution rule type IDs from `kbn-securitysolution-rules`. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like `observablility` (category or subcategory) or even `apm.*` and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue elastic#187202. ### Other notable code changes - Change all instances of feature IDs to rule type IDs. - `isSiemRuleType`: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations. - Rename alerting `PluginSetupContract` and `PluginStartContract` to `AlertingServerSetup` and `AlertingServerStart`. This made me touch a lot of files but I could not resist. - `filter_consumers` was mistakenly exposed to a public API. It was undocumented. - Files or functions that were not used anywhere in the codebase got deleted. - Change the returned type of the `list` method of the `RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string, RegistryRuleType>`. - Assertion of `KueryNode` in tests changed to an assertion of KQL using `toKqlExpression`. - Removal of `useRuleAADFields` as it is not used anywhere. ## Testing > [!CAUTION] > It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider: > - The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser. > - The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features. > - The changes in this PR are backward compatible with the previous users' permissions. ### Solutions Please test and verify that: - All the rule types you own with all possible combinations of permissions both in ESS and in Serverless. - The consumers and rule types make sense when registering the features. - The consumers and rule types that are passed to the components are the intended ones. ### ResponseOps The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test: - The rules we own with all possible combinations of permissions. - The stack alerts page and its solution filtering. - The categories filtering in the maintenance window UI. ## Risks > [!WARNING] > The risks involved in this PR are related to privileges. Specifically: > - Users with no privileges can access rules and alerts they do not have access to. > - Users with privileges cannot access rules and alerts they have access to. > > An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks. ## FQA - I noticed that a lot of routes support the `filter` parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters? The `filter` parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be `alerting.attributes.rule_type_id`. In this filter the consumer should know a) the name of the SO b) the keyword `attributes` (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it. - I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I noticed a lot of hacks like checking if the rule type is `siem`. Should not remove the hacks? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I hate the "Role visibility" dropdown. Can we remove it? I also do not like it. The goal is to remove it. Follow elastic#189997. --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Paula Borgonovi <[email protected]>
## Summary ### Problem With the merge of the PR elastic#183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Summary
This PR aims to decouple the feature IDs from the
consumer
attribute of rules and alerts.Towards: #187202
Fixes: #181559
Fixes: #182435
Note
Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs.
Architecture
Alerting RBAC model
The Kibana security uses Elasticsearch's application privileges. This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the
rule_type_id
andconsumer
attributes to define who owns the rule and the alerts procured by the rule. To connect therule_type_id
andconsumer
with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed asalerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>
. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of theconsumer
should be a valid feature ID. For example, thealerting:siem.esqlRule/siem/rule/get
action, means that a user with a role that grants this action can get a rule of typesiem.esqlRule
with consumersiem
.Problem statement
At the moment the
consumer
attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically:consumer
attribute.Proposed solution
This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as:
The new actions are constructed as
alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>
. For examplealerting:rule-type-id/my-consumer/rule/get
. The new action means that a user with a role that grants this action can get a rule of typerule-type
with consumermy-consumer
. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example:Old formatting
Schema:
Generated action:
New formatting
Schema:
Generated action:
In both formating the actions are the same thus breaking changes are avoided.
Alerting authorization class
The alerting plugin uses and exports the alerting authorization class (
AlertingAuthorization
). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the_getAuthorizedRuleTypesWithAuthorizedConsumers
method which is the cornerstone of the alerting RBAC. Please review carefully.Instantiation of the alerting authorization class
The
AlertingAuthorizationClientFactory
is used to create instances of theAlertingAuthorization
class. TheAlertingAuthorization
class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes theAlertingAuthorization
class was assigningPromise
objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting theAlertingAuthorization
class or any client that depends on it (getRulesClient
for example) is an async operation.Filtering
A lot of routes use the authorization class to get the authorization filter (
getFindAuthorizationFilter
), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them.Legacy consumers & producer
A lot of rules and alerts have been created and are still being created from observability with the
alerts
consumer. When the Alerting RBAC encounters a rule or alert withalerts
as a consumer it falls back to theproducer
of the rule type ID to construct the actions. For example if a rule withruleTypeId: .es-query
andconsumer: alerts
the alerting action will be constructed asalerting:.es-query/stackAlerts/rule/get
wherestackRules
is the producer of the.es-query
rule type. Theproducer
is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for thealerts
consumer. To avoid breaking changes all feature privileges that specify access to rule types add thealerts
consumer when configuring their alerting privileges. By moving thealerts
consumer to the registration of the feature we can stop relying on theproducer
. Theproducer
is not used anymore in the authorization class. In the next PRs theproducer
will removed entirely.Routes
The following changes were introduced to the alerting routes:
ruleTypeIds
and theconsumers
parameters for filtering. In all routes, the filters are constructed asruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter
. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with thestackAlerts
consumer even if the user has access to them./internal/rac/alerts/_feature_ids
route got deleted as it was not used anywhere in the codebase and it was internal.All the changes in the routes are related to internal routes and no breaking changes are introduced.
Constants
I moved the o11y and stack rule type IDs to
kbn-rule-data-utils
and exported all security solution rule type IDs fromkbn-securitysolution-rules
. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords likeobservablility
(category or subcategory) or evenapm.*
and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue #187202.Other notable code changes
isSiemRuleType
: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations.PluginSetupContract
andPluginStartContract
toAlertingServerSetup
andAlertingServerStart
. This made me touch a lot of files but I could not resist.filter_consumers
was mistakenly exposed to a public API. It was undocumented.list
method of theRuleTypeRegistry
fromSet<RegistryRuleType>
toMap<string, RegistryRuleType>
.KueryNode
in tests changed to an assertion of KQL usingtoKqlExpression
.useRuleAADFields
as it is not used anywhere.Testing
Caution
It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider:
Solutions
Please test and verify that:
ResponseOps
The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test:
Risks
Warning
The risks involved in this PR are related to privileges. Specifically:
An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks.
FQA
filter
parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters?The
filter
parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would bealerting.attributes.rule_type_id
. In this filter the consumer should know a) the name of the SO b) the keywordattributes
(storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it.This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :).
siem
. Should not remove the hacks?This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :).
I also do not like it. The goal is to remove it. Follow #189997.