Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[8.x] [ResponseOps][Alerting] Decouple feature IDs from consumers (#1…
…83756) (#202667) # Backport This will backport the following commits from `main` to `8.x`: - [[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)](#183756) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Christos Nasikas","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-12-03T10:21:53Z","message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955","branchLabelMapping":{"^v9.0.0$":"main","^v8.18.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","v9.0.0","backport:prev-minor","ci:cloud-deploy","ci:cloud-persist-deployment","ci:build-serverless-image","Team:Obs AI Assistant","ci:project-deploy-observability","Team:obs-ux-infra_services","Team:obs-ux-management","apm:review","v8.18.0"],"number":183756,"url":"https://github.com/elastic/kibana/pull/183756","mergeCommit":{"message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","labelRegex":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/183756","number":183756,"mergeCommit":{"message":"[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)\n\n## Summary\r\n\r\nThis PR aims to decouple the feature IDs from the `consumer` attribute\r\nof rules and alerts.\r\n\r\nTowards: https://github.com/elastic/kibana/issues/187202\r\nFixes: https://github.com/elastic/kibana/issues/181559\r\nFixes: https://github.com/elastic/kibana/issues/182435\r\n\r\n> [!NOTE] \r\n> Unfortunately, I could not break the PR into smaller pieces. The APIs\r\ncould not work anymore with feature IDs and had to convert them to use\r\nrule type IDs. Also, I took the chance and refactored crucial parts of\r\nthe authorization class that in turn affected a lot of files. Most of\r\nthe changes in the files are minimal and easy to review. The crucial\r\nchanges are in the authorization class and some alerting APIs.\r\n\r\n## Architecture\r\n\r\n### Alerting RBAC model\r\n\r\nThe Kibana security uses Elasticsearch's [application\r\nprivileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges).\r\nThis way Kibana can represent and store its privilege models within\r\nElasticsearch roles. To do that, Kibana security creates actions that\r\nare granted by a specific privilege. Alerting uses its own RBAC model\r\nand is built on top of the existing Kibana security model. The Alerting\r\nRBAC uses the `rule_type_id` and `consumer` attributes to define who\r\nowns the rule and the alerts procured by the rule. To connect the\r\n`rule_type_id` and `consumer` with the Kibana security actions the\r\nAlerting RBAC registers its custom actions. They are constructed as\r\n`alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`.\r\nBecause to authorizate a resource an action has to be generated and\r\nbecause the action needs a valid feature ID the value of the `consumer`\r\nshould be a valid feature ID. For example, the\r\n`alerting:siem.esqlRule/siem/rule/get` action, means that a user with a\r\nrole that grants this action can get a rule of type `siem.esqlRule` with\r\nconsumer `siem`.\r\n\r\n### Problem statement\r\n\r\nAt the moment the `consumer` attribute should be a valid feature ID.\r\nThough this approach worked well so far it has its limitation.\r\nSpecifically:\r\n\r\n- Rule types cannot support more than one consumer.\r\n- To associate old rules with a new feature ID required a migration on\r\nthe rule's SOs and the alerts documents.\r\n- The API calls are feature ID-oriented and not rule-type-oriented.\r\n- The framework has to be aware of the values of the `consumer`\r\nattribute.\r\n- Feature IDs are tightly coupled with the alerting indices leading to\r\n[bugs](https://github.com/elastic/kibana/issues/179082).\r\n- Legacy consumers that are not a valid feature anymore can cause\r\n[bugs](https://github.com/elastic/kibana/issues/184595).\r\n- The framework has to be aware of legacy consumers to handle edge\r\ncases.\r\n- The framework has to be aware of specific consumers to handle edge\r\ncases.\r\n\r\n### Proposed solution\r\n\r\nThis PR aims to decouple the feature IDs from consumers. It achieves\r\nthat a) by changing the way solutions configure the alerting privileges\r\nwhen registering a feature and b) by changing the alerting actions. The\r\nschema changes as:\r\n\r\n```\r\n// Old formatting\r\nid: 'siem', <--- feature ID\r\nalerting:['siem.queryRule']\r\n\r\n// New formatting\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nThe new actions are constructed as\r\n`alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For\r\nexample `alerting:rule-type-id/my-consumer/rule/get`. The new action\r\nmeans that a user with a role that grants this action can get a rule of\r\ntype `rule-type` with consumer `my-consumer`. Changing the action\r\nstrings is not considered a breaking change as long as the user's\r\npermission works as before. In our case, this is true because the\r\nconsumer will be the same as before (feature ID), and the alerting\r\nsecurity actions will be the same. For example:\r\n\r\n**Old formatting**\r\n\r\nSchema:\r\n```\r\nid: 'logs', <--- feature ID\r\nalerting:['.es-query'] <-- rule type ID\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get\r\n```\r\n\r\n**New formatting**\r\n\r\nSchema:\r\n```\r\nid: 'siem', <--- feature ID\r\nalerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting\r\n```\r\n\r\nGenerated action:\r\n\r\n```\r\nalerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before\r\n```\r\n\r\nIn both formating the actions are the same thus breaking changes are\r\navoided.\r\n\r\n### Alerting authorization class\r\nThe alerting plugin uses and exports the alerting authorization class\r\n(`AlertingAuthorization`). The class is responsible for handling all\r\nauthorization actions related to rules and alerts. The class changed to\r\nhandle the new actions as described in the above sections. A lot of\r\nmethods were renamed, removed, and cleaned up, all method arguments\r\nconverted to be an object, and the response signature of some methods\r\nchanged. These changes affected various pieces of the code. The changes\r\nin this class are the most important in this PR especially the\r\n`_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the\r\ncornerstone of the alerting RBAC. Please review carefully.\r\n\r\n### Instantiation of the alerting authorization class\r\nThe `AlertingAuthorizationClientFactory` is used to create instances of\r\nthe `AlertingAuthorization` class. The `AlertingAuthorization` class\r\nneeds to perform async operations upon instantiation. Because JS, at the\r\nmoment, does not support async instantiation of classes the\r\n`AlertingAuthorization` class was assigning `Promise` objects to\r\nvariables that could be resolved later in other phases of the lifecycle\r\nof the class. To improve readability and make the lifecycle of the class\r\nclearer, I separated the construction of the class (initialization) from\r\nthe bootstrap process. As a result, getting the `AlertingAuthorization`\r\nclass or any client that depends on it (`getRulesClient` for example) is\r\nan async operation.\r\n\r\n### Filtering\r\nA lot of routes use the authorization class to get the authorization\r\nfilter (`getFindAuthorizationFilter`), a filter that, if applied,\r\nreturns only the rule types and consumers the user is authorized to. The\r\nmethod that returns the filter was built in a way to also support\r\nfiltering on top of the authorization filter thus coupling the\r\nauthorized filter with router filtering. I believe these two operations\r\nshould be decoupled and the filter method should return a filter that\r\ngives you all the authorized rule types. It is the responsibility of the\r\nconsumer, router in our case, to apply extra filters on top of the\r\nauthorization filter. For that reason, I made all the necessary changes\r\nto decouple them.\r\n\r\n### Legacy consumers & producer\r\nA lot of rules and alerts have been created and are still being created\r\nfrom observability with the `alerts` consumer. When the Alerting RBAC\r\nencounters a rule or alert with `alerts` as a consumer it falls back to\r\nthe `producer` of the rule type ID to construct the actions. For example\r\nif a rule with `ruleTypeId: .es-query` and `consumer: alerts` the\r\nalerting action will be constructed as\r\n`alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the\r\nproducer of the `.es-query` rule type. The `producer` is used to be used\r\nin alerting authorization but due to its complexity, it was deprecated\r\nand only used as a fallback for the `alerts` consumer. To avoid breaking\r\nchanges all feature privileges that specify access to rule types add the\r\n`alerts` consumer when configuring their alerting privileges. By moving\r\nthe `alerts` consumer to the registration of the feature we can stop\r\nrelying on the `producer`. The `producer` is not used anymore in the\r\nauthorization class. In the next PRs the `producer` will removed\r\nentirely.\r\n\r\n### Routes\r\nThe following changes were introduced to the alerting routes:\r\n\r\n- All related routes changed to be rule-type oriented and not feature ID\r\noriented.\r\n- All related routes support the `ruleTypeIds` and the `consumers`\r\nparameters for filtering. In all routes, the filters are constructed as\r\n`ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`.\r\nFiltering by consumers is important. In o11y for example, we do not want\r\nto show ES rule types with the `stackAlerts` consumer even if the user\r\nhas access to them.\r\n- The `/internal/rac/alerts/_feature_ids` route got deleted as it was\r\nnot used anywhere in the codebase and it was internal.\r\n\r\nAll the changes in the routes are related to internal routes and no\r\nbreaking changes are introduced.\r\n\r\n### Constants\r\nI moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and\r\nexported all security solution rule type IDs from\r\n`kbn-securitysolution-rules`. I am not a fan of having a centralized\r\nplace for the rule type IDs. Ideally, consumers of the framework should\r\nspecify keywords like `observablility` (category or subcategory) or even\r\n`apm.*` and the framework should know which rule type IDs to pick up. I\r\nthink it is out of the scope of the PR, and at the moment it seems the\r\nmost straightforward way to move forward. I will try to clean up as much\r\nas possible in further iterations. If you are interested in the upcoming\r\nwork follow this issue https://github.com/elastic/kibana/issues/187202.\r\n\r\n### Other notable code changes\r\n- Change all instances of feature IDs to rule type IDs.\r\n- `isSiemRuleType`: This is a temporary helper function that is needed\r\nin places where we handle edge cases related to security solution rule\r\ntypes. Ideally, the framework should be agnostic to the rule types or\r\nconsumers. The plan is to be removed entirely in further iterations.\r\n- Rename alerting `PluginSetupContract` and `PluginStartContract` to\r\n`AlertingServerSetup` and `AlertingServerStart`. This made me touch a\r\nlot of files but I could not resist.\r\n- `filter_consumers` was mistakenly exposed to a public API. It was\r\nundocumented.\r\n- Files or functions that were not used anywhere in the codebase got\r\ndeleted.\r\n- Change the returned type of the `list` method of the\r\n`RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string,\r\nRegistryRuleType>`.\r\n- Assertion of `KueryNode` in tests changed to an assertion of KQL using\r\n`toKqlExpression`.\r\n- Removal of `useRuleAADFields` as it is not used anywhere.\r\n\r\n## Testing\r\n\r\n> [!CAUTION]\r\n> It is very important to test all the areas of the application where\r\nrules or alerts are being used directly or indirectly. Scenarios to\r\nconsider:\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected as a superuser.\r\n> - The correct rules, alerts, and aggregations on top of them are being\r\nshown as expected by a user with limited access to certain features.\r\n> - The changes in this PR are backward compatible with the previous\r\nusers' permissions.\r\n\r\n### Solutions\r\nPlease test and verify that:\r\n- All the rule types you own with all possible combinations of\r\npermissions both in ESS and in Serverless.\r\n- The consumers and rule types make sense when registering the features.\r\n- The consumers and rule types that are passed to the components are the\r\nintended ones.\r\n\r\n### ResponseOps\r\nThe most important changes are in the alerting authorization class, the\r\nsearch strategy, and the routes. Please test:\r\n- The rules we own with all possible combinations of permissions.\r\n- The stack alerts page and its solution filtering.\r\n- The categories filtering in the maintenance window UI.\r\n\r\n## Risks\r\n> [!WARNING]\r\n> The risks involved in this PR are related to privileges. Specifically:\r\n> - Users with no privileges can access rules and alerts they do not\r\nhave access to.\r\n> - Users with privileges cannot access rules and alerts they have\r\naccess to.\r\n>\r\n> An excessive list of integration tests is in place to ensure that the\r\nabove scenarios will not occur. In the case of a bug, we could a)\r\nrelease an energy release for serverless and b) backport the fix in ESS.\r\nGiven that this PR is intended to be merged in 8.17 we have plenty of\r\ntime to test and to minimize the chances of risks.\r\n\r\n## FQA\r\n\r\n- I noticed that a lot of routes support the `filter` parameter where we\r\ncan pass an arbitrary KQL filter. Why we do not use this to filter by\r\nthe rule type IDs and the consumers and instead we introduce new\r\ndedicated parameters?\r\n\r\nThe `filter` parameter should not be exposed in the first place. It\r\nassumes that the consumer of the API knows the underlying structure and\r\nimplementation details of the persisted storage API (SavedObject client\r\nAPI). For example, a valid filter would be\r\n`alerting.attributes.rule_type_id`. In this filter the consumer should\r\nknow a) the name of the SO b) the keyword `attributes` (storage\r\nimplementation detail) and c) the name of the attribute as it is\r\npersisted in ES (snake case instead of camel case as it is returned by\r\nthe APIs). As there is no abstraction layer between the SO and the API,\r\nit makes it very difficult to make changes in the persistent schema or\r\nthe APIs. For all the above I decided to introduce new query parameters\r\nwhere the alerting framework has total control over it.\r\n\r\n- I noticed in the code a lot of instances where the consumer is used.\r\nShould not remove any logic around consumers?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I noticed a lot of hacks like checking if the rule type is `siem`.\r\nShould not remove the hacks?\r\n\r\nThis PR is a step forward making the framework as agnostic as possible.\r\nI had to keep the scope of the PR as contained as possible. We will get\r\nthere. It needs time :).\r\n\r\n- I hate the \"Role visibility\" dropdown. Can we remove it?\r\n\r\nI also do not like it. The goal is to remove it. Follow\r\nhttps://github.com//issues/189997.\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine <[email protected]>\r\nCo-authored-by: Aleh Zasypkin <[email protected]>\r\nCo-authored-by: Paula Borgonovi <[email protected]>","sha":"a3496c9ca6d99fa301fd8ca73ad44178cdeb2955"}},{"branch":"8.x","label":"v8.18.0","labelRegex":"^v8.18.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
- Loading branch information