-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to Task Manager for validating and versioning the task state objects #159048
Add support to Task Manager for validating and versioning the task state objects #159048
Conversation
…pport-for-validating-and-versioning-state
} | ||
|
||
const versions = Object.keys(taskTypeDef.stateSchemaByVersion).map((v) => parseInt(v, 10)); | ||
const latest = max(versions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we use latest
in the schema we don't need this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with the latest
approach, I will still have to do some tricks to calculate the latest number to store in the stateVersion
attribute of the SO. So if there's { latest: ..., 1: ... }
I'll have to calculate 2
as the version to store.
…pport-for-validating-and-versioning-state
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing comments
Unknown metric groupsAPI count
ESLint disabled line counts
References to deprecated APIs
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @mikecote |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…ate objects (elastic#159048) Part of elastic#155764. In this PR, I'm modifying task manager to allow task types to report a versioned schema for the `state` object. When defining `stateSchemaByVersion`, the following will happen: - The `state` returned from the task runner will get validated against the latest version and throw an error if ever it is invalid (to capture mismatches at development and testing time) - When task manager reads a task, it will migrate the task state to the latest version (if necessary) and validate against the latest schema, dropping any unknown fields (in the scenario of a downgrade). By default, Task Manager will validate the state on write once a versioned schema is provided, however the following config must be enabled for errors to be thrown on read: `xpack.task_manager.allow_reading_invalid_state: true`. We plan to enable this in serverless by default but be cautious on existing deployments and wait for telemetry to show no issues. I've onboarded the `alerts_invalidate_api_keys` task type which can be used as an example to onboard others. See [this commit](elastic@214bae3). ### How to configure a task type to version and validate The structure is defined as: ``` taskManager.registerTaskDefinitions({ ... stateSchemaByVersion: { 1: { // All existing tasks don't have a version so will get `up` migrated to 1 up: (state: Record<string, unknown>) => ({ runs: state.runs || 0, total_invalidated: state.total_invalidated || 0, }), schema: schema.object({ runs: schema.number(), total_invalidated: schema.number(), }), }, }, ... }); ``` However, look at [this commit](elastic@214bae3) for an example that you can leverage type safety from the schema. ### Follow up issues - Onboard non-alerting task types to have a versioned state schema (elastic#159342) - Onboard alerting task types to have a versioned state schema for the framework fields (elastic#159343) - Onboard alerting task types to have a versioned rule and alert state schema within the task state (elastic#159344) - Telemetry on the validation failures (elastic#159345) - Remove feature flag so `allow_reading_invalid_state` is always `false` (elastic#159346) - Force validation on all tasks using state by removing the exemption code (elastic#159347) - Release tasks when encountering a validation failure after run (elastic#159964) ### To Verify NOTE: I have the following verification scenarios in a jest integration test as well => https://github.com/elastic/kibana/pull/159048/files#diff-5f06228df58fa74d5a0f2722c30f1f4bee2ee9df7a14e0700b9aa9bc3864a858. You will need to log the state when the task runs to observe what the task runner receives in different scenarios. ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..4aa4c2c7805 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -140,6 +140,7 @@ function taskRunner( ) { return ({ taskInstance }: RunContext) => { const state = taskInstance.state as LatestTaskStateSchema; + console.log('*** Running task with the following state:', JSON.stringify(state)); return { async run() { let totalInvalidated = 0; ``` #### Scenario 1: Adding an unknown field to the task saved-object gets dropped 1. Startup a fresh Kibana instance 2. Make the following call to Elasticsearch (I used postman). This call adds an unknown property (`foo`) to the task state and makes the task run right away. ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{\"runs\":1,\"total_invalidated\":0,\"foo\":true}" } } } ``` 3. Observe the task run log message, with state not containing `foo`. #### Scenario 2: Task running returning an unknown property causes the task to fail to update 1. Apply the following changes to the code (and ignore TypeScript issues) ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..b15d4a4f478 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -183,6 +183,7 @@ function taskRunner( const updatedState: LatestTaskStateSchema = { runs: (state.runs || 0) + 1, + foo: true, total_invalidated: totalInvalidated, }; return { ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Notice the validation errors logged as debug ``` [ERROR][plugins.taskManager] Task alerts_invalidate_api_keys "Alerts-alerts_invalidate_api_keys" failed: Error: [foo]: definition for this key is missing ``` #### Scenario 3: Task state gets migrated 1. Apply the following code change ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..338f21bed5b 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -41,6 +41,18 @@ const stateSchemaByVersion = { total_invalidated: schema.number(), }), }, + 2: { + up: (state: Record<string, unknown>) => ({ + runs: state.runs, + total_invalidated: state.total_invalidated, + foo: true, + }), + schema: schema.object({ + runs: schema.number(), + total_invalidated: schema.number(), + foo: schema.boolean(), + }), + }, }; const latestSchema = stateSchemaByVersion[1].schema; ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Observe the state now contains `foo` property when the task runs. #### Scenario 4: Reading invalid state causes debug logs 1. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 2. Observe the Kibana debug log mentioning the validation failure while letting the task through ``` [DEBUG][plugins.taskManager] [alerts_invalidate_api_keys][Alerts-alerts_invalidate_api_keys] Failed to validate the task's state. Allowing read operation to proceed because allow_reading_invalid_state is true. Error: [runs]: expected value of type [number] but got [undefined] ``` #### Scenario 5: Reading invalid state when setting `allow_reading_invalid_state: false` causes tasks to fail to run 1. Set `xpack.task_manager.allow_reading_invalid_state: false` in your kibana.yml settings 2. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 3. Observe the Kibana error log mentioning the validation failure ``` [ERROR][plugins.taskManager] Failed to poll for work: Error: [runs]: expected value of type [number] but got [undefined] ``` NOTE: While corrupting the task directly is rare, we plan to re-queue the tasks that failed to read, leveraging work from elastic#159302 in a future PR (hence why the yml config is enabled by default, allowing invalid reads). --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Ying Mao <[email protected]> (cherry picked from commit 40c2afd)
I closed the backport PR (#160420) as that was a mistake, should only go into 8.10. |
… validation (#161581) Part of #159342. In this PR, I'm preparing the non-alerting (rule types) response ops task types for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving but also when reading. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764). For more information on how to use `stateSchemaByVersion`, see #159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: kibanamachine <[email protected]>
…sk state validation (#161584) Part of #159342. In this PR, I'm preparing the `ML:saved-objects-sync-task` for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764). For more information on how to use `stateSchemaByVersion`, see #159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: James Gowdy <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
…or task state validation (#161747) Part of #159342. In this PR, I'm preparing the security solution related tasks for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764). For more information on how to use `stateSchemaByVersion`, see #159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
… task state validation (#161855) Part of #159342. In this PR, I'm preparing the `dashboard_telemetry` for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764). For more information on how to use `stateSchemaByVersion`, see #159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
…sion for task state validation (#161740) Part of #159342. In this PR, I'm preparing the `cloud_security_posture-findings_stats` task type for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764). For more information on how to use `stateSchemaByVersion`, see #159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
…or task state validation (elastic#161747) Part of elastic#159342. In this PR, I'm preparing the security solution related tasks for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see elastic#155764). For more information on how to use `stateSchemaByVersion`, see elastic#159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
… task state validation (elastic#161855) Part of elastic#159342. In this PR, I'm preparing the `dashboard_telemetry` for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see elastic#155764). For more information on how to use `stateSchemaByVersion`, see elastic#159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
…sion for task state validation (elastic#161740) Part of elastic#159342. In this PR, I'm preparing the `cloud_security_posture-findings_stats` task type for serverless by defining an explicit task state schema. This schema is used to validate the task's state before saving the task but also when reading the task. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see elastic#155764). For more information on how to use `stateSchemaByVersion`, see elastic#159048 and https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md. --------- Co-authored-by: Kibana Machine <[email protected]>
Part of #155764.
In this PR, I'm modifying task manager to allow task types to report a versioned schema for the
state
object. When definingstateSchemaByVersion
, the following will happen:state
returned from the task runner will get validated against the latest version and throw an error if ever it is invalid (to capture mismatches at development and testing time)By default, Task Manager will validate the state on write once a versioned schema is provided, however the following config must be enabled for errors to be thrown on read:
xpack.task_manager.allow_reading_invalid_state: true
. We plan to enable this in serverless by default but be cautious on existing deployments and wait for telemetry to show no issues.I've onboarded the
alerts_invalidate_api_keys
task type which can be used as an example to onboard others. See this commit.How to configure a task type to version and validate
The structure is defined as:
However, look at this commit for an example that you can leverage type safety from the schema.
Follow up issues
allow_reading_invalid_state
is alwaysfalse
([Task Manager] Remove task state validation feature flag so we no longer allow reading invalid state #159346)To Verify
NOTE: I have the following verification scenarios in a jest integration test as well => https://github.com/elastic/kibana/pull/159048/files#diff-5f06228df58fa74d5a0f2722c30f1f4bee2ee9df7a14e0700b9aa9bc3864a858.
You will need to log the state when the task runs to observe what the task runner receives in different scenarios.
Scenario 1: Adding an unknown field to the task saved-object gets dropped
foo
) to the task state and makes the task run right away.foo
.Scenario 2: Task running returning an unknown property causes the task to fail to update
Scenario 3: Task state gets migrated
foo
property when the task runs.Scenario 4: Reading invalid state causes debug logs
Scenario 5: Reading invalid state when setting
allow_reading_invalid_state: false
causes tasks to fail to runxpack.task_manager.allow_reading_invalid_state: false
in your kibana.yml settingsNOTE: While corrupting the task directly is rare, we plan to re-queue the tasks that failed to read, leveraging work from #159302 in a future PR (hence why the yml config is enabled by default, allowing invalid reads).