[Task Manager] Force validation on all tasks using state by removing the exemption code #159347

mikecote · 2023-06-08T17:37:52Z

Once we have onboarded all the task types to report their state schemas, we force validation on all tasks that are using state by removing the exemption code in the task_validator.ts file.

    // TODO: Remove once all task types have defined their state schema.
    // Otherwise, failures on read / write would occur. (don't forget to unskip test)
    if (!lastestStateSchema) {
      return task;
    }

The text was updated successfully, but these errors were encountered:

elasticmachine · 2023-06-08T17:37:55Z

Pinging @elastic/response-ops (Team:ResponseOps)

…ate objects (#159048) Part of #155764. In this PR, I'm modifying task manager to allow task types to report a versioned schema for the `state` object. When defining `stateSchemaByVersion`, the following will happen: - The `state` returned from the task runner will get validated against the latest version and throw an error if ever it is invalid (to capture mismatches at development and testing time) - When task manager reads a task, it will migrate the task state to the latest version (if necessary) and validate against the latest schema, dropping any unknown fields (in the scenario of a downgrade). By default, Task Manager will validate the state on write once a versioned schema is provided, however the following config must be enabled for errors to be thrown on read: `xpack.task_manager.allow_reading_invalid_state: true`. We plan to enable this in serverless by default but be cautious on existing deployments and wait for telemetry to show no issues. I've onboarded the `alerts_invalidate_api_keys` task type which can be used as an example to onboard others. See [this commit](214bae3). ### How to configure a task type to version and validate The structure is defined as: ``` taskManager.registerTaskDefinitions({ ... stateSchemaByVersion: { 1: { // All existing tasks don't have a version so will get `up` migrated to 1 up: (state: Record<string, unknown>) => ({ runs: state.runs || 0, total_invalidated: state.total_invalidated || 0, }), schema: schema.object({ runs: schema.number(), total_invalidated: schema.number(), }), }, }, ... }); ``` However, look at [this commit](214bae3) for an example that you can leverage type safety from the schema. ### Follow up issues - Onboard non-alerting task types to have a versioned state schema (#159342) - Onboard alerting task types to have a versioned state schema for the framework fields (#159343) - Onboard alerting task types to have a versioned rule and alert state schema within the task state (#159344) - Telemetry on the validation failures (#159345) - Remove feature flag so `allow_reading_invalid_state` is always `false` (#159346) - Force validation on all tasks using state by removing the exemption code (#159347) - Release tasks when encountering a validation failure after run (#159964) ### To Verify NOTE: I have the following verification scenarios in a jest integration test as well => https://github.com/elastic/kibana/pull/159048/files#diff-5f06228df58fa74d5a0f2722c30f1f4bee2ee9df7a14e0700b9aa9bc3864a858. You will need to log the state when the task runs to observe what the task runner receives in different scenarios. ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..4aa4c2c7805 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -140,6 +140,7 @@ function taskRunner( ) { return ({ taskInstance }: RunContext) => { const state = taskInstance.state as LatestTaskStateSchema; + console.log('*** Running task with the following state:', JSON.stringify(state)); return { async run() { let totalInvalidated = 0; ``` #### Scenario 1: Adding an unknown field to the task saved-object gets dropped 1. Startup a fresh Kibana instance 2. Make the following call to Elasticsearch (I used postman). This call adds an unknown property (`foo`) to the task state and makes the task run right away. ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{\"runs\":1,\"total_invalidated\":0,\"foo\":true}" } } } ``` 3. Observe the task run log message, with state not containing `foo`. #### Scenario 2: Task running returning an unknown property causes the task to fail to update 1. Apply the following changes to the code (and ignore TypeScript issues) ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..b15d4a4f478 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -183,6 +183,7 @@ function taskRunner( const updatedState: LatestTaskStateSchema = { runs: (state.runs || 0) + 1, + foo: true, total_invalidated: totalInvalidated, }; return { ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Notice the validation errors logged as debug ``` [ERROR][plugins.taskManager] Task alerts_invalidate_api_keys "Alerts-alerts_invalidate_api_keys" failed: Error: [foo]: definition for this key is missing ``` #### Scenario 3: Task state gets migrated 1. Apply the following code change ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..338f21bed5b 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -41,6 +41,18 @@ const stateSchemaByVersion = { total_invalidated: schema.number(), }), }, + 2: { + up: (state: Record<string, unknown>) => ({ + runs: state.runs, + total_invalidated: state.total_invalidated, + foo: true, + }), + schema: schema.object({ + runs: schema.number(), + total_invalidated: schema.number(), + foo: schema.boolean(), + }), + }, }; const latestSchema = stateSchemaByVersion[1].schema; ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Observe the state now contains `foo` property when the task runs. #### Scenario 4: Reading invalid state causes debug logs 1. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 2. Observe the Kibana debug log mentioning the validation failure while letting the task through ``` [DEBUG][plugins.taskManager] [alerts_invalidate_api_keys][Alerts-alerts_invalidate_api_keys] Failed to validate the task's state. Allowing read operation to proceed because allow_reading_invalid_state is true. Error: [runs]: expected value of type [number] but got [undefined] ``` #### Scenario 5: Reading invalid state when setting `allow_reading_invalid_state: false` causes tasks to fail to run 1. Set `xpack.task_manager.allow_reading_invalid_state: false` in your kibana.yml settings 2. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 3. Observe the Kibana error log mentioning the validation failure ``` [ERROR][plugins.taskManager] Failed to poll for work: Error: [runs]: expected value of type [number] but got [undefined] ``` NOTE: While corrupting the task directly is rare, we plan to re-queue the tasks that failed to read, leveraging work from #159302 in a future PR (hence why the yml config is enabled by default, allowing invalid reads). --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Ying Mao <[email protected]>

…ate objects (elastic#159048) Part of elastic#155764. In this PR, I'm modifying task manager to allow task types to report a versioned schema for the `state` object. When defining `stateSchemaByVersion`, the following will happen: - The `state` returned from the task runner will get validated against the latest version and throw an error if ever it is invalid (to capture mismatches at development and testing time) - When task manager reads a task, it will migrate the task state to the latest version (if necessary) and validate against the latest schema, dropping any unknown fields (in the scenario of a downgrade). By default, Task Manager will validate the state on write once a versioned schema is provided, however the following config must be enabled for errors to be thrown on read: `xpack.task_manager.allow_reading_invalid_state: true`. We plan to enable this in serverless by default but be cautious on existing deployments and wait for telemetry to show no issues. I've onboarded the `alerts_invalidate_api_keys` task type which can be used as an example to onboard others. See [this commit](elastic@214bae3). ### How to configure a task type to version and validate The structure is defined as: ``` taskManager.registerTaskDefinitions({ ... stateSchemaByVersion: { 1: { // All existing tasks don't have a version so will get `up` migrated to 1 up: (state: Record<string, unknown>) => ({ runs: state.runs || 0, total_invalidated: state.total_invalidated || 0, }), schema: schema.object({ runs: schema.number(), total_invalidated: schema.number(), }), }, }, ... }); ``` However, look at [this commit](elastic@214bae3) for an example that you can leverage type safety from the schema. ### Follow up issues - Onboard non-alerting task types to have a versioned state schema (elastic#159342) - Onboard alerting task types to have a versioned state schema for the framework fields (elastic#159343) - Onboard alerting task types to have a versioned rule and alert state schema within the task state (elastic#159344) - Telemetry on the validation failures (elastic#159345) - Remove feature flag so `allow_reading_invalid_state` is always `false` (elastic#159346) - Force validation on all tasks using state by removing the exemption code (elastic#159347) - Release tasks when encountering a validation failure after run (elastic#159964) ### To Verify NOTE: I have the following verification scenarios in a jest integration test as well => https://github.com/elastic/kibana/pull/159048/files#diff-5f06228df58fa74d5a0f2722c30f1f4bee2ee9df7a14e0700b9aa9bc3864a858. You will need to log the state when the task runs to observe what the task runner receives in different scenarios. ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..4aa4c2c7805 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -140,6 +140,7 @@ function taskRunner( ) { return ({ taskInstance }: RunContext) => { const state = taskInstance.state as LatestTaskStateSchema; + console.log('*** Running task with the following state:', JSON.stringify(state)); return { async run() { let totalInvalidated = 0; ``` #### Scenario 1: Adding an unknown field to the task saved-object gets dropped 1. Startup a fresh Kibana instance 2. Make the following call to Elasticsearch (I used postman). This call adds an unknown property (`foo`) to the task state and makes the task run right away. ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{\"runs\":1,\"total_invalidated\":0,\"foo\":true}" } } } ``` 3. Observe the task run log message, with state not containing `foo`. #### Scenario 2: Task running returning an unknown property causes the task to fail to update 1. Apply the following changes to the code (and ignore TypeScript issues) ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..b15d4a4f478 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -183,6 +183,7 @@ function taskRunner( const updatedState: LatestTaskStateSchema = { runs: (state.runs || 0) + 1, + foo: true, total_invalidated: totalInvalidated, }; return { ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Notice the validation errors logged as debug ``` [ERROR][plugins.taskManager] Task alerts_invalidate_api_keys "Alerts-alerts_invalidate_api_keys" failed: Error: [foo]: definition for this key is missing ``` #### Scenario 3: Task state gets migrated 1. Apply the following code change ``` diff --git a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts index 1e624bcd807..338f21bed5b 100644 --- a/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts +++ b/x-pack/plugins/alerting/server/invalidate_pending_api_keys/task.ts @@ -41,6 +41,18 @@ const stateSchemaByVersion = { total_invalidated: schema.number(), }), }, + 2: { + up: (state: Record<string, unknown>) => ({ + runs: state.runs, + total_invalidated: state.total_invalidated, + foo: true, + }), + schema: schema.object({ + runs: schema.number(), + total_invalidated: schema.number(), + foo: schema.boolean(), + }), + }, }; const latestSchema = stateSchemaByVersion[1].schema; ``` 2. Make the task run right away by calling Elasticsearch with the following ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z" } } } ``` 3. Observe the state now contains `foo` property when the task runs. #### Scenario 4: Reading invalid state causes debug logs 1. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 2. Observe the Kibana debug log mentioning the validation failure while letting the task through ``` [DEBUG][plugins.taskManager] [alerts_invalidate_api_keys][Alerts-alerts_invalidate_api_keys] Failed to validate the task's state. Allowing read operation to proceed because allow_reading_invalid_state is true. Error: [runs]: expected value of type [number] but got [undefined] ``` #### Scenario 5: Reading invalid state when setting `allow_reading_invalid_state: false` causes tasks to fail to run 1. Set `xpack.task_manager.allow_reading_invalid_state: false` in your kibana.yml settings 2. Run the following request to Elasticsearch ``` POST http://kibana_system:changeme@localhost:9200/.kibana_task_manager/_update/task:Alerts-alerts_invalidate_api_keys { "doc": { "task": { "runAt": "2023-06-08T00:00:00.000Z", "state": "{}" } } } ``` 3. Observe the Kibana error log mentioning the validation failure ``` [ERROR][plugins.taskManager] Failed to poll for work: Error: [runs]: expected value of type [number] but got [undefined] ``` NOTE: While corrupting the task directly is rare, we plan to re-queue the tasks that failed to read, leveraging work from elastic#159302 in a future PR (hence why the yml config is enabled by default, allowing invalid reads). --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Ying Mao <[email protected]> (cherry picked from commit 40c2afd)

Resolves #159347 Part of #155764 In this PR, I'm forcing validation on any tasks using state by ensuring a state schema is defined by the task type. Without this schema and once this PR merges, Task Manager will now fail validation by throwing `[TaskValidator] stateSchemaByVersion not defined for task type: XYZ` errors. This forces an explicit schema to be defined so we can properly handle state objects in ZDT environments. --------- Co-authored-by: Kibana Machine <[email protected]>

mikecote added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jun 8, 2023

mikecote self-assigned this Jun 8, 2023

mikecote added this to AppEx: ResponseOps - Execution & Connectors Jun 8, 2023

github-project-automation bot moved this to Awaiting Triage in AppEx: ResponseOps - Execution & Connectors Jun 8, 2023

mikecote mentioned this issue Jun 8, 2023

Add support to Task Manager for validating and versioning the task state objects #159048

Merged

ymao1 moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Jun 15, 2023

mikecote mentioned this issue Jul 27, 2023

Version task-manager task state #155764

Open

mikecote mentioned this issue Aug 23, 2023

[Task Manager] Force validation on all tasks using state #164574

Merged

mikecote moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Aug 23, 2023

mikecote moved this from In Progress to In Review in AppEx: ResponseOps - Execution & Connectors Sep 25, 2023

mikecote closed this as completed in #164574 Sep 26, 2023

github-project-automation bot moved this from In Review to Done in AppEx: ResponseOps - Execution & Connectors Sep 26, 2023

mikecote reopened this Oct 3, 2023

mikecote moved this from Done to In Progress in AppEx: ResponseOps - Execution & Connectors Oct 3, 2023

mikecote moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors Oct 16, 2023

mikecote added the response-ops-ec-backlog label Nov 1, 2024

mikecote removed the response-ops-ec-backlog label Jan 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Task Manager] Force validation on all tasks using state by removing the exemption code #159347

[Task Manager] Force validation on all tasks using state by removing the exemption code #159347

mikecote commented Jun 8, 2023

elasticmachine commented Jun 8, 2023

[Task Manager] Force validation on all tasks using state by removing the exemption code #159347

[Task Manager] Force validation on all tasks using state by removing the exemption code #159347

Comments

mikecote commented Jun 8, 2023

elasticmachine commented Jun 8, 2023