Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poc backfill actions #199218

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from
Draft

Poc backfill actions #199218

wants to merge 15 commits into from

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Nov 6, 2024

Summary

Based on the discussion in https://github.com/elastic/security-team/issues/10824, this POC adds a flag to the schedule backfill API for whether to run actions during the backfill.

Supported action configurations:

  • for each alert, every rule run
  • alert summary, every rule run
  • if the alert matches a query
  • if the alert is generated during a time frame
  • system actions

Unsupported action configurations:

  • alert summary, on a custom interval

Implementation notes:

  • Actions scheduled from backfill runs are run at a lower priority than actions scheduled during normal rule execution.
  • The actions and action messages in the rule configuration at the time the backfill is scheduled is used. Any changes to the rule action configuration or action message that are made after the backfill is scheduled will not be respected (this is the same as changes to rule parameter configuration).
  • Rules with unsupported actions (alert summary on custom interval) will return an error during scheduling (we can change this to ignore the unsupported action if desired).

@elasticmachine
Copy link
Contributor

🤖 Jobs for this PR can be triggered through checkboxes. 🚧

ℹ️ To trigger the CI, please tick the checkbox below 👇

  • Click to trigger kibana-pull-request for this PR!
  • Click to trigger kibana-deploy-project-from-pr for this PR!

@pmuellr
Copy link
Member

pmuellr commented Nov 14, 2024

It looks like the routes are like this:

read   - GET    /internal/alerting/rules/backfill/{id}
delete - DELETE /internal/alerting/rules/backfill/{id}
find   - POST   /internal/alerting/rules/backfill/_find&end=&page=&per_page=&rule_ids=&start=&sort_field=&sort_order=
create - POST   /internal/alerting/rules/backfill/_schedule
[
  {
    rule_id: string
    start: string
    end?: string
    run_actions: boolean
  }
]

There's "path contention" between the POST and GET and DELETE paths, if {id} is ever allowed to be_schedule or _find. However, not really a problem now since these are differentiated via the methods, and we create the id's. Today. We know how that story progresses! (customers like to be able to create their own ids).

What if we added an "update" route? You'd think it would be the following, but then we'd have a problem IF we allowed users to create arbitrary ids and they used _schedule or _find.

POST  /internal/alerting/rules/backfill/{id}

I don't think this is a big deal, not sure it's worth making our URLs even longer :-). But perhaps we can add a comment where we create the UUID to briefly explain that due to the URL structure, the id's MUST be at least legal UUIDs (v4?) and not arbitrary strings. So we don't "accidently" have a URL collision later.

@approksiu
Copy link

@ymao1 re "Rules with unsupported actions (alert summary on custom interval) will return an error during scheduling (we can change this to ignore the unsupported action if desired)." We'd like to schedule other actions for these rules, and inform users that alert summary on custom interval will not be executed for the manual runs. cc @ARWNightingale

@ymao1
Copy link
Contributor Author

ymao1 commented Nov 18, 2024

@ymao1 re "Rules with unsupported actions (alert summary on custom interval) will return an error during scheduling (we can change this to ignore the unsupported action if desired)." We'd like to schedule other actions for these rules, and inform users that alert summary on custom interval will not be executed for the manual runs. cc @ARWNightingale

@approksiu Sure, I can look into returning some sort of warning reason while still allowing the backfill to be scheduled.

{
type: 'mappings_addition',
addedMappings: {
priority: { type: 'integer' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/ this move will probably help us with #194627. Would we / could we start persisting priority for all tasks on update and eventually more to a non-scripted sort in the task claimer?

Comment on lines +131 to +133
if (doc['task.priority'].size() != 0) {
return doc['task.priority'].value;
} else if (params.priority_map.containsKey(taskType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at the values for TaskPriority

export enum TaskPriority {
  Low = 1,
  Normal = 50,
}

Before we start persisting them in the implementation PR, I wonder if we should review the values / order. I think 1 and 50 are ok, but I wonder if 1 should represent top priority and 50 being lower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either would work! The way it currently is, we would sort by priority=descending which makes more sense in my head but definitely an argument for the reverse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants