[Alerting] Storing custom searchable rule execution data inside the rule #112193
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
@elastic/kibana-alerting-services we need some 👀 from your side on this one and we'd like to start discussing options for how to implement this. In Security we have capacity for working on the implementation and ideally we need it in 7.16 and as soon as possible, so we could finalize the whole #101013 in 7.16 as well 😎 |
We had mentioned in talking about this issue that we track the "schedule delay" of rule/connector execution ( Also note that the POC for this issue adds the rule execution duration to the rule execution status in the SO - #111805 Some notes on suggested fields:
Options on extendability; below, some of the options would create a new "generic" field in
Option 1 provides the best story from a lot of aspects, but it does bloat the mappings, whereas the other options don't, as much. We'd want to define how the migrations for these work; I'm thinking a migration would never add values for these, but if a field was removed in some release, we'd clearly have to delete it in a migration. Meaning rule solution authors having to update the migration in the
For option 2, to search/sort, runtime fields would need to be provided. Those would presumably be accessing the source, so this is likely the "slowest" story for queries.
Option 3 doesn't allow numeric access, just keyword, so it would be problematic for numeric fields, which I assume will be pretty important. Presumably the numeric fields could be accessed via runtime fields; not sure if the source is required.
For option 4, the way we'd store values is obviously clumsy, but we're essentially modelling
For option 5, I believe we've looked at doing this, perhaps as part of the "searchable rule params" issue we worked on a while back (will need to look for the issue), but the TL;DR is we settled on using the flattened type for this (which doesn't seem appropriate here with numeric values). |
Side note, I don't believe free text searching can be done on runtime fields, but I may be wrong. |
I believe you are correct, but I'm guessing this isn't a concern. It will contain output from the execution, so perhaps some error messages would be text search candidates, but guessing not being able to text search through them is survivable. |
One point that came when Mike and I chatted about this was whether you can use runtime fields in SO searches. Dunno! |
In a chat with Core, it appears there isn't a way to send runtime field definitions with SO I hadn't considered that, but it makes sense to "burn" runtime fields into the mappings for this, as presumably they wouldn't change during a particular Kibana release. |
If they are part of the mapping, would the runtime fields get "calculated" anytime a search is happening by Kibana? Including different SO types? |
Good question; we should ask the ES folks. Thinking about this some more, it feels like this could end up being another form of "migration" that we need to deal with over time, and one that's different from our current migrations, so it will add some amount of additional complexity, just thinking about how to keep such fields "stable" over time. Probably easier than our existing migrations, since we don't migrate data, but we can also never change existing mappings for old indices. It feels like depending on runtime fields should probably have a small research spike associated with it, if we wanted to go that route. |
Mentioned in a discussion: we should add the number of alerts generated from a rule execution to the execution status. And to the event log as well? We can also provide the number of new and recovered alerts. And by "alerts", I mean "instanceIds", and it would be the number that are "active", not the number that are actually going to schedule actions to be fired. I think the number of actions that are scheduled to be fired would be another interesting number to track. |
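A minimal sketch of how those counts could be derived, assuming we have the set of active instance IDs from the previous execution and from the current one. The names `AlertCounts` and `countAlerts` are hypothetical and are not part of the alerting framework; this only illustrates the "active / new / recovered" distinction described above.

```typescript
// Hypothetical shape for the counts discussed above; not an existing Kibana type.
interface AlertCounts {
  active: number; // instance IDs active in the current execution
  newAlerts: number; // active now, but not active in the previous execution
  recovered: number; // active in the previous execution, but not active now
}

// Assumption: each execution can be summarized as a list of active instance IDs.
function countAlerts(previousActiveIds: string[], currentActiveIds: string[]): AlertCounts {
  // Sets de-duplicate IDs and give constant-time membership checks.
  const previous = new Set<string>(previousActiveIds);
  const current = new Set<string>(currentActiveIds);

  let newAlerts = 0;
  current.forEach((id) => {
    if (!previous.has(id)) newAlerts += 1;
  });

  let recovered = 0;
  previous.forEach((id) => {
    if (!current.has(id)) recovered += 1;
  });

  return { active: current.size, newAlerts, recovered };
}
```

Note this counts active instance IDs only; the number of actions scheduled to fire would have to be tracked separately.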
Thought I'd point out the place where the execution status is written to the rule SO: kibana/x-pack/plugins/alerting/server/task_runner/task_runner.ts Lines 603 to 618 in b1d6779
The executionStatus code itself is here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/alerting/server/lib/alert_execution_status.ts |
Hey all, I'll be honest with you, I'm not sure we have the headspace on our side to really fully understand what we're looking at here or how we can give useful feedback. I think the overall idea makes sense to me and if/when we come across the need to use this, I'm confident we'll be able to adapt things so that it works for us. Do you have specific questions or areas that you think would be good for observability to address for this initial implementation? |
Understandable, and definitely agree about being able to adapt as needed, so I think we're all good there 👍. This effort is in support of the Rule Management/Monitoring redesign (https://github.com/elastic/stack-design-team/issues/68), and I think we're mostly just looking for sign-off from Observability on these fields in support of that effort since you all will be adopting the Rule Management here at some point. So more of a verification that the proposed UI/UX meets your needs and that we don't need to add any additional fields as part of this initial effort (can always add more later :). |
Let me try to summarize what we have at this point after the discussions. TL;DR:
Fields
Implementation
We discussed the options suggested by Patrick:
|
One of the options discussed here was the thought of using runtime fields to allow rule-specific fields to be extracted from the rule saved object. Turns out Kibana doesn't support passing runtime field mappings into SO search requests, so I've opened an issue to request that - add support for runtime fields in Saved Object find() requests #113152. |
Mike had proposed an option 6: wait for elasticsearch "join" capability in search, at which point we could do a query to join rules against some other SO containing the rule-specific data. May be a long wait :-). But honestly, that would be the cleanest solution. And we'd need to wait for SO's to support "join" as well ...
Options 2 and 3 require SO support runtime fields in the SO client
For Option 4, I'll have to double-check on nested fields, but I think you can use them in KQL, but can't access them as fields for use in Lens. Which might be fine, because most users aren't going to need to build Lens graphs over
For option 5, we'd likely need Kibana core to help out with this. Rather than invent some new thing on top of SO's specific to alerting rules, it would probably be better to extend SO's somehow to provide this capability. |
Parent ticket: #101013
Related to: #91265, #106347
Why
Security Solution needs to migrate its rule execution statuses and metrics from our sidecar SO to event_log (see the parent ticket). Given that the rule execution events are in event_log, we will need to be able to execute relatively complex queries against event_log in order to fetch data for the Rule Monitoring table in Security.
Examples of such queries with aggregations:
We found that the aggregation queries mentioned above are not performant.
What
We could simplify many of the mentioned queries and improve performance by duplicating information from the last execution and storing it inside the rule itself (or close to it). Additionally, this would allow us to postpone the implementation of a custom RBAC for event-log-via-alerting (see #106347) until after 7.16.
What fields we’d like to store:
status: going to run, succeeded, partial failure, failed
status_order: 0, 10, 20, 30 (gives status a custom order instead of the alphabetical order)
metrics.indexing_duration_sum_ms: 1200
metrics.search_duration_sum_ms: 1200
metrics.execution_gap_duration_s: 42000
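As a rough illustration of the fields above, here is a sketch of what the stored data and the custom status ordering could look like. The type and function names (`LastExecutionSummary`, `buildSummary`, `sortByStatus`) are hypothetical, and the pairing of each status with an order value assumes the two lists above line up position by position.

```typescript
// Status values as listed above.
type RuleExecutionStatus = 'going to run' | 'succeeded' | 'partial failure' | 'failed';

// Assumption: order values pair with the statuses in the order they are listed.
const STATUS_ORDER: Record<RuleExecutionStatus, number> = {
  'going to run': 0,
  succeeded: 10,
  'partial failure': 20,
  failed: 30,
};

// Hypothetical shape of the data duplicated from the last execution.
interface LastExecutionSummary {
  status: RuleExecutionStatus;
  status_order: number;
  metrics: {
    indexing_duration_sum_ms?: number;
    search_duration_sum_ms?: number;
    execution_gap_duration_s?: number;
  };
}

// Derives status_order from status so the two fields cannot drift apart.
function buildSummary(
  status: RuleExecutionStatus,
  metrics: LastExecutionSummary['metrics'] = {}
): LastExecutionSummary {
  return { status, status_order: STATUS_ORDER[status], metrics };
}

// Sorts by the numeric order rather than alphabetically by status name.
function sortByStatus(
  summaries: LastExecutionSummary[],
  direction: 'asc' | 'desc' = 'asc'
): LastExecutionSummary[] {
  const sign = direction === 'asc' ? 1 : -1;
  return summaries.slice().sort((a, b) => sign * (a.status_order - b.status_order));
}
```

Storing a numeric `status_order` next to `status` means the sort can be done on a plain numeric field, without any scripting at query time.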
Execution gap metric that we use in Security Solution is defined the following way:
This looks very similar to the existing rule state, with the difference that:
How
Some ideas discussed between @pmuellr, @spong, @xcrzx and @banderror: