[APM] Change rule creation for anomaly detection detectors (latency, throughput, and failed transaction rate) #126580
Comments
Pinging @elastic/apm-ui (Team:apm)
@formgeist - could you please create a quick mock of what alert creation would look like in both of these flows, i.e. a single anomaly detection rule that contains all three options as conditions, or separate rule types for each detector?
@alex-fedotyev I've updated the issue description with the two options. I'm heavily leaning towards Solution B because it offers the most flexibility and lets us set a global rule for all anomalies that are detected. @dgieselaar are there any limitations or challenges with either approach?
Solution B sounds good to me. Thanks for getting around to this so quickly @formgeist!
@vinaychandrasekhar do you have any thoughts on this?
@chrisdistasio thanks for including me in the discussion.
+1 @vinaychandrasekhar -- we should use this as an opty to drive consistency in behavior/pattern across the o11y apps.
@formgeist agree w/ @sqren that B sounds like our best option. It'd be great if we can get that in for 8.2, as it fixes the bug (or rather makes that behaviour explicit) where it fires for all detector types.
We originally intended the rule grouping to reflect the fact that Latency had anomaly and threshold rule types, while the others (throughput and failed transaction rate) only supported threshold-based rules. Now that this is no longer the case, I agree a re-organization makes sense.
Agreed, there's an opportunity to review this across Observability too. I've created a separate issue for this. I would like us to focus on supporting the new anomaly detectors in the upcoming release, so let's narrow the scope down to changing the anomaly rule creation. I'll create the necessary ticket(s) so we can include this in our plans. cc @dannycroft
Created a related issue to change the structure of rules in the Alerts and rules option for APM #126757
Hi, I'm going to start working on this. For me to understand and to confirm as the issue is quite all: do we also change this copy?
Yes, that sounds good: adding a detector selector for the different available metrics and selecting all by default.
@MiriamAparicio I suggest making it a checkbox (multiselect). Let's avoid the "All" option and instead default to having all three options pre-selected.
@sqren I already have a PR up, and what you suggested would be very different from what we have for all other rules.
Summary
The current experience for creating anomaly detection rules for alerts targets only the latency detector, but we have recently added new detectors to the anomaly detection jobs: throughput and failed transaction rate.
We should decide whether to offer a single anomaly detection rule that contains all three detectors as conditions, or separate rule types for each detector.
Solution
Convert the existing latency anomaly rule to a global anomaly rule that applies to either all detectors or a single detector.
$service.name
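To make the proposed shape concrete, here is a minimal TypeScript sketch of rule parameters for a single anomaly rule that covers one or more detectors, with all three pre-selected by default as suggested in the comments. Every name in it (`AnomalyDetectorType`, `AnomalyRuleParams`, `createAnomalyRuleParams`, `ruleAppliesTo`) is hypothetical and chosen for illustration; none of it is taken from the Kibana codebase or from the PR mentioned in the thread.

```typescript
// Hypothetical sketch only: these types and helpers are illustrative
// assumptions, not the actual APM rule implementation.
type AnomalyDetectorType = 'latency' | 'throughput' | 'failedTransactionRate';

const ALL_DETECTORS: readonly AnomalyDetectorType[] = [
  'latency',
  'throughput',
  'failedTransactionRate',
];

interface AnomalyRuleParams {
  // Optional service filter; undefined means the rule applies to all services.
  serviceName?: string;
  // Detectors the rule evaluates; at least one must be selected.
  detectors: AnomalyDetectorType[];
}

// Build rule params, pre-selecting all three detectors when none are given.
function createAnomalyRuleParams(
  overrides: Partial<AnomalyRuleParams> = {}
): AnomalyRuleParams {
  const detectors = overrides.detectors ?? ALL_DETECTORS.slice();
  if (detectors.length === 0) {
    throw new Error('Select at least one detector');
  }
  return { ...overrides, detectors };
}

// True when the rule should evaluate anomalies from the given detector.
function ruleAppliesTo(
  params: AnomalyRuleParams,
  detector: AnomalyDetectorType
): boolean {
  return params.detectors.indexOf(detector) !== -1;
}
```

With this shape, a "global" rule is just the default (`createAnomalyRuleParams()`), while a latency-only rule passes `{ detectors: ['latency'] }`. Rejecting an empty selection keeps the multiselect from producing a rule that can never fire.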