[APM] Change rule creation for anomaly detection detectors (latency, throughput, and failed transaction rate) #126580

formgeist · 2022-03-01T12:47:14Z

Summary

The current experience for creating anomaly detection alert rules targets only the latency detector, but we have recently added new detectors to the anomaly detection jobs for throughput and failed transaction rate.

We should decide whether to offer a single anomaly detection rule that contains all three options as conditions, or separate rule types for each detector.

Solution

Convert the existing latency anomaly rule to a global anomaly rule for all detectors or a single detector

Remove/replace the existing latency anomaly rule with a global anomaly rule.
The new anomaly rule takes a condition of all detectors or a single detector, so the user doesn't have to choose individual rule types for the various anomalies that might occur.
By enabling the user to select a single detector, we keep the existing functionality of being able to select e.g. latency as the detector and determine how to alert on any severity level.
Enabling the user to choose actions for all detector anomalies with the same severity level lets them have fewer rules that serve as a catch-all.
Add a detector ("metric") option to the conditions of the rule and replace the wording around latency anomaly.
Update the default rule name to "Anomaly alert | $service.name".
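
To make the proposal above concrete, here is a minimal sketch of how a single rule with a detector condition might be modelled. Every name in it (`AnomalyRuleParams`, `shouldAlert`, `defaultRuleName`, the `'all'` value, the severity list and the "at or above" comparison) is an assumption for illustration, not the actual Kibana APM rule schema. `$service.name` from the description is treated as a placeholder for the service name.

```typescript
// Hypothetical types for illustration only; not the real APM rule params.
type AnomalyDetector = 'latency' | 'throughput' | 'failedTransactionRate';
type Severity = 'warning' | 'minor' | 'major' | 'critical';

interface AnomalyRuleParams {
  serviceName: string;
  // 'all' is the catch-all; a single detector keeps the existing
  // latency-only style of rule available.
  detector: AnomalyDetector | 'all';
  // Assumption: the rule fires at this severity or higher.
  severity: Severity;
}

interface Anomaly {
  detector: AnomalyDetector;
  severity: Severity;
}

// Severities ordered from lowest to highest.
const SEVERITY_ORDER: Severity[] = ['warning', 'minor', 'major', 'critical'];

function shouldAlert(params: AnomalyRuleParams, anomaly: Anomaly): boolean {
  const detectorMatches =
    params.detector === 'all' || params.detector === anomaly.detector;
  const severityMatches =
    SEVERITY_ORDER.indexOf(anomaly.severity) >=
    SEVERITY_ORDER.indexOf(params.severity);
  return detectorMatches && severityMatches;
}

// Default rule name following the "Anomaly alert | $service.name" pattern.
function defaultRuleName(serviceName: string): string {
  return `Anomaly alert | ${serviceName}`;
}
```

With this shape, one rule with `detector: 'all'` covers anomalies from every detector, while `detector: 'latency'` reproduces the current behaviour.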


elasticmachine · 2022-03-01T12:47:15Z

Pinging @elastic/apm-ui (Team:apm)

alex-fedotyev · 2022-03-02T02:49:08Z

@formgeist - could you please create a quick mock of what alert creation would look like in both of these flows, i.e. a single anomaly detection rule that contains all three options as conditions, or separate rule types for each detector?

formgeist · 2022-03-02T10:46:17Z

@alex-fedotyev I've updated the issue description with the two options. I'm heavily leaning towards Solution B because it offers the most flexibility and lets us set a global rule for all anomalies that are detected.

@dgieselaar are there any limitations or challenges with either approach?

cc @sqren @chrisdistasio

sorenlouv · 2022-03-02T18:50:38Z

Solution B sounds good to me. Thanks for getting around to this so quickly @formgeist!

chrisdistasio · 2022-03-02T19:27:03Z

@vinaychandrasekhar do you have any thoughts on this?

vinaychandrasekhar · 2022-03-02T20:43:49Z

@chrisdistasio thanks for including me in the discussion.
@formgeist a couple of questions:

Would it make sense to also combine the multiple threshold rules under a single "Create threshold rule" with a second-level menu for failed transaction rule, error count rule? Otherwise, it appears confusing (to me) that anomaly rule creation shows no notion of signal or metric (i.e., latency, transaction rate, errors) whereas the threshold ones do.
Other o11y apps where the alerts and rules dropdown is shown seem to follow a pattern more similar to solution A above. I would like to check whether top-down pattern consistency across o11y apps has already been considered.

chrisdistasio · 2022-03-02T21:34:55Z

+1 @vinaychandrasekhar -- we should use this as an opty to drive consistency in behavior/pattern across the o11y apps.

dgieselaar · 2022-03-02T23:46:28Z

@formgeist agree w/ @sqren that B sounds like our best option. It'd be great if we can get that in for 8.2, as it fixes the bug (or rather makes that behaviour explicit) where it fires for all detector types.

formgeist · 2022-03-03T09:13:06Z

Would it make sense to also combine the multiple threshold rules under a single "Create threshold rule" with a second-level menu for failed transaction rule, error count rule? Otherwise, it appears confusing (to me) that anomaly rule creation shows no notion of signal or metric (i.e., latency, transaction rate, errors) whereas the threshold ones do.

We originally intended the rule grouping to reflect the fact that latency had anomaly and threshold rule types, while the others (throughput and failed transaction rate) only supported threshold-based rules. Now that this is no longer the case, I agree a reorganization makes sense.

Other o11y apps where the alerts and rules dropdown is shown seem to follow a pattern more similar to solution A above. I would like to check whether top-down pattern consistency across o11y apps has already been considered.

Agreed, there's an opportunity to review this across Observability too. I've created a separate issue for this.

I would like us to focus on supporting the new anomaly detectors in the upcoming release, so let's narrow the scope to changing the anomaly rule creation. I'll create the necessary ticket(s) so we can include this in our plans.

cc @dannycroft

formgeist · 2022-03-03T09:43:33Z

Created a related issue to change the structure of rules in the Alerts and rules option for APM #126757

MiriamAparicio · 2023-11-21T10:49:48Z

Hi, I'm going to start working on this. I'd like to confirm my understanding, as the issue is quite old.
We want to add the 'metric' detector to the creation of the rule, keeping ALL as the default, and display suggestions like the ones we have for the type, with the 3 different detectors (latency, throughput, or failed transaction rate).

Do we also change this copy?
Alert when either the latency, throughput, or failed transaction rate of a service is anomalous. Learn more

cc @formgeist @boriskirov @sqren

boriskirov · 2023-11-23T09:41:55Z

Yes, that sounds good: adding a detector for the different available metrics and selecting all by default.

sorenlouv · 2023-11-24T08:43:53Z

We want to add the 'metric' detector to the creation of the rule, keeping ALL as the default, and display suggestions like the ones we have for the type, with the 3 different detectors (latency, throughput, or failed transaction rate).

@MiriamAparicio I suggest making it a checkbox (multiselect). Let's avoid the "All" option and instead default to having all three options pre-selected.
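
One way to read the multiselect suggestion, as a rough sketch: the form state holds a list of selected detectors rather than a single value or an "All" sentinel, and a new rule starts with all three selected. The names `ALL_DETECTORS`, `initialSelection` and `toggleDetector` are hypothetical and do not come from the Kibana codebase.

```typescript
// Hypothetical sketch of multiselect state for the detector condition.
type AnomalyDetector = 'latency' | 'throughput' | 'failedTransactionRate';

const ALL_DETECTORS: readonly AnomalyDetector[] = [
  'latency',
  'throughput',
  'failedTransactionRate',
];

// A new rule has no saved selection, so every detector starts pre-selected.
function initialSelection(saved?: AnomalyDetector[]): AnomalyDetector[] {
  return saved && saved.length > 0 ? [...saved] : [...ALL_DETECTORS];
}

// Ticking or unticking a checkbox adds or removes that detector.
function toggleDetector(
  selected: AnomalyDetector[],
  detector: AnomalyDetector
): AnomalyDetector[] {
  return selected.indexOf(detector) >= 0
    ? selected.filter((d) => d !== detector)
    : [...selected, detector];
}
```

The trade-off raised in the reply below still applies: this differs from how the other rule types present their conditions.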

MiriamAparicio · 2023-11-25T07:53:11Z

@sqren I already have a PR up, and what you suggested would be very different from what we have for all other rules.


formgeist added the Team:APM and enhancement labels Mar 1, 2022

formgeist added the apm:alerting label Mar 1, 2022

formgeist mentioned this issue Mar 3, 2022

[APM] Reorganize rule types in the Alerts and rules dropdown option #126757

Closed

3 tasks

formgeist mentioned this issue Mar 4, 2022

[APM] Add rule types to the general list of rules available #126909

Closed

sorenlouv added the apm:ml label Jan 17, 2023

MiriamAparicio self-assigned this Nov 20, 2023

MiriamAparicio mentioned this issue Nov 24, 2023

[APM] Add detectors for anomaly rules creation #171901

Merged

MiriamAparicio closed this as completed in #171901 Jan 8, 2024

MiriamAparicio added a commit that referenced this issue Jan 8, 2024

[APM] Add detectors for anomaly rules creation (#171901)

316858f

Closes #126580 https://github.com/elastic/kibana/assets/31922082/6ff92aec-87ef-4bf3-94f5-f0820c4033c8

delanni pushed a commit to delanni/kibana that referenced this issue Jan 11, 2024

[APM] Add detectors for anomaly rules creation (elastic#171901)

3769ee4

Closes elastic#126580 https://github.com/elastic/kibana/assets/31922082/6ff92aec-87ef-4bf3-94f5-f0820c4033c8
