Align formatting of APM rules with other rules docs (#4366)
* split apm alerts and format like other rule docs

* fill out individual apm rule pages

* address feedback

* apm ui -> applications ui

---------

Co-authored-by: bmorelli25 <[email protected]>
colleenmcginnis and bmorelli25 authored Oct 25, 2024
1 parent ec217ce commit 49453be
Showing 29 changed files with 601 additions and 186 deletions.
170 changes: 0 additions & 170 deletions docs/en/observability/apm-alerts.asciidoc

This file was deleted.

91 changes: 91 additions & 0 deletions docs/en/observability/apm-anomaly-rule.asciidoc
@@ -0,0 +1,91 @@
[[apm-anomaly-rule]]
= APM Anomaly rule

APM Anomaly rules trigger when the latency, throughput, or failed transaction rate of a service is abnormal.

[discrete]
[[apm-anomaly-rule-filters-conditions]]
== Filters and conditions

Because some parts of an application may be more important than others, you might have a different tolerance
for abnormal performance across services in your application. You can filter the services in your application to
apply an APM Anomaly rule to specific services (`SERVICE`), transaction types (`TYPE`), and environments (`ENVIRONMENT`).

Then, you can specify which conditions should result in an alert. This includes specifying:

* The types of anomalies that are detected (`DETECTOR TYPES`): `latency`, `throughput`, and/or `failed transaction rate`.
* The severity level (`HAS ANOMALY WITH SEVERITY`): `critical`, `major`, `minor`, `warning`.

.Example
****
This example creates a rule for all production services that would result in an alert when a critical latency
anomaly is detected:

image::apm-anomaly-rule-filters-conditions.png[width=600]
****

[discrete]
== Rule schedule

include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[]

[discrete]
== Advanced options

include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[]

[discrete]
== Actions

Extend your rules by connecting them to actions that use built-in integrations.

[discrete]
=== Action types

Supported built-in integrations include:

include::../shared/alerting-and-rules/alerting-connectors.asciidoc[]

[discrete]
=== Action frequency

include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[]

[discrete]
[[apm-anomaly-rule-action-variables]]
=== Action variables

A default message is provided as a starting point for your alert.
To customize the message and add more context, click the icon above
the message text box and select from the list of available variables.

TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`.

image::apm-anomaly-rule-action-variables.png[width=600]

The following variables are specific to this rule type.
You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules].

`context.alertDetailsUrl`::
Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured.

`context.environment`::
The environment the alert is created for.

`context.reason`::
A concise description of the reason for the alert.

`context.serviceName`::
The service the alert is created for.

`context.threshold`::
Any trigger value above this value will cause the alert to fire.

`context.transactionType`::
The transaction type the alert is created for.

`context.triggerValue`::
The value that breached the threshold and triggered the alert.

`context.viewInAppUrl`::
Link to the alert source.
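
As a minimal sketch, a customized message could combine several of the variables listed above. The labels and layout here are illustrative only and are not the default message:

[source,txt]
------
{{context.reason}}

Service: {{context.serviceName}}
Environment: {{context.environment}}
Transaction type: {{context.transactionType}}
Threshold: {{context.threshold}}
Trigger value: {{context.triggerValue}}

Alert details: {{context.alertDetailsUrl}}
------

Because `context.alertDetailsUrl` is an empty string when `server.publicBaseUrl` is not configured, the last line of this sketch would render without a link in that case.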
121 changes: 121 additions & 0 deletions docs/en/observability/apm-error-count-threshold-rule.asciidoc
@@ -0,0 +1,121 @@
[[apm-error-count-threshold-rule]]
= Error count threshold rule

Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the
environment level, service level, and error group level.

[discrete]
[[apm-error-count-threshold-rule-filters-conditions]]
== Filters and conditions

Filter the errors coming from your application to apply an Error count threshold rule to a specific
service (`SERVICE`), environment (`ENVIRONMENT`) or error grouping key (`ERROR GROUPING KEY`).
Alternatively, you can use a {kibana-ref}/kuery-query.html[KQL filter] to limit the scope of the alert
by toggling on the *Use KQL Filter* option.

[TIP]
====
Similar errors are grouped together to make it easy to quickly see which errors are affecting your services and to take actions to rectify them. Each group of errors has a unique _error grouping key_ — a hash of the stack trace and other properties.
====

Then, you can specify which conditions should result in an alert. This includes specifying:

* The number of errors that occurred (`IS ABOVE`).
* The timeframe in which the errors must occur (`FOR THE LAST`) in seconds, minutes, hours, or days.

.Example
****
This example creates a rule for all production services that would result in an alert when there are 25 errors
in the last five minutes:

image::apm-error-count-rule-filters-conditions.png[width=600]

Alternatively, you can use a KQL filter to limit the scope of the alert:

. Toggle on *Use KQL Filter*.
. Add a filter:
+
[source,txt]
------
service.environment:"Production"
------
****

[discrete]
== Groups

include::../shared/alerting-and-rules/generic-apm-group-by.asciidoc[]

[discrete]
== Rule schedule

include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[]

[discrete]
== Advanced options

include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[]

[discrete]
== Actions

Extend your rules by connecting them to actions that use built-in integrations.

[discrete]
=== Action types

Supported built-in integrations include:

include::../shared/alerting-and-rules/alerting-connectors.asciidoc[]

[discrete]
=== Action frequency

include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[]

[discrete]
=== Action variables

A default message is provided as a starting point for your alert.
To customize the message and add more context, click the icon above
the message text box and select from the list of available variables.

TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`.

image::apm-error-count-rule-action-variables.png[width=600]

The following variables are specific to this rule type.
You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules].

`context.alertDetailsUrl`::
Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured.

`context.environment`::
The environment the alert is created for.

`context.errorGroupingKey`::
The error grouping key the alert is created for.

`context.errorGroupingName`::
The error grouping name the alert is created for.

`context.interval`::
The length and unit of the time period where the alert conditions were met.

`context.reason`::
A concise description of the reason for the alert.

`context.serviceName`::
The service the alert is created for.

`context.threshold`::
Any trigger value above this value will cause the alert to fire.

`context.transactionName`::
The transaction name the alert is created for.

`context.triggerValue`::
The value that breached the threshold and triggered the alert.

`context.viewInAppUrl`::
Link to the alert source.
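
As a minimal sketch, a customized message for this rule type could combine several of the variables listed above. The labels and layout here are illustrative only and are not the default message:

[source,txt]
------
{{context.reason}}

Service: {{context.serviceName}}
Environment: {{context.environment}}
Error grouping key: {{context.errorGroupingKey}}
Error count: {{context.triggerValue}} (threshold: {{context.threshold}})
Time period: {{context.interval}}

Alert details: {{context.alertDetailsUrl}}
------

Because `context.alertDetailsUrl` is an empty string when `server.publicBaseUrl` is not configured, the last line of this sketch would render without a link in that case.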