Update alerting terminology #679

Merged: 2 commits, May 20, 2021
16 changes: 8 additions & 8 deletions docs/en/observability/configure-uptime-settings.asciidoc
@@ -2,7 +2,7 @@
= Configure settings

The *Settings* page enables you to change which {heartbeat} indices are displayed
by the {uptime-app}, configure alert connectors, and set expiration/age thresholds
by the {uptime-app}, configure rule connectors, and set expiration/age thresholds
for TLS certificates.

Uptime settings apply to the current space only. To segment
@@ -32,34 +32,34 @@ data outside of this pattern.
image::images/heartbeat-indices.png[Heartbeat indices]

[[configure-uptime-alert-connectors]]
== Configure alert connectors
== Configure connectors
Review comment (Member):

Nice. I think this works well until Uptime migrates away from the deprecated alerting API. There's an issue in Kibana somewhere...

Reply (Contributor Author):

Thanks for letting me know.


*Alerts* work by running checks on a schedule to detect conditions. When a condition is met, the alert tracks
it as an *alert instance* and responds by triggering one or more *actions*.
*Alerts* work by running checks on a schedule to detect conditions defined by a rule. When a condition is met,
the rule tracks it as an *alert instance* and responds by triggering one or more *actions*.
Actions typically involve interaction with {kib} services or third party integrations. *Connectors* allow actions
to talk to these services and integrations.

Click *Create connector* and follow the prompts to select a connector type and configure its properties.
After you create a connector, it's available to you anytime you set up an alert action in the current space.
After you create a connector, it's available to you anytime you set up a rule action in the current space.

For more information about each connector, see {kibana-ref}/action-types.html[action types and connectors].

[role="screenshot"]
image::images/alert-connector.png[Alert connector]
image::images/alert-connector.png[Rule connector]

[[configure-cert-thresholds]]
== Configure certificate thresholds

You can modify certificate thresholds to control how Uptime displays your TLS values in
the <<view-certificate-status,Certificates>> page. These settings also determine which certificates are
selected by any TLS alert you create.
selected by any TLS rule you create.

|===

| *Expiration threshold* | The `expiration` threshold specifies when you are notified
about certificates that are approaching expiration dates. When the value of a certificate's remaining valid days falls
below the `Expiration threshold`, it's considered a warning state. When you define a
<<tls-certificate-alert,TLS alert>>, you receive a notification about the certificate.
<<tls-certificate-alert,TLS rule>>, you receive a notification about the certificate.

| *Age limit* | The `age` threshold specifies when you are notified about certificates
that have been valid for too long.
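The expiration and age thresholds described in this hunk amount to two comparisons against a certificate's validity dates. The TypeScript below is a minimal sketch of that logic under stated assumptions, not the Uptime app's code: the `CertInfo` and `CertThresholds` types and the `checkCertificate` function are hypothetical, the threshold values (30 and 365 days) are illustrative rather than the app's defaults, and treating the age limit as a strict "greater than" check is an assumption.

```typescript
// Hypothetical types for illustration; these are not Uptime app APIs.
interface CertInfo {
  notBefore: Date; // start of the certificate's validity period
  notAfter: Date; // expiration date
}

interface CertThresholds {
  expirationDays: number; // corresponds to the "Expiration threshold" setting
  ageDays: number; // corresponds to the "Age limit" setting
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the reasons, if any, that a certificate would be flagged.
function checkCertificate(cert: CertInfo, t: CertThresholds, now: Date): string[] {
  const reasons: string[] = [];
  const remainingDays = (cert.notAfter.getTime() - now.getTime()) / MS_PER_DAY;
  const ageDays = (now.getTime() - cert.notBefore.getTime()) / MS_PER_DAY;

  // Remaining valid days below the expiration threshold: warning state.
  if (remainingDays < t.expirationDays) {
    reasons.push("expiring");
  }
  // Valid for longer than the age limit (strict comparison is an assumption).
  if (ageDays > t.ageDays) {
    reasons.push("too old");
  }
  return reasons;
}

const now = new Date("2021-05-20T00:00:00Z");
const cert: CertInfo = {
  notBefore: new Date("2020-01-01T00:00:00Z"),
  notAfter: new Date("2021-06-01T00:00:00Z"),
};
console.log(checkCertificate(cert, { expirationDays: 30, ageDays: 365 }, now));
```

With these sample dates, 12 days of validity remain (below the 30-day threshold) and the certificate is 505 days old (above the 365-day limit), so both reasons are returned.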
37 changes: 24 additions & 13 deletions docs/en/observability/create-alerts.asciidoc
@@ -1,21 +1,32 @@
[[create-alerts]]
= Alerting

Alerting allows you to detect complex conditions within the Logs, Metrics, and Uptime apps
and trigger actions when those conditions are met. Alerting can be centrally managed from
the {kibana-ref}/alert-management.html[{kib} Management UI] and provides
a set of built-in {kibana-ref}/action-types.html[actions] and alert-types for you to use.

Extend your alerts by connecting them to actions that use built-in integrations for email,
[IMPORTANT]
====
Make sure to set up alerting in {kib}. For details, see
{kibana-ref}/alerting-getting-started.html#alerting-setup-prerequisites[Setup and prerequisites].
====

Within the Logs, Metrics, and Uptime apps, alerting enables you to detect
complex *conditions* defined by a *rule*. When a condition is met, the rule
tracks it as an *alert* and responds by triggering one or more *actions*.

Alerting can be centrally managed from the
{kibana-ref}/alert-management.html[{kib} Management UI] and provides a
set of built-in {kibana-ref}/defining-alerts.html[rule types] and
{kibana-ref}/action-types.html[connectors] for you to use.

Extend your rules by connecting them to actions that use built-in *connectors* for email,
IBM Resilient, Index, JIRA, Microsoft Teams, PagerDuty, Server log, ServiceNow ITSM, and Slack.
Also supported is a powerful webhook output letting you tie into other third-party systems.

* <<logs-threshold-alert,Logs threshold alert>>
* <<infrastructure-threshold-alert,Infrastructure threshold alert>>
* <<metrics-threshold-alert,Metrics threshold alert>>
* <<monitor-status-alert,Monitor status alert>>
* <<tls-certificate-alert,TLS certificate alert>>
* <<duration-anomaly-alert,Uptime duration anomaly alert>>
Connectors allow actions to talk to these services and integrations.

* <<logs-threshold-alert,Logs threshold rule>>
* <<infrastructure-threshold-alert,Infrastructure threshold rule>>
* <<metrics-threshold-alert,Metrics threshold rule>>
* <<monitor-status-alert,Monitor status rule>>
* <<tls-certificate-alert,TLS certificate rule>>
* <<duration-anomaly-alert,Uptime duration anomaly rule>>

include::logs-threshold-alert.asciidoc[leveloffset=+1]

54 changes: 27 additions & 27 deletions docs/en/observability/infrastructure-threshold-alert.asciidoc
@@ -1,11 +1,11 @@
[[infrastructure-threshold-alert]]
= Create an infrastructure threshold alert
= Create an infrastructure threshold rule

Based on the resources listed on the *Inventory* page within the {metrics-app},
you can create a threshold alert to notify you when a metric has reached or exceeded a value for a specific
you can create a threshold rule to notify you when a metric has reached or exceeded a value for a specific
resource or a group of resources within your infrastructure.

Additionally, each alert can be defined using multiple
Additionally, each rule can be defined using multiple
conditions that combine metrics and thresholds to create precise notifications and reduce false positives.

. To access this page, go to *Observability > Metrics*.
@@ -15,35 +15,35 @@ conditions that combine metrics and thresholds to create precise notifications a
[TIP]
==============================================
When you select *Create inventory alert*, the parameters you configured on the *Inventory* page will automatically
populate the alert. You can use the Inventory first to view which nodes in your infrastructure you'd
like to be notified about and then quickly create an alert in just a few clicks.
populate the rule. You can use the Inventory first to view which nodes in your infrastructure you'd
like to be notified about and then quickly create a rule in just a few clicks.
==============================================

[[inventory-conditions]]
== Inventory conditions

Conditions for each alert can be applied to specific metrics relating to the inventory type you select.
Conditions for each rule can be applied to specific metrics relating to the inventory type you select.
You can choose the aggregation type, the metric, and by including a warning threshold value, you can be
alerted on multiple threshold values based on severity scores. When creating the alert, you can still get
notified if no data is returned for the specific metric or if the alert fails to query {es}.
alerted on multiple threshold values based on severity scores. When creating the rule, you can still get
notified if no data is returned for the specific metric or if the rule fails to query {es}.

In this example, Kubernetes Pods is the selected inventory type. The conditions state that you will receive
a critical alert for any pods within the `ingress-nginx` namespace with a memory usage of 95% or above
and a warning alert if memory usage is 90% or above.

[role="screenshot"]
image::images/inventory-alert.png[Inventory alert]
image::images/inventory-alert.png[Inventory rule]

Before creating an alert, you can preview whether the conditions would have triggered the alert in the last
Before creating a rule, you can preview whether the conditions would have triggered the alert in the last
hour, day, week, or month.

[role="screenshot"]
image::images/alert-preview.png[Preview alerts]
image::images/alert-preview.png[Preview rules]

[[action-types-infrastructure]]
== Action types

You can extend your alerts by connecting them to actions that use the following supported built-in integrations.
You can extend your rules by connecting them to actions that use the following supported built-in integrations.

[role="screenshot"]
image::images/alert-action-types.png[Action types]
@@ -56,40 +56,40 @@ image::images/run-when-selection.png[Configure when an alert is triggered]

== Action variables

This section details the variables that infrastructure threshold alerts will send to your actions.
This section details the variables that infrastructure threshold rules will send to your actions.

[float]
=== Basic variables

[role="screenshot"]
image::images/basic-variables.png[The default infrastructure threshold alert message detailing basic variables]
image::images/basic-variables.png[The default infrastructure threshold rule message detailing basic variables]

The default message for an infrastructure threshold alert displays the basic variables you can use.
The default message for an infrastructure threshold rule displays the basic variables you can use.

- `context.group`: This variable resolves to the **group** that the alert's condition detected. For inventory alerts,
this is the name of a monitored host, pod, container, etc. For metric threshold alerts, this is the value of the field
specified in the **Create alert per** field or `*` if the alert is configured to aggregate your entire infrastructure.
- `context.group`: This variable resolves to the **group** that the rule conditions detected. For inventory rules,
this is the name of a monitored host, pod, container, etc. For metric threshold rules, this is the value of the field
specified in the **Create alert per** field or `*` if the rule is configured to aggregate your entire infrastructure.
- `context.alertState`: Depending on why the action is triggered, this variable resolves to `ALERT`, `NO DATA`, or
`ERROR`. `ALERT` means the alert's condition is detected, `NO DATA` means that no data was returned for the time period
`ERROR`. `ALERT` means the rule condition is detected, `NO DATA` means that no data was returned for the time period
that was queried, and `ERROR` indicates an error when querying the data.
- `context.reason`: This variable describes why the alert is in its current state. For each of the alert’s conditions,
it includes the monitored metric's detected value and a description of the threshold.
- `context.timestamp`: The timestamp of when the alert was detected.
- `context.reason`: This variable describes why the rule is in its current state. For each of the conditions,
it includes the detected value of the monitored metric and a description of the threshold.
- `context.timestamp`: The timestamp of when the rule was detected.


[float]
=== Advanced variables

[role="screenshot"]
image::images/advanced-variables.png[The default infrastructure threshold alert message detailing advanced variables]
image::images/advanced-variables.png[The default infrastructure threshold rule message detailing advanced variables]

Instead of using `context.reason` to provide all the information you need, there may be cases when you'd like to
customize your action message. Infrastructure threshold alerts provide advanced context variables that have a tree structure.
customize your action message. Infrastructure threshold rules provide advanced context variables that have a tree structure.

[IMPORTANT]
==============================================
These variables must use the structure of `{{context.[Variable Name].condition[Number]}}`. For example,
`{{context.value.condition0}}`, `{{context.value.condition1}}`, and so on. This is required even if your alert has only
`{{context.value.condition0}}`, `{{context.value.condition1}}`, and so on. This is required even if your rule has only
one condition (accessible with `.condition0`). Using just `{{context.[Variable Name]}}` evaluates to a blank line or
`[object Object]` depending on the action type.
==============================================
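The IMPORTANT note in this hunk says a bare `{{context.[Variable Name]}}` reference can render as `[object Object]`. The sketch below shows why, using a deliberately simplified, hypothetical renderer: `render` and `lookup` are illustrative names, not Kibana APIs, and Kibana's real template engine also handles escaping and other syntax. Each advanced variable holds an object keyed by `condition0`, `condition1`, and so on, so only the fully qualified path resolves to a value. The context values are made up for the example.

```typescript
// Hypothetical, simplified stand-in for an action message renderer.
// It is NOT Kibana's implementation; it only resolves {{dotted.paths}}.
type Ctx = { [key: string]: unknown };

// Walks a dotted path such as "context.value.condition0".
function lookup(root: Ctx, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Ctx)[key];
  }
  return current;
}

function render(template: string, root: Ctx): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_m: string, path: string) => {
    const value = lookup(root, path);
    // Missing values render empty; objects fall back to JavaScript's default
    // string conversion, which produces "[object Object]".
    return value === undefined ? "" : String(value);
  });
}

// Made-up context for a rule with two conditions.
const vars: Ctx = {
  context: {
    group: "my-pod",
    alertState: "ALERT",
    value: { condition0: "96%", condition1: "91%" },
  },
};

console.log(render("{{context.group}}: {{context.value.condition0}}", vars));
console.log(render("{{context.group}}: {{context.value}}", vars));
```

The first call prints `my-pod: 96%`; the second prints `my-pod: [object Object]` because the renderer converts the whole `context.value` object to a string. The blank-line outcome that the note mentions for some action types is not modeled here.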
@@ -101,9 +101,9 @@ one condition (accessible with `.condition0`). Using just `{{context.[Variable N
[[infra-alert-settings]]
== Settings

With infrastructure threshold alerts, it's not possible to set an explicit index pattern as part of the configuration. The index pattern
With infrastructure threshold rules, it's not possible to set an explicit index pattern as part of the configuration. The index pattern
is instead inferred from *Metrics indices* on the <<configure-settings,Settings>> page of the {metrics-app}.

With each alert check's execution, the *Metrics indices* setting is checked, but it is not stored when the alert is created.
With each execution of the rule check, the *Metrics indices* setting is checked, but it is not stored when the rule is created.

The *Timestamp* field that is set under *Settings* determines which field is used for timestamps in queries.
@@ -16,12 +16,12 @@ Create a machine learning job to detect anomalous monitor duration rates automat
[NOTE]
=====
If anomaly detection is already enabled, click *Anomaly detection* and select to view duration anomalies directly in the
{ml-docs}/ml-gs-results.html[Machine Learning app], enable an <<duration-anomaly-alert,anomaly alert>>,
{ml-docs}/ml-gs-results.html[Machine Learning app], enable an <<duration-anomaly-alert,anomaly rule>>,
or disable the anomaly detection.
=====
+
3. You are prompted to create a <<duration-anomaly-alert,response duration anomaly alert>> for the machine learning job which will carry
out the analysis, and you can configure which severity level to create the alert for.
3. You are prompted to create a <<duration-anomaly-alert,response duration anomaly rule>> for the machine learning job which will carry
out the analysis, and you can configure which severity level to create the rule for.

When an anomaly is detected, the duration is displayed on the *Monitor duration*
chart, along with the duration times. The colors represent the criticality of the anomaly: red