[DOCS] Update Observability docs to fix problems found during testing #175636

Merged: 4 commits merged on Jan 29, 2024.
docs/apm/advanced-queries.asciidoc (4 changes: 2 additions & 2 deletions)
@@ -57,7 +57,7 @@ and *Discover* supports all of the example APM app queries shown on this page.
[[discover-queries]]
==== Discover queries

-One example where you may want to make use of *Discover*,
+One example where you may want to make use of *Discover*
is to view _all_ transactions for an endpoint instead of just a sample.

TIP: Starting in v7.6, you can view ten samples per bucket in the APM app, instead of just one.
@@ -77,7 +77,7 @@ that took between 13 and 14 milliseconds. Here's what Discover returns:
image::apm/images/advanced-discover.png[View all transactions in bucket]

You can now explore the data until you find a specific transaction that you're interested in.
-Copy that transaction's `transaction.id`, and paste it into the APM app to view the data in the context of the APM app:
+Copy that transaction's `transaction.id` and paste it into the APM app to view the data in the context of the APM app:

[role="screenshot"]
image::apm/images/specific-transaction-search.png[View specific transaction in apm app]
docs/apm/filters.asciidoc (4 changes: 2 additions & 2 deletions)
@@ -7,8 +7,8 @@
++++

Global filters are ways you can filter data across the APM app based on a specific
-time range or environment. They are available in the Services, Transactions, Errors,
-Metrics, and Traces views, and any filter applied will persist as you move between pages.
+time range or environment. When viewing a specific service, the filter persists
+as you move between tabs.

[role="screenshot"]
image::apm/images/global-filters.png[Global filters available in the APM app in Kibana]
docs/apm/infrastructure.asciidoc (4 changes: 2 additions & 2 deletions)
@@ -4,12 +4,12 @@

beta::[]

-The *Infrastructure* tab provides information about the containers, pods, and hosts,
+The *Infrastructure* tab provides information about the containers, pods, and hosts
that the selected service is linked to.

[role="screenshot"]
image::apm/images/infra.png[Example view of the Infrastructure tab in APM app in Kibana]

IT ops and software reliability engineers (SREs) can use this tab
to quickly find a service's underlying infrastructure resources when debugging a problem.
-Knowing what infrastructure is related to a service allows you to remediate issues by restarting, killing hanging instances, changing configuration, rolling back deployments, scaling up, scaling out, etc.
+Knowing what infrastructure is related to a service allows you to remediate issues by restarting, killing hanging instances, changing configuration, rolling back deployments, scaling up, scaling out, and so on.
docs/apm/machine-learning.asciidoc (8 changes: 4 additions & 4 deletions)
@@ -10,7 +10,7 @@ The Machine learning integration initiates a new job predefined to calculate ano
With this integration, you can quickly pinpoint anomalous transactions and see the health of
any upstream and downstream services.

-Machine learning jobs are created per environment, and are based on a service's average response time.
+Machine learning jobs are created per environment and are based on a service's average response time.
Because jobs are created at the environment level,
you can add new services to your existing environments without the need for additional machine learning jobs.

@@ -40,7 +40,7 @@ To enable machine learning anomaly detection:
. From the Services overview, Traces overview, or Service Map tab,
select **Anomaly detection**.

-. Click **Create ML Job**.
+. Click **Create Job**.

. Machine learning jobs are created at the environment level.
Select all of the service environments that you want to enable anomaly detection in.
@@ -50,7 +50,7 @@ Anomalies will surface for all services and transaction types within the selecte

That's it! After a few minutes, the job will begin calculating results;
it might take additional time for results to appear on your service maps.
-Existing jobs can be managed in *Machine Learning jobs management*.
+To manage existing jobs, click **Manage jobs**.

[float]
[[warning-ml-integration]]
@@ -66,7 +66,7 @@ image::apm/images/apm-anomaly-alert.png[Example view of anomaly alert in the APM
[[unkown-ml-integration]]
=== Unknown service health

-After enabling anomaly detection, service health may display as "Unknown". There are three reasons why this can occur:
+After enabling anomaly detection, service health may display as "Unknown". Here are some reasons why this can occur:

1. No machine learning job exists. See <<create-ml-integration>> to enable anomaly detection and create a machine learning job.
2. There is no machine learning data for the job. If you just created the machine learning job you'll need to wait a few minutes for data to be available. Alternatively, if the service or its enviroment are new, you'll need to wait for more trace data.
docs/apm/service-maps.asciidoc (9 changes: 4 additions & 5 deletions)
@@ -62,9 +62,8 @@ This can be useful if you have two or more services, in separate environments, b
Use the environment drop-down to only see the data you're interested in, like `dev` or `production`.

If there's a specific service that interests you, select that service to highlight its connections.
-Clicking **Focus map** will refocus the map on that specific service and lock the connection highlighting.
-From here, select **Service Details**, or click on the **Transaction** tab to jump to the Transaction overview
-for the selected service.
+Click **Focus map** to refocus the map on the selected service and lock the connection highlighting.
+From here, select **Service Details**, or click the **Transactions** tab to jump to the Transaction overview for the selected service.
You can also use the tabs at the top of the page to easily jump to the **Errors** or **Metrics** overview.

[role="screenshot"]
@@ -74,7 +73,7 @@ image::apm/images/service-maps-java.png[Example view of service maps in the APM
[[service-map-anomaly-detection]]
=== Anomaly detection with machine learning

-Machine learning jobs can be created to calculate anomaly scores on APM transaction durations within the selected service.
+You can create machine learning jobs to calculate anomaly scores on APM transaction durations within the selected service.
When these jobs are active, service maps will display a color-coded anomaly indicator based on the detected anomaly score:

[horizontal]
@@ -85,7 +84,7 @@ image:apm/images/red-service.png[APM red service]:: Max anomaly score **≥75**.
[role="screenshot"]
image::apm/images/apm-service-map-anomaly.png[Example view of anomaly scores on service maps in the APM app]

-If an anomaly has been detected, click *view anomalies* to view the anomaly detection metric viewer in the Machine learning app.
+If an anomaly has been detected, click *View anomalies* to view the anomaly detection metric viewer in the Machine learning app.
This time series analysis will display additional details on the severity and time of the detected anomalies.

To learn how to create a machine learning job, see <<machine-learning-integration,machine learning integration>>.
docs/apm/services.asciidoc (12 changes: 8 additions & 4 deletions)
@@ -33,9 +33,14 @@ image::apm/images/apm-service-group.png[Example view of service group in the APM
To enable Service groups, open {kib} and navigate to **Stack Management** > **Advanced Settings** > **Observability**,
and enable the **Service groups feature**.

-To create a service group, navigate to **Observability** > **APM** > **Services** and select **Create group**.
-Specify a name, color, and description.
-Then, using the <<kuery-query, Kibana Query Language (KQL)>>, specify a query to select services for the group.
+To create a service group:
+
+. Navigate to **Observability** > **APM** > **Services**.
+. Switch to **Service groups**.
+. Click **Create group**.
+. Specify a name, color, and description.
+. Click **Select services**.
+. Specify a <<kuery-query, Kibana Query Language (KQL)>> query to select services for the group.
Services that match the query within the last 24 hours will be assigned to the group.

[NOTE]
@@ -54,4 +59,3 @@ Not sure where to get started? Here are some sample queries you can build from:

* Group services by environment--in this example, "production": `service.environment : "production"`
* Group services by name--this example groups those that end in "beat": `service.name : *beat` (matches services named "Auditbeat", "Heartbeat", "Filebeat", etc.)
* Group services with a high transaction duration in the last 24 hours: `transaction.duration.us >= 50000000`
docs/apm/transactions.asciidoc (10 changes: 5 additions & 5 deletions)
@@ -8,7 +8,7 @@ APM agents automatically collect performance metrics on HTTP requests, database
[role="screenshot"]
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]

-The *Latency*, *Throughput*, *Failed transaction rate*, *Average duration by span type*, and *Cold start rate*
+The *Latency*, *Throughput*, *Failed transaction rate*, *Time spent by span type*, and *Cold start rate*
charts display information on all transactions associated with the selected service:

*Latency*::
@@ -38,7 +38,7 @@ These spans will set `event.outcome=failure` and increase the failed transaction
If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
====

-*Average duration by span type*::
+*Time spent by span type*::
Visualize where your application is spending most of its time.
For example, is your app spending time in external calls, database processing, or application code execution?
+
@@ -106,10 +106,10 @@ image::apm/images/apm-transactions-overview.png[Example view of response time di
[[transaction-duration-distribution]]
==== Latency distribution

-A plot of all transaction durations for the given time period.
-The screenshot below shows a typical distribution,
+The latency distribution shows a plot of all transaction durations for the given time period.
+The following screenshot shows a typical distribution
and indicates most of our requests were served quickly -- awesome!
-It's the requests on the right, the ones taking longer than average, that we probably need to focus on.
+The requests on the right are taking longer than average; we probably need to focus on them.

[role="screenshot"]
image::apm/images/apm-transaction-duration-dist.png[Example view of latency distribution graph]