[WIP] Adding SLOs troubleshooting document #4419

Closed
wants to merge 17 commits into from
11 changes: 11 additions & 0 deletions docs/en/observability/apm/known-issues.asciidoc
@@ -24,10 +24,17 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
[discrete]
== Upgrading to v8.15.x may cause ingestion to fail

_Elastic Stack versions: 8.15.0+_

// The conditions in which this issue occurs
The issue only occurs when _upgrading_ the {stack} from 8.12.2 or lower directly to any 8.15.x version.
The issue does _not_ occur when creating a _new_ cluster using any 8.15.x version, or when upgrading
from 8.12.2 to 8.13.x or 8.14.x and then to 8.15.x.

@@ -42,7 +49,11 @@ related to https://github.com/elastic/elasticsearch/issues/112781[lazy rollover
If the deployment is running 8.15.0, upgrade the deployment to 8.15.1 or above.
A manual rollover of all APM data streams is required to pick up the new index templates and remove the faulty ingest pipeline version check.
Send the following requests to Elasticsearch. They assume that the `default` namespace is used; adjust them if necessary:

[source,txt]
----
POST /traces-apm-default/_rollover
2 changes: 2 additions & 0 deletions docs/en/observability/index.asciidoc
@@ -236,6 +236,8 @@ include::slo-privileges.asciidoc[leveloffset=+3]

include::slo-create.asciidoc[leveloffset=+3]

include::slo-troubleshoot.asciidoc[leveloffset=+3]

//Data Set Quality
include::logs-monitor-datasets.asciidoc[leveloffset=+1]

6 changes: 6 additions & 0 deletions docs/en/observability/slo-create.asciidoc
@@ -18,6 +18,12 @@ From here, complete the following steps:
. <<set-slo>>.
. <<slo-describe>>.

[NOTE]
====
For SLOs to function, the cluster must provide both the `ingest` and `transform` {ref}/modules-node.html#node-roles[node roles]. The two roles can be assigned to the same node or distributed across separate nodes.
On ESS deployments (Elastic Cloud), the hot nodes cover this requirement because they serve as both `ingest` and `transform` nodes.
====
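One way to verify the node role requirement is to inspect the `roles` array that the nodes info API (`GET _nodes`) reports for each node. The following is a minimal sketch, not part of the product: it assumes the response has already been fetched and parsed into a Python dict, and the function name `missing_slo_roles` is hypothetical.

```python
def missing_slo_roles(nodes_response):
    """Return the SLO-required roles that no node in the cluster provides.

    `nodes_response` is assumed to be the parsed JSON body of `GET _nodes`,
    where each entry under "nodes" carries a "roles" list.
    """
    required = {"ingest", "transform"}
    present = set()
    for node in nodes_response.get("nodes", {}).values():
        # Collect the roles of every node; the roles may be spread across nodes.
        present.update(node.get("roles", []))
    return sorted(required - present)
```

An empty list means that both roles are available somewhere in the cluster, whether on one node or on several.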

[discrete]
[[define-sli]]
= Define your SLI
60 changes: 4 additions & 56 deletions docs/en/observability/slo-overview.asciidoc
@@ -8,7 +8,7 @@
// tag::slo-license[]
[IMPORTANT]
====
To create and manage SLOs, you need an {subscriptions}[appropriate license] and <<slo-privileges,SLO access>> must be configured.
To create and manage SLOs, you need an {subscriptions}[appropriate license], an {es} cluster with both `transform` and `ingest` {ref}/modules-node.html#node-roles[node roles] present, and <<slo-privileges,SLO access>> configured.
====
// end::slo-license[]

@@ -29,6 +29,8 @@ SLO:: The target you set for your SLI. It specifies th
Error budget:: The amount of time that your SLI can fail to meet the SLO target before it violates your SLO.
Burn rate:: The rate at which your service consumes your error budget.

Beyond these key concepts, see <<slo-understanding-slos>> for more information on how SLOs work and how they relate to other system components, such as {ref}/transforms.html[{es} Transforms].

[discrete]
[[slo-in-elastic]]
== SLO overview
@@ -94,61 +96,7 @@ Starting in version 8.12.0, SLOs are generally available (GA).
If you're upgrading from a beta version of SLOs (available in 8.11.0 and earlier),
you must migrate your SLO definitions to a new format.

[%collapsible]
.Migrate your SLO definitions
====
To migrate your SLO definitions, open the SLO overview.
A banner will display the number of outdated SLOs detected.
For each outdated SLO, click **Reset**. If you no longer need the SLO, select **Delete**.

If you have a large number of SLO definitions, you can automate this process with two Elastic APIs:

* https://github.com/elastic/kibana/blob/9cb830fe9a021cda1d091effbe3e0cd300220969/x-pack/plugins/observability/docs/openapi/slo/bundled.yaml#L453-L514[SLO Definitions Find API] (`/api/observability/slos/_definitions`)
* https://github.com/elastic/kibana/blob/9cb830fe9a021cda1d091effbe3e0cd300220969/x-pack/plugins/observability/docs/openapi/slo/bundled.yaml#L368-L410[SLO Reset API] (`/api/observability/slos/${id}/_reset`)

Pass in `includeOutdatedOnly=1` as a query parameter to the Definitions Find API.
This will display your outdated SLO definitions.
Loop through this list, one by one, calling the Reset API on each outdated SLO definition.
The Reset API loads the outdated SLO definition and resets it to the new format required for GA.
Once an SLO is reset, it will start to regenerate SLIs and summary data.
====
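The automated migration described above can be sketched as follows. This is an illustration under stated assumptions rather than an official script: `request(method, path)` is a hypothetical caller-supplied function that sends an authenticated request to {kib} and returns the parsed JSON body, the Definitions Find API is assumed to return matching definitions under a `results` key, and the Reset API is assumed to accept `POST`. Pagination is not handled.

```python
def reset_outdated_slos(request):
    """Reset every outdated SLO definition and return the IDs that were reset.

    `request(method, path)` is a hypothetical helper supplied by the caller.
    It must handle authentication and return the decoded JSON response.
    """
    # List only the outdated definitions (assumed to sit under "results").
    body = request(
        "GET", "/api/observability/slos/_definitions?includeOutdatedOnly=1"
    )
    outdated_ids = [slo["id"] for slo in body.get("results", [])]

    # Call the Reset API once per outdated definition.
    for slo_id in outdated_ids:
        request("POST", f"/api/observability/slos/{slo_id}/_reset")
    return outdated_ids
```

Passing the HTTP call in as a function keeps the sketch independent of any particular client library and makes it possible to dry-run the loop against a stub before pointing it at a real deployment.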

[%collapsible]
.Remove legacy summary transforms
====
After migrating to 8.12 or later, you might have some legacy SLO summary transforms running.
You can safely delete the following legacy summary transforms:

[source,sh]
----------------------------------
# Stop all legacy summary transforms
POST _transform/slo-summary-occurrences-30d-rolling/_stop?force=true
POST _transform/slo-summary-occurrences-7d-rolling/_stop?force=true
POST _transform/slo-summary-occurrences-90d-rolling/_stop?force=true
POST _transform/slo-summary-occurrences-monthly-aligned/_stop?force=true
POST _transform/slo-summary-occurrences-weekly-aligned/_stop?force=true
POST _transform/slo-summary-timeslices-30d-rolling/_stop?force=true
POST _transform/slo-summary-timeslices-7d-rolling/_stop?force=true
POST _transform/slo-summary-timeslices-90d-rolling/_stop?force=true
POST _transform/slo-summary-timeslices-monthly-aligned/_stop?force=true
POST _transform/slo-summary-timeslices-weekly-aligned/_stop?force=true

# Delete all legacy summary transforms
DELETE _transform/slo-summary-occurrences-30d-rolling?force=true
DELETE _transform/slo-summary-occurrences-7d-rolling?force=true
DELETE _transform/slo-summary-occurrences-90d-rolling?force=true
DELETE _transform/slo-summary-occurrences-monthly-aligned?force=true
DELETE _transform/slo-summary-occurrences-weekly-aligned?force=true
DELETE _transform/slo-summary-timeslices-30d-rolling?force=true
DELETE _transform/slo-summary-timeslices-7d-rolling?force=true
DELETE _transform/slo-summary-timeslices-90d-rolling?force=true
DELETE _transform/slo-summary-timeslices-monthly-aligned?force=true
DELETE _transform/slo-summary-timeslices-weekly-aligned?force=true
----------------------------------

Do not delete any new summary transforms used by your migrated SLOs.
====
Refer to <<slo-troubleshoot-beta>> for more details on how to proceed.

[discrete]
[[slo-overview-next-steps]]
2 changes: 1 addition & 1 deletion docs/en/observability/slo-privileges.asciidoc
@@ -5,7 +5,7 @@
<titleabbrev>Configure SLO access</titleabbrev>
++++

IMPORTANT: To create and manage SLOs, you need an {subscriptions}[appropriate license].
IMPORTANT: To create and manage SLOs, you need an {subscriptions}[appropriate license] and an {es} cluster with both `transform` and `ingest` {ref}/modules-node.html#node-roles[node roles] present.

You can enable access to SLOs in two different ways:
