[8.x] [SLO] Exclude stale slos from healthy count on overview (elastic#201027) (elastic#201831)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[SLO] Exclude stale slos from healthy count on overview
(elastic#201027)](elastic#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Justin
Kambic","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-11-26T16:23:20Z","message":"[SLO]
Exclude stale slos from healthy count on overview (elastic#201027)\n\n##
Summary\r\n\r\nResolves elastic#198911.\r\n\r\nThe result is achieved by
nesting a new filter agg inside the existing\r\n`HEALTHY` agg to remove
any stale SLOs from the ultimate result.\r\n\r\nThis required a
modification of the parsing code on the ES response to\r\ninclude a new
`not_stale` key. The original `success` total is preserved\r\nin the
`doc_count` of that agg, but is no longer referenced.\r\n\r\nThe filter
for the `not_stale` agg I have added is the logical inverse\r\nof the
filter we're using to determine stale SLOs:\r\n\r\n```json\r\n{\r\n
\"range\": {\r\n \"summaryUpdatedAt\": {\r\n \"gte\": \"now-48h\"\r\n
}\r\n }\r\n}\r\n```\r\n\r\n_Reviewer note: I also changed the spelling
of a UI component, should be\r\na completely transparent
change._\r\n\r\n## Example\r\n\r\n### Before\r\n\r\nThis is my local
running on `main`:\r\n\r\n<img width=\"1116\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225\">\r\n\r\n\r\n###
After\r\n\r\nThis is my local running on this PR branch:\r\n\r\n<img
width=\"1120\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2\">\r\n\r\n\r\n###
Proof query works\r\n\r\nYou can replicate these results by including a
similar agg on a query\r\nagainst SLO data. I added a terms agg to the
`stale` agg to determine\r\nhow many SLOs I need to remove. The number
of `HEALTHY` SLOs showing up\r\nin `stale` should match the difference
between the total `doc_count`\r\nfrom `healthy` and the `doc_count` in
the `not_stale` sub-aggregation.\r\n\r\n#### Query\r\n\r\nYou can run
this example aggs:\r\n\r\n```json\r\n{\r\n \"aggs\": {\r\n \"stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"lt\": \"now-48h\"\r\n }\r\n }\r\n },\r\n \"aggs\": {\r\n
\"by_status\": {\r\n \"terms\": {\r\n \"field\": \"status\"\r\n }\r\n
}\r\n }\r\n },\r\n \"healthy\": {\r\n \"filter\": {\r\n \"term\": {\r\n
\"status\": \"HEALTHY\"\r\n }\r\n },\r\n \"aggs\": {\r\n \"not_stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"gte\": \"now-48h\"\r\n }\r\n }\r\n }\r\n }\r\n }\r\n }\r\n
}\r\n}\r\n```\r\n\r\n#### Relevant output\r\n\r\nHere's a subset of my
example query output. You can see that\r\n`stale.by_status.buckets[1]`
contains a total of 2 docs, which is the\r\ndifference between
`healthy.doc_count`
and\r\n`healthy.not_stale.doc_count`.\r\n\r\n```json\r\n{\r\n \"stale\":
{\r\n \"doc_count\": 7,\r\n \"by_status\": {\r\n
\"doc_count_error_upper_bound\": 0,\r\n \"sum_other_doc_count\": 0,\r\n
\"buckets\": [\r\n {\r\n \"key\": \"VIOLATED\",\r\n \"doc_count\": 5\r\n
},\r\n {\r\n \"key\": \"HEALTHY\",\r\n \"doc_count\": 2\r\n }\r\n ]\r\n
}\r\n },\r\n \"healthy\": {\r\n \"doc_count\": 9,\r\n \"not_stale\":
{\r\n \"doc_count\": 7\r\n }\r\n
}\r\n}\r\n```","sha":"a92103b2a9c06e3af30dea591ac769b995c78145","branchLabelMapping":{"^v9.0.0$":"main","^v8.18.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:enhancement","v9.0.0","backport:prev-minor","ci:project-deploy-observability","Team:obs-ux-management","v8.17.0"],"title":"[SLO]
Exclude stale slos from healthy count on
overview","number":201027,"url":"https://github.com/elastic/kibana/pull/201027","mergeCommit":{"message":"[SLO]
Exclude stale slos from healthy count on overview (elastic#201027)\n\n##
Summary\r\n\r\nResolves elastic#198911.\r\n\r\nThe result is achieved by
nesting a new filter agg inside the existing\r\n`HEALTHY` agg to remove
any stale SLOs from the ultimate result.\r\n\r\nThis required a
modification of the parsing code on the ES response to\r\ninclude a new
`not_stale` key. The original `success` total is preserved\r\nin the
`doc_count` of that agg, but is no longer referenced.\r\n\r\nThe filter
for the `not_stale` agg I have added is the logical inverse\r\nof the
filter we're using to determine stale SLOs:\r\n\r\n```json\r\n{\r\n
\"range\": {\r\n \"summaryUpdatedAt\": {\r\n \"gte\": \"now-48h\"\r\n
}\r\n }\r\n}\r\n```\r\n\r\n_Reviewer note: I also changed the spelling
of a UI component, should be\r\na completely transparent
change._\r\n\r\n## Example\r\n\r\n### Before\r\n\r\nThis is my local
running on `main`:\r\n\r\n<img width=\"1116\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225\">\r\n\r\n\r\n###
After\r\n\r\nThis is my local running on this PR branch:\r\n\r\n<img
width=\"1120\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2\">\r\n\r\n\r\n###
Proof query works\r\n\r\nYou can replicate these results by including a
similar agg on a query\r\nagainst SLO data. I added a terms agg to the
`stale` agg to determine\r\nhow many SLOs I need to remove. The number
of `HEALTHY` SLOs showing up\r\nin `stale` should match the difference
between the total `doc_count`\r\nfrom `healthy` and the `doc_count` in
the `not_stale` sub-aggregation.\r\n\r\n#### Query\r\n\r\nYou can run
this example aggs:\r\n\r\n```json\r\n{\r\n \"aggs\": {\r\n \"stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"lt\": \"now-48h\"\r\n }\r\n }\r\n },\r\n \"aggs\": {\r\n
\"by_status\": {\r\n \"terms\": {\r\n \"field\": \"status\"\r\n }\r\n
}\r\n }\r\n },\r\n \"healthy\": {\r\n \"filter\": {\r\n \"term\": {\r\n
\"status\": \"HEALTHY\"\r\n }\r\n },\r\n \"aggs\": {\r\n \"not_stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"gte\": \"now-48h\"\r\n }\r\n }\r\n }\r\n }\r\n }\r\n }\r\n
}\r\n}\r\n```\r\n\r\n#### Relevant output\r\n\r\nHere's a subset of my
example query output. You can see that\r\n`stale.by_status.buckets[1]`
contains a total of 2 docs, which is the\r\ndifference between
`healthy.doc_count`
and\r\n`healthy.not_stale.doc_count`.\r\n\r\n```json\r\n{\r\n \"stale\":
{\r\n \"doc_count\": 7,\r\n \"by_status\": {\r\n
\"doc_count_error_upper_bound\": 0,\r\n \"sum_other_doc_count\": 0,\r\n
\"buckets\": [\r\n {\r\n \"key\": \"VIOLATED\",\r\n \"doc_count\": 5\r\n
},\r\n {\r\n \"key\": \"HEALTHY\",\r\n \"doc_count\": 2\r\n }\r\n ]\r\n
}\r\n },\r\n \"healthy\": {\r\n \"doc_count\": 9,\r\n \"not_stale\":
{\r\n \"doc_count\": 7\r\n }\r\n
}\r\n}\r\n```","sha":"a92103b2a9c06e3af30dea591ac769b995c78145"}},"sourceBranch":"main","suggestedTargetBranches":["8.17"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/201027","number":201027,"mergeCommit":{"message":"[SLO]
Exclude stale slos from healthy count on overview (elastic#201027)\n\n##
Summary\r\n\r\nResolves elastic#198911.\r\n\r\nThe result is achieved by
nesting a new filter agg inside the existing\r\n`HEALTHY` agg to remove
any stale SLOs from the ultimate result.\r\n\r\nThis required a
modification of the parsing code on the ES response to\r\ninclude a new
`not_stale` key. The original `success` total is preserved\r\nin the
`doc_count` of that agg, but is no longer referenced.\r\n\r\nThe filter
for the `not_stale` agg I have added is the logical inverse\r\nof the
filter we're using to determine stale SLOs:\r\n\r\n```json\r\n{\r\n
\"range\": {\r\n \"summaryUpdatedAt\": {\r\n \"gte\": \"now-48h\"\r\n
}\r\n }\r\n}\r\n```\r\n\r\n_Reviewer note: I also changed the spelling
of a UI component, should be\r\na completely transparent
change._\r\n\r\n## Example\r\n\r\n### Before\r\n\r\nThis is my local
running on `main`:\r\n\r\n<img width=\"1116\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225\">\r\n\r\n\r\n###
After\r\n\r\nThis is my local running on this PR branch:\r\n\r\n<img
width=\"1120\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2\">\r\n\r\n\r\n###
Proof query works\r\n\r\nYou can replicate these results by including a
similar agg on a query\r\nagainst SLO data. I added a terms agg to the
`stale` agg to determine\r\nhow many SLOs I need to remove. The number
of `HEALTHY` SLOs showing up\r\nin `stale` should match the difference
between the total `doc_count`\r\nfrom `healthy` and the `doc_count` in
the `not_stale` sub-aggregation.\r\n\r\n#### Query\r\n\r\nYou can run
this example aggs:\r\n\r\n```json\r\n{\r\n \"aggs\": {\r\n \"stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"lt\": \"now-48h\"\r\n }\r\n }\r\n },\r\n \"aggs\": {\r\n
\"by_status\": {\r\n \"terms\": {\r\n \"field\": \"status\"\r\n }\r\n
}\r\n }\r\n },\r\n \"healthy\": {\r\n \"filter\": {\r\n \"term\": {\r\n
\"status\": \"HEALTHY\"\r\n }\r\n },\r\n \"aggs\": {\r\n \"not_stale\":
{\r\n \"filter\": {\r\n \"range\": {\r\n \"summaryUpdatedAt\": {\r\n
\"gte\": \"now-48h\"\r\n }\r\n }\r\n }\r\n }\r\n }\r\n }\r\n
}\r\n}\r\n```\r\n\r\n#### Relevant output\r\n\r\nHere's a subset of my
example query output. You can see that\r\n`stale.by_status.buckets[1]`
contains a total of 2 docs, which is the\r\ndifference between
`healthy.doc_count`
and\r\n`healthy.not_stale.doc_count`.\r\n\r\n```json\r\n{\r\n \"stale\":
{\r\n \"doc_count\": 7,\r\n \"by_status\": {\r\n
\"doc_count_error_upper_bound\": 0,\r\n \"sum_other_doc_count\": 0,\r\n
\"buckets\": [\r\n {\r\n \"key\": \"VIOLATED\",\r\n \"doc_count\": 5\r\n
},\r\n {\r\n \"key\": \"HEALTHY\",\r\n \"doc_count\": 2\r\n }\r\n ]\r\n
}\r\n },\r\n \"healthy\": {\r\n \"doc_count\": 9,\r\n \"not_stale\":
{\r\n \"doc_count\": 7\r\n }\r\n
}\r\n}\r\n```","sha":"a92103b2a9c06e3af30dea591ac769b995c78145"}},{"branch":"8.17","label":"v8.17.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Justin Kambic <[email protected]>
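
The consistency check described under "Proof query works" in the commit message can be expressed in a few lines of TypeScript. This is a minimal sketch, not code from the PR: the `ProofAggs` type and the `staleHealthyMatches` helper are hypothetical names, and the sample values are copied from the "Relevant output" quoted in the message.

```typescript
// Shape of the subset of aggregation output quoted in the commit message.
// The type name is hypothetical; the field names come from the proof query.
interface ProofAggs {
  stale: {
    doc_count: number;
    by_status: { buckets: Array<{ key: string; doc_count: number }> };
  };
  healthy: { doc_count: number; not_stale: { doc_count: number } };
}

// Hypothetical helper: checks that the number of stale HEALTHY SLOs equals
// healthy.doc_count minus healthy.not_stale.doc_count.
function staleHealthyMatches(aggs: ProofAggs): boolean {
  const staleHealthy =
    aggs.stale.by_status.buckets.find((b) => b.key === 'HEALTHY')?.doc_count ?? 0;
  return staleHealthy === aggs.healthy.doc_count - aggs.healthy.not_stale.doc_count;
}

// Values taken from the sample output in the commit message.
const sampleAggs: ProofAggs = {
  stale: {
    doc_count: 7,
    by_status: {
      buckets: [
        { key: 'VIOLATED', doc_count: 5 },
        { key: 'HEALTHY', doc_count: 2 },
      ],
    },
  },
  healthy: { doc_count: 9, not_stale: { doc_count: 7 } },
};

// For the sample, 2 stale HEALTHY docs === 9 - 7, so the check holds.
const proofHolds = staleHealthyMatches(sampleAggs);
```

Looking the bucket up by its `key` rather than by index (`buckets[1]`) keeps the check independent of the order in which the terms aggregation returns its buckets.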
kibanamachine and justinkambic authored Nov 26, 2024
1 parent 55433a4 commit 279795f
Showing 5 changed files with 46 additions and 56 deletions.
@@ -18,10 +18,6 @@ const getOverviewResponseSchema = t.type({
 degrading: t.number,
 stale: t.number,
 healthy: t.number,
-worst: t.type({
-value: t.number,
-id: t.string,
-}),
 noData: t.number,
 burnRateRules: t.number,
 burnRateActiveAlerts: t.number,
@@ -9,7 +9,7 @@ import { EuiFlexItem, EuiStat, EuiToolTip } from '@elastic/eui';
 import React from 'react';
 import { useUrlSearchState } from '../../hooks/use_url_search_state';

-export function OverViewItem({
+export function OverviewItem({
 title,
 description,
 titleColor,
@@ -12,7 +12,7 @@ import { GetOverviewResponse } from '@kbn/slo-schema/src/rest_specs/routes/get_o
 import { rulesLocatorID, RulesParams } from '@kbn/observability-plugin/public';
 import { useAlertsUrl } from '../../../../hooks/use_alerts_url';
 import { useKibana } from '../../../../hooks/use_kibana';
-import { OverViewItem } from './overview_item';
+import { OverviewItem } from './overview_item';

 export function SLOOverviewAlerts({
 data,
@@ -55,7 +55,7 @@ export function SLOOverviewAlerts({

 <EuiSpacer size="xs" />
 <EuiFlexGroup justifyContent="spaceBetween">
-<OverViewItem
+<OverviewItem
 title={data?.burnRateActiveAlerts}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.burnRateActiveAlerts', {
 defaultMessage: 'Active alerts',
@@ -66,7 +66,7 @@ export function SLOOverviewAlerts({
 application.navigateToUrl(getAlertsUrl('active'));
 }}
 />
-<OverViewItem
+<OverviewItem
 title={data?.burnRateRecoveredAlerts}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.burnRateRecoveredAlerts', {
 defaultMessage: 'Recovered alerts',
@@ -77,7 +77,7 @@ export function SLOOverviewAlerts({
 application.navigateToUrl(getAlertsUrl('recovered'));
 }}
 />
-<OverViewItem
+<OverviewItem
 title={data?.burnRateRules}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.burnRateRules', {
 defaultMessage: 'Rules',
@@ -20,7 +20,7 @@ import { SLOOverviewAlerts } from './slo_overview_alerts';
 import { useGetSettings } from '../../../slo_settings/hooks/use_get_settings';
 import { useFetchSLOsOverview } from '../../hooks/use_fetch_slos_overview';
 import { useUrlSearchState } from '../../hooks/use_url_search_state';
-import { OverViewItem } from './overview_item';
+import { OverviewItem } from './overview_item';

 export function SLOsOverview() {
 const { state } = useUrlSearchState();
@@ -50,7 +50,7 @@ export function SLOsOverview() {
 </EuiTitle>
 <EuiSpacer size="xs" />
 <EuiFlexGroup gutterSize="xl" justifyContent="spaceBetween">
-<OverViewItem
+<OverviewItem
 title={data?.healthy}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.healthyLabel', {
 defaultMessage: 'Healthy',
@@ -62,7 +62,7 @@ export function SLOsOverview() {
 defaultMessage: 'Click to filter SLOs by Healthy status.',
 })}
 />
-<OverViewItem
+<OverviewItem
 title={data?.violated}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.violatedLabel', {
 defaultMessage: 'Violated',
@@ -74,7 +74,7 @@ export function SLOsOverview() {
 defaultMessage: 'Click to filter SLOs by Violated status.',
 })}
 />
-<OverViewItem
+<OverviewItem
 title={data?.noData}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.noDataLabel', {
 defaultMessage: 'No data',
@@ -86,7 +86,7 @@ export function SLOsOverview() {
 defaultMessage: 'Click to filter SLOs by no data status.',
 })}
 />
-<OverViewItem
+<OverviewItem
 title={data?.degrading}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.degradingLabel', {
 defaultMessage: 'Degrading',
@@ -98,7 +98,7 @@ export function SLOsOverview() {
 })}
 titleColor={theme.colors.warningText}
 />
-<OverViewItem
+<OverviewItem
 title={data?.stale}
 description={i18n.translate('xpack.slo.sLOsOverview.euiStat.staleLabel', {
 defaultMessage: 'Stale',
@@ -53,19 +53,6 @@ export class GetSLOsOverview {
 },
 body: {
 aggs: {
-worst: {
-top_hits: {
-sort: {
-errorBudgetRemaining: {
-order: 'asc',
-},
-},
-_source: {
-includes: ['sliValue', 'status', 'slo.id', 'slo.instanceId', 'slo.name'],
-},
-size: 1,
-},
-},
 stale: {
 filter: {
 range: {
@@ -75,31 +62,42 @@ export class GetSLOsOverview {
 },
 },
 },
-violated: {
+not_stale: {
 filter: {
-term: {
-status: 'VIOLATED',
+range: {
+summaryUpdatedAt: {
+gte: `now-${settings.staleThresholdInHours}h`,
+},
 },
 },
-},
-healthy: {
-filter: {
-term: {
-status: 'HEALTHY',
+aggs: {
+violated: {
+filter: {
+term: {
+status: 'VIOLATED',
+},
+},
 },
-},
-},
-degrading: {
-filter: {
-term: {
-status: 'DEGRADING',
+healthy: {
+filter: {
+term: {
+status: 'HEALTHY',
+},
+},
 },
-},
-},
-noData: {
-filter: {
-term: {
-status: 'NO_DATA',
+degrading: {
+filter: {
+term: {
+status: 'DEGRADING',
+},
+},
+},
+noData: {
+filter: {
+term: {
+status: 'NO_DATA',
+},
+},
 },
 },
 },
@@ -131,15 +129,11 @@ export class GetSLOsOverview {
 const aggs = response.aggregations;

 return {
-violated: aggs?.violated.doc_count ?? 0,
-degrading: aggs?.degrading.doc_count ?? 0,
-healthy: aggs?.healthy.doc_count ?? 0,
-noData: aggs?.noData.doc_count ?? 0,
+violated: aggs?.not_stale?.violated.doc_count ?? 0,
+degrading: aggs?.not_stale?.degrading.doc_count ?? 0,
+healthy: aggs?.not_stale?.healthy?.doc_count ?? 0,
+noData: aggs?.not_stale?.noData.doc_count ?? 0,
 stale: aggs?.stale.doc_count ?? 0,
-worst: {
-value: 0,
-id: 'id',
-},
 burnRateRules: rules.total,
 burnRateActiveAlerts: alerts.activeAlertCount,
 burnRateRecoveredAlerts: alerts.recoveredAlertCount,
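
The query and parsing changes in this file can be read together as one pattern: the per-status filters move under a `not_stale` filter aggregation, and the response parser reads each count from that nested path. The sketch below illustrates that pattern in isolation. It is not the Kibana implementation: `OverviewAggs`, `statusFilter`, `buildOverviewAggs`, and `parseOverviewCounts` are hypothetical names, and the `lt` bound on the `stale` filter is assumed from the proof query in the commit message, since that line is not visible in the diff.

```typescript
// Hypothetical types and helper names; only the aggregation and field names
// (stale, not_stale, summaryUpdatedAt, status) come from the diff.
interface CountBucket {
  doc_count: number;
}

interface OverviewAggs {
  stale?: CountBucket;
  not_stale?: CountBucket & {
    violated?: CountBucket;
    healthy?: CountBucket;
    degrading?: CountBucket;
    noData?: CountBucket;
  };
}

// Builds one status filter aggregation, e.g. { filter: { term: { status: 'HEALTHY' } } }.
function statusFilter(status: string) {
  return { filter: { term: { status } } };
}

// Builds the aggs body. The status counts are nested under not_stale, so each
// count only includes SLOs whose summary was updated within the threshold.
function buildOverviewAggs(staleThresholdInHours: number) {
  const threshold = `now-${staleThresholdInHours}h`;
  return {
    // Assumption: the stale filter uses `lt`, as in the commit message's proof query.
    stale: { filter: { range: { summaryUpdatedAt: { lt: threshold } } } },
    not_stale: {
      filter: { range: { summaryUpdatedAt: { gte: threshold } } },
      aggs: {
        violated: statusFilter('VIOLATED'),
        healthy: statusFilter('HEALTHY'),
        degrading: statusFilter('DEGRADING'),
        noData: statusFilter('NO_DATA'),
      },
    },
  };
}

// Reads the counts back out, defaulting to 0 when an aggregation is missing.
function parseOverviewCounts(aggs?: OverviewAggs) {
  return {
    violated: aggs?.not_stale?.violated?.doc_count ?? 0,
    degrading: aggs?.not_stale?.degrading?.doc_count ?? 0,
    healthy: aggs?.not_stale?.healthy?.doc_count ?? 0,
    noData: aggs?.not_stale?.noData?.doc_count ?? 0,
    stale: aggs?.stale?.doc_count ?? 0,
  };
}
```

Because `lt` and `gte` share the same threshold, every SLO with a `summaryUpdatedAt` value lands in exactly one of `stale` or `not_stale`, so a stale SLO is no longer counted under its last known status as well.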