[SLO] Exclude stale slos from healthy count on overview #201027
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
@@ -133,7 +144,7 @@ export class GetSLOsOverview {
     return {
       violated: aggs?.violated.doc_count ?? 0,
       degrading: aggs?.degrading.doc_count ?? 0,
-      healthy: aggs?.healthy.doc_count ?? 0,
+      healthy: aggs?.healthy?.not_stale?.doc_count ?? 0,
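To make the changed line concrete, here is a minimal, self-contained sketch of reading the nested `not_stale` count from an aggregation response. The interfaces and the `parseOverviewCounts` name are hypothetical and only mirror the keys visible in the hunk above; this is not the actual `GetSLOsOverview` implementation.

```typescript
// Hypothetical response shape: only the keys visible in the diff are modelled.
interface FilterAgg {
  doc_count: number;
}

interface OverviewAggs {
  violated?: FilterAgg;
  degrading?: FilterAgg;
  // `not_stale` is the sub-aggregation nested inside the `healthy` filter agg.
  healthy?: FilterAgg & { not_stale?: FilterAgg };
}

interface OverviewCounts {
  violated: number;
  degrading: number;
  healthy: number;
}

// Reads the counts defensively: any missing level falls back to 0.
function parseOverviewCounts(aggs?: OverviewAggs): OverviewCounts {
  return {
    violated: aggs?.violated?.doc_count ?? 0,
    degrading: aggs?.degrading?.doc_count ?? 0,
    // Use the nested non-stale count, not the parent `healthy.doc_count`.
    healthy: aggs?.healthy?.not_stale?.doc_count ?? 0,
  };
}
```

With this shape, the parent `healthy.doc_count` is still present in the response but is simply never read.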
Now that I think about it, the same should be subtracted from the degrading and violated SLOs.
We talked about this offline and agreed we can apply this filtering further up rather than adding sub-aggregations for all of the non-stale filters. I'll ping when this is done.
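One way to picture "filtering further up" is a single query-level filter that excludes stale summaries before any per-status aggregation runs. The sketch below is an assumption about what that could look like, not the code that was merged: `buildOverviewSearchBody`, `statusFilter`, the `staleThresholdInHours` parameter, and the `DEGRADING` status value are all hypothetical names chosen for illustration.

```typescript
type QueryClause = Record<string, unknown>;

// Builds a filter aggregation matching a single status value.
function statusFilter(status: string): { filter: QueryClause } {
  return { filter: { term: { status } } };
}

// Sketch: one range filter at the query level removes stale summaries,
// so no per-status `not_stale` sub-aggregation is needed.
function buildOverviewSearchBody(staleThresholdInHours: number) {
  return {
    size: 0,
    query: {
      bool: {
        filter: [
          { range: { summaryUpdatedAt: { gte: `now-${staleThresholdInHours}h` } } },
        ],
      },
    },
    aggs: {
      violated: statusFilter('VIOLATED'),
      degrading: statusFilter('DEGRADING'), // assumed status value
      healthy: statusFilter('HEALTHY'),
    },
  };
}
```

A caveat with this design: a query-level filter applies to every aggregation in the request, so a count of stale SLOs could no longer be computed inside the same filtered scope and would need separate handling.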
1bfa7e7 to 1186417
@@ -9,7 +9,7 @@ import { EuiFlexItem, EuiStat, EuiToolTip } from '@elastic/eui';
 import React from 'react';
 import { useUrlSearchState } from '../../hooks/use_url_search_state';

-export function OverViewItem({
+export function OverviewItem({
👍🏻
I think we need to keep using the settings. Also, can we double-check the usage of `worst`? I have the feeling it is not used and could be 🔪
Otherwise, looks good to me.
060ebcc to 566e2ac
566e2ac to fcd65bd
Just one thing to clean up, but otherwise 👍🏻
💛 Build succeeded, but was flaky
Starting backport for target branches: 8.17, 8.x
## Summary

Resolves elastic#198911.

The result is achieved by nesting a new filter agg inside the existing `HEALTHY` agg to remove any stale SLOs from the ultimate result.

This required a modification of the parsing code on the ES response to include a new `not_stale` key. The original `success` total is preserved in the `doc_count` of that agg, but is no longer referenced.

The filter for the `not_stale` agg I have added is the logical inverse of the filter we're using to determine stale SLOs:

```json
{
  "range": {
    "summaryUpdatedAt": {
      "gte": "now-48h"
    }
  }
}
```

_Reviewer note: I also changed the spelling of a UI component, should be a completely transparent change._

## Example

### Before

This is my local running on `main`:

<img width="1116" alt="image" src="https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225">

### After

This is my local running on this PR branch:

<img width="1120" alt="image" src="https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2">

### Proof query works

You can replicate these results by including a similar agg on a query against SLO data. I added a terms agg to the `stale` agg to determine how many SLOs I need to remove. The number of `HEALTHY` SLOs showing up in `stale` should match the difference between the total `doc_count` from `healthy` and the `doc_count` in the `not_stale` sub-aggregation.

#### Query

You can run this example aggs:

```json
{
  "aggs": {
    "stale": {
      "filter": {
        "range": {
          "summaryUpdatedAt": {
            "lt": "now-48h"
          }
        }
      },
      "aggs": {
        "by_status": {
          "terms": {
            "field": "status"
          }
        }
      }
    },
    "healthy": {
      "filter": {
        "term": {
          "status": "HEALTHY"
        }
      },
      "aggs": {
        "not_stale": {
          "filter": {
            "range": {
              "summaryUpdatedAt": {
                "gte": "now-48h"
              }
            }
          }
        }
      }
    }
  }
}
```

#### Relevant output

Here's a subset of my example query output. You can see that `stale.by_status.buckets[1]` contains a total of 2 docs, which is the difference between `healthy.doc_count` and `healthy.not_stale.doc_count`.

```json
{
  "stale": {
    "doc_count": 7,
    "by_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "VIOLATED",
          "doc_count": 5
        },
        {
          "key": "HEALTHY",
          "doc_count": 2
        }
      ]
    }
  },
  "healthy": {
    "doc_count": 9,
    "not_stale": {
      "doc_count": 7
    }
  }
}
```

(cherry picked from commit a92103b)
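The consistency check described under "Proof query works" can also be expressed as a small script. This is a sketch only: the `ProofResponse` type and the `staleHealthyMatches` helper are hypothetical names, and the sample data is the subset of output quoted in the description. It looks up the `HEALTHY` bucket by key instead of relying on its position in `buckets`, since bucket order may differ between runs.

```typescript
interface Bucket {
  key: string;
  doc_count: number;
}

// Hypothetical type covering only the fields used by the check.
interface ProofResponse {
  stale: { doc_count: number; by_status: { buckets: Bucket[] } };
  healthy: { doc_count: number; not_stale: { doc_count: number } };
}

// Subset of the example output quoted in the PR description.
const sample: ProofResponse = {
  stale: {
    doc_count: 7,
    by_status: {
      buckets: [
        { key: 'VIOLATED', doc_count: 5 },
        { key: 'HEALTHY', doc_count: 2 },
      ],
    },
  },
  healthy: { doc_count: 9, not_stale: { doc_count: 7 } },
};

// True when the stale HEALTHY count equals healthy total minus non-stale healthy.
function staleHealthyMatches(res: ProofResponse): boolean {
  const staleHealthy =
    res.stale.by_status.buckets.find((b) => b.key === 'HEALTHY')?.doc_count ?? 0;
  return staleHealthy === res.healthy.doc_count - res.healthy.not_stale.doc_count;
}

console.log(staleHealthyMatches(sample));
```

For the sample above, the stale `HEALTHY` bucket holds 2 docs and `9 - 7` is also 2, so the check passes.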
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
…) (#201830)

# Backport

This will backport the following commits from `main` to `8.17`:
- [[SLO] Exclude stale slos from healthy count on overview (#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Justin Kambic <[email protected]>
… (#201831)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[SLO] Exclude stale slos from healthy count on overview (#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Justin Kambic <[email protected]>