[SLOs] Subtract out stale SLO count from healthy slos in overview !! #198911
Labels
bug
Fixes for quality problems that affect the customer experience
Team:obs-ux-management
Observability Management User Experience Team
Comments
shahzad31
added
bug
Fixes for quality problems that affect the customer experience
Team:obs-ux-management
Observability Management User Experience Team
labels
Nov 5, 2024
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
@justinkambic good issue for you to pick to get some onboarding to SLO
Thank you for the suggestion, I will probably go for this one then. I am trying to get a Synthetics fix reviewable and I'll make this my next target.
justinkambic
added a commit
that referenced
this issue
Nov 26, 2024
## Summary

Resolves #198911.

The result is achieved by nesting a new filter agg inside the existing `HEALTHY` agg to remove any stale SLOs from the ultimate result.

This required a modification of the parsing code on the ES response to include a new `not_stale` key. The original `success` total is preserved in the `doc_count` of that agg, but is no longer referenced.

The filter for the `not_stale` agg I have added is the logical inverse of the filter we're using to determine stale SLOs:

```json
{
  "range": {
    "summaryUpdatedAt": {
      "gte": "now-48h"
    }
  }
}
```

_Reviewer note: I also changed the spelling of a UI component, should be a completely transparent change._

## Example

### Before

This is my local running on `main`:

<img width="1116" alt="image" src="https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225">

### After

This is my local running on this PR branch:

<img width="1120" alt="image" src="https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2">

### Proof query works

You can replicate these results by including a similar agg on a query against SLO data. I added a terms agg to the `stale` agg to determine how many SLOs I need to remove. The number of `HEALTHY` SLOs showing up in `stale` should match the difference between the total `doc_count` from `healthy` and the `doc_count` in the `not_stale` sub-aggregation.

#### Query

You can run this example aggs:

```json
{
  "aggs": {
    "stale": {
      "filter": {
        "range": {
          "summaryUpdatedAt": {
            "lt": "now-48h"
          }
        }
      },
      "aggs": {
        "by_status": {
          "terms": {
            "field": "status"
          }
        }
      }
    },
    "healthy": {
      "filter": {
        "term": {
          "status": "HEALTHY"
        }
      },
      "aggs": {
        "not_stale": {
          "filter": {
            "range": {
              "summaryUpdatedAt": {
                "gte": "now-48h"
              }
            }
          }
        }
      }
    }
  }
}
```

#### Relevant output

Here's a subset of my example query output. You can see that `stale.by_status.buckets[1]` contains a total of 2 docs, which is the difference between `healthy.doc_count` and `healthy.not_stale.doc_count`.

```json
{
  "stale": {
    "doc_count": 7,
    "by_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "VIOLATED",
          "doc_count": 5
        },
        {
          "key": "HEALTHY",
          "doc_count": 2
        }
      ]
    }
  },
  "healthy": {
    "doc_count": 9,
    "not_stale": {
      "doc_count": 7
    }
  }
}
```
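The response-parsing step described in the summary might look roughly like the sketch below. This is not the code from the PR: the `OverviewAggs` type and the `healthyCount` and `staleHealthyCount` helpers are hypothetical names made up for illustration, and the sample object simply copies the output shown in the commit message. It reads the healthy total from the nested `not_stale` bucket and checks that the SLOs removed match the `HEALTHY` bucket under `stale`.

```typescript
// Hypothetical types covering only the fields that appear in the sample output.
interface FilterBucket {
  doc_count: number;
}

interface OverviewAggs {
  stale: FilterBucket & {
    by_status: { buckets: Array<{ key: string; doc_count: number }> };
  };
  healthy: FilterBucket & { not_stale: FilterBucket };
}

// Hypothetical helper: the healthy count to display excludes stale SLOs,
// so it comes from the nested `not_stale` bucket rather than `healthy.doc_count`.
function healthyCount(aggs: OverviewAggs): number {
  return aggs.healthy.not_stale.doc_count;
}

// Hypothetical helper: how many stale SLOs still carry the HEALTHY status.
function staleHealthyCount(aggs: OverviewAggs): number {
  const bucket = aggs.stale.by_status.buckets.find((b) => b.key === 'HEALTHY');
  return bucket ? bucket.doc_count : 0;
}

// Sample output copied from the commit message above.
const sample: OverviewAggs = {
  stale: {
    doc_count: 7,
    by_status: {
      buckets: [
        { key: 'VIOLATED', doc_count: 5 },
        { key: 'HEALTHY', doc_count: 2 },
      ],
    },
  },
  healthy: { doc_count: 9, not_stale: { doc_count: 7 } },
};

const shown = healthyCount(sample); // 7
const removed = sample.healthy.doc_count - shown; // 9 - 7 = 2
console.log(shown, removed, removed === staleHealthyCount(sample)); // 7 2 true
```

One caveat with this cross-check: a `range` filter does not match documents that lack the field, so the two numbers only line up when every SLO summary document has a `summaryUpdatedAt` value.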
kibanamachine
pushed a commit
to kibanamachine/kibana
that referenced
this issue
Nov 26, 2024
(cherry picked from commit a92103b)
kibanamachine
pushed a commit
to kibanamachine/kibana
that referenced
this issue
Nov 26, 2024
(cherry picked from commit a92103b)
kibanamachine
added a commit
that referenced
this issue
Nov 26, 2024
…) (#201830)

# Backport

This will backport the following commits from `main` to `8.17`:
- [[SLO] Exclude stale slos from healthy count on overview (#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Justin Kambic <[email protected]>
kibanamachine
added a commit
that referenced
this issue
Nov 26, 2024
… (#201831)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[SLO] Exclude stale slos from healthy count on overview (#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Justin Kambic <[email protected]>
paulinashakirova
pushed a commit
to paulinashakirova/kibana
that referenced
this issue
Nov 26, 2024
CAWilson94
pushed a commit
to CAWilson94/kibana
that referenced
this issue
Dec 12, 2024
In the Overview panel we show the count of Healthy SLOs, etc. But it seems the count of stale SLOs isn't subtracted from the healthy or breached SLOs.
We should fix that.