[SLOs] Subtract out stale SLO count from healthy slos in overview !! #198911

Closed
shahzad31 opened this issue Nov 5, 2024 · 3 comments · Fixed by #201027
Labels: bug (Fixes for quality problems that affect the customer experience), Team:obs-ux-management (Observability Management User Experience Team)

Comments

@shahzad31
Contributor

In the Overview panel we show counts of Healthy SLOs and the other statuses. But it seems the count of stale SLOs isn't subtracted from the healthy or breached SLO counts.

We should fix that.

Image

@shahzad31 shahzad31 added bug Fixes for quality problems that affect the customer experience Team:obs-ux-management Observability Management User Experience Team labels Nov 5, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@shahzad31
Contributor Author

@justinkambic this is a good issue for you to pick up to get some onboarding to SLOs.

@justinkambic
Contributor

Thank you for the suggestion, I will probably go for this one then. I am trying to get a Synthetics fix reviewable and I'll make this my next target.

@justinkambic justinkambic self-assigned this Nov 7, 2024
justinkambic added a commit that referenced this issue Nov 26, 2024
## Summary

Resolves #198911.

The result is achieved by nesting a new filter agg inside the existing
`HEALTHY` agg to remove any stale SLOs from the ultimate result.

This required a modification of the parsing code on the ES response to
include a new `not_stale` key. The original `success` total is preserved
in the `doc_count` of that agg, but is no longer referenced.

The filter for the `not_stale` agg I have added is the logical inverse
of the filter we're using to determine stale SLOs:

```json
{
  "range": {
    "summaryUpdatedAt": {
      "gte": "now-48h"
    }
  }
}
```
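
As a rough illustration of the change described above (not the actual Kibana code, which is TypeScript), the following Python sketch builds a `healthy` filter agg with a nested `not_stale` sub-agg and reads the count from the nested bucket. The names `build_healthy_agg` and `parse_healthy_count`, and the choice to return 0 when a key is missing, are assumptions made for this example.

```python
from typing import Any, Dict

STALE_THRESHOLD = "now-48h"  # same threshold as the filter shown above


def build_healthy_agg(threshold: str = STALE_THRESHOLD) -> Dict[str, Any]:
    """Return a `healthy` filter agg with a nested `not_stale` filter agg."""
    return {
        "healthy": {
            "filter": {"term": {"status": "HEALTHY"}},
            "aggs": {
                "not_stale": {
                    "filter": {"range": {"summaryUpdatedAt": {"gte": threshold}}}
                }
            },
        }
    }


def parse_healthy_count(aggregations: Dict[str, Any]) -> int:
    """Read the healthy count from the nested `not_stale` bucket.

    The outer `doc_count` still includes stale SLOs, so it is ignored here.
    Returning 0 for a missing key is an assumption of this sketch.
    """
    healthy = aggregations.get("healthy", {})
    return int(healthy.get("not_stale", {}).get("doc_count", 0))


# Hypothetical response fragment: 9 healthy SLOs in total, 7 of them not stale.
response_aggs = {"healthy": {"doc_count": 9, "not_stale": {"doc_count": 7}}}
print(parse_healthy_count(response_aggs))  # 7
```

Reading the nested bucket keeps the original total available in the outer `doc_count` while the overview only uses the non-stale figure.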

_Reviewer note: I also changed the spelling of a UI component; this should be
a completely transparent change._

## Example

### Before

This is my local running on `main`:

<img width="1116" alt="image"
src="https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225">


### After

This is my local running on this PR branch:

<img width="1120" alt="image"
src="https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2">


### Proof that the query works

You can replicate these results by including a similar agg on a query
against SLO data. I added a terms agg to the `stale` agg to determine
how many SLOs I need to remove. The number of `HEALTHY` SLOs showing up
in `stale` should match the difference between the total `doc_count`
from `healthy` and the `doc_count` in the `not_stale` sub-aggregation.

#### Query

You can run these example aggs:

```json
{
  "aggs": {
    "stale": {
      "filter": {
        "range": {
          "summaryUpdatedAt": {
            "lt": "now-48h"
          }
        }
      },
      "aggs": {
        "by_status": {
          "terms": {
            "field": "status"
          }
        }
      }
    },
    "healthy": {
      "filter": {
        "term": {
          "status": "HEALTHY"
        }
      },
      "aggs": {
        "not_stale": {
          "filter": {
            "range": {
              "summaryUpdatedAt": {
                "gte": "now-48h"
              }
            }
          }
        }
      }
    }
  }
}
```

#### Relevant output

Here's a subset of my example query output. You can see that
`stale.by_status.buckets[1]` contains a total of 2 docs, which is the
difference between `healthy.doc_count` and
`healthy.not_stale.doc_count`.

```json
{
  "stale": {
    "doc_count": 7,
    "by_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "VIOLATED",
          "doc_count": 5
        },
        {
          "key": "HEALTHY",
          "doc_count": 2
        }
      ]
    }
  },
  "healthy": {
    "doc_count": 9,
    "not_stale": {
      "doc_count": 7
    }
  }
}
```
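
The arithmetic can also be checked mechanically. The sketch below, written in Python purely for illustration, takes the output subset shown above and asserts that the `HEALTHY` bucket under `stale` equals `healthy.doc_count` minus `healthy.not_stale.doc_count`. The helper name `stale_count_for` is hypothetical; it looks the bucket up by key rather than by position, so the check does not depend on bucket order.

```python
from typing import Any, Dict

# Subset of the aggregation output shown above.
output: Dict[str, Any] = {
    "stale": {
        "doc_count": 7,
        "by_status": {
            "buckets": [
                {"key": "VIOLATED", "doc_count": 5},
                {"key": "HEALTHY", "doc_count": 2},
            ]
        },
    },
    "healthy": {"doc_count": 9, "not_stale": {"doc_count": 7}},
}


def stale_count_for(aggs: Dict[str, Any], status: str) -> int:
    """Return the number of stale SLOs with the given status (0 if absent)."""
    for bucket in aggs["stale"]["by_status"]["buckets"]:
        if bucket["key"] == status:
            return bucket["doc_count"]
    return 0


stale_healthy = stale_count_for(output, "HEALTHY")
healthy_total = output["healthy"]["doc_count"]
healthy_fresh = output["healthy"]["not_stale"]["doc_count"]

# 9 healthy in total, 7 of them not stale, so 2 must be stale.
assert healthy_total - healthy_fresh == stale_healthy
print(stale_healthy)  # 2
```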
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 26, 2024
(cherry picked from commit a92103b)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 26, 2024
(cherry picked from commit a92103b)
kibanamachine added a commit that referenced this issue Nov 26, 2024
…) (#201830)

# Backport

This will backport the following commits from `main` to `8.17`:
- [[SLO] Exclude stale slos from healthy count on overview
(#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Justin Kambic <[email protected]>
kibanamachine added a commit that referenced this issue Nov 26, 2024
… (#201831)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[SLO] Exclude stale slos from healthy count on overview
(#201027)](#201027)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Justin Kambic <[email protected]>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this issue Nov 26, 2024
## Summary

Resolves elastic#198911.

The result is achieved by nesting a new filter agg inside the existing
`HEALTHY` agg to remove any stale SLOs from the ultimate result.

This required a modification of the parsing code on the ES response to
include a new `not_stale` key. The original `success` total is preserved
in the `doc_count` of that agg, but is no longer referenced.

The filter for the `not_stale` agg I have added is the logical inverse
of the filter we're using to determine stale SLOs:

```json
{
  "range": {
    "summaryUpdatedAt": {
      "gte": "now-48h"
    }
  }
}
```

_Reviewer note: I also changed the spelling of a UI component, should be
a completely transparent change._

## Example

### Before

This is my local running on `main`:

<img width="1116" alt="image"
src="https://github.com/user-attachments/assets/80f86426-c7f1-4847-830f-a311c865a225">


### After

This is my local running on this PR branch:

<img width="1120" alt="image"
src="https://github.com/user-attachments/assets/2c4c4f26-2407-41ca-bf01-9ca730bbfab2">


### Proof query works

You can replicate these results by including a similar agg on a query
against SLO data. I added a terms agg to the `stale` agg to determine
how many SLOs I need to remove. The number of `HEALTHY` SLOs showing up
in `stale` should match the difference between the total `doc_count`
from `healthy` and the `doc_count` in the `not_stale` sub-aggregation.

#### Query

You can run these example aggs:

```json
{
  "aggs": {
    "stale": {
      "filter": {
        "range": {
          "summaryUpdatedAt": {
            "lt": "now-48h"
          }
        }
      },
      "aggs": {
        "by_status": {
          "terms": {
            "field": "status"
          }
        }
      }
    },
    "healthy": {
      "filter": {
        "term": {
          "status": "HEALTHY"
        }
      },
      "aggs": {
        "not_stale": {
          "filter": {
            "range": {
              "summaryUpdatedAt": {
                "gte": "now-48h"
              }
            }
          }
        }
      }
    }
  }
}
```

#### Relevant output

Here's a subset of my example query output. You can see that
`stale.by_status.buckets[1]` (the `HEALTHY` bucket) contains a total of
2 docs, which is the difference between `healthy.doc_count` (9) and
`healthy.not_stale.doc_count` (7).

```json
{
  "stale": {
    "doc_count": 7,
    "by_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "VIOLATED",
          "doc_count": 5
        },
        {
          "key": "HEALTHY",
          "doc_count": 2
        }
      ]
    }
  },
  "healthy": {
    "doc_count": 9,
    "not_stale": {
      "doc_count": 7
    }
  }
}
```
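The same check can be expressed in code. This is a minimal sketch, assuming the response has exactly the shape shown above. The `OverviewAggs` interface and the `summarizeHealthy` function are hypothetical names for illustration, not the actual parsing code in Kibana.

```typescript
// Hypothetical types mirroring the example output above.
interface StatusBucket {
  key: string;
  doc_count: number;
}

interface OverviewAggs {
  stale: { doc_count: number; by_status: { buckets: StatusBucket[] } };
  healthy: { doc_count: number; not_stale: { doc_count: number } };
}

// Reads the healthy count from `not_stale` and cross-checks it against
// the HEALTHY bucket of the `stale` terms agg.
function summarizeHealthy(aggs: OverviewAggs) {
  const total = aggs.healthy.doc_count; // original total, includes stale SLOs
  const healthy = aggs.healthy.not_stale.doc_count; // value to display
  const staleHealthy =
    aggs.stale.by_status.buckets.find((b) => b.key === "HEALTHY")?.doc_count ?? 0;
  return { healthy, staleHealthy, consistent: total - healthy === staleHealthy };
}

// The example output from above.
const sample: OverviewAggs = {
  stale: {
    doc_count: 7,
    by_status: {
      buckets: [
        { key: "VIOLATED", doc_count: 5 },
        { key: "HEALTHY", doc_count: 2 },
      ],
    },
  },
  healthy: { doc_count: 9, not_stale: { doc_count: 7 } },
};

const summary = summarizeHealthy(sample);
console.log(summary); // { healthy: 7, staleHealthy: 2, consistent: true }
```

Looking the bucket up by `key` rather than by index avoids depending on the order in which the terms agg returns its buckets.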
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024