Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SLO] Verify Alerts ES query for alerts embeddable returns correct results #172448

Closed
mgiota opened this issue Dec 4, 2023 · 4 comments
Closed
Assignees
Labels
Feature:SLO Team:obs-ux-management Observability Management User Experience Team

Comments

@mgiota
Copy link
Contributor

mgiota commented Dec 4, 2023

🍒 Summary

While implementing SLO Alerts Embeddable , initially I used the following ES query taken from use_fetch_active_alerts.ts used in the SLO detail page to retrieve the SLO alerts.

bool: {
      filter: [
        {
          range: {
            '@timestamp': {
              gte: timeRange.from,
            },
          },
        },
        {
          term: {
            'kibana.alert.rule.rule_type_id': 'slo.rules.burnRate',
          },
        },
      ],
      should: slos.map((slo) => ({
        bool: {
          filter: [{ term: { 'slo.id': slo.id } }, { term: { 'slo.instanceId': slo.instanceId } }],
        },
      })),
      minimum_should_match: 1,
    },

Above query didn't work well in the alerts embeddable. In the following Video you can see that I have selected only one SLO and I should see 1 alert, but above query includes more alerts than it should.

Screen.Recording.2023-12-04.at.10.50.45.mov

However above query is still used to calculate the alert summary (number of total alerts) and works well in this case. It returns correct number of alerts.

Current ES query (working solution)

Here's the current ES query used by the alerts table embeddable, which works well and returns correct number of alerts for the alerts table.

 bool: {
          filter: [
            {
              range: {
                '@timestamp': {
                  gte: timeRange.from,
                },
              },
            },
            {
              term: {
                'kibana.alert.rule.rule_type_id': 'slo.rules.burnRate',
              },
            },
            {
              bool: {
                should: sloIdsAndInstanceIds.map(([sloId, instanceId]) => {
                  const filterQuery = [{ term: { 'slo.id': sloId } } as FilterQuery];
                  if (instanceId !== ALL_VALUE) {
                    filterQuery.push({ term: { 'slo.instanceId': instanceId } });
                  }
                  return {
                    bool: {
                      filter: filterQuery,
                    },
                  };
                }),
              },
            },
          ],
        },

The difference with above query is that

  1. I wrapped my should clause inside filter based on this solution
  2. I exclude term: { 'slo.instanceId': '*' } from the filter query cause there was a scenario tested by @XavierM where it returned 0 alerts and excluding this from the query fixed it. In the screenshot below you can see the SLO that returned 0 alerts in the SLO alerts embeddable, whereas Observability alerts page displayed plenty of alerts.
Screenshot 2023-12-01 at 22 23 00

Acceptance criteria

  • Verify that current working solution actually works well and returns correct alerts for the alerts table. Try creating many SLOs, group by & non group by, and verify that alerts in the Dashboard app match the alerts in the Observability alerts page & Dev Tools
  • Use the same ES query for both alerts summary and alerts table as per @XavierM's suggestion
  • Write automated tests
@mgiota mgiota added the Team:obs-ux-management Observability Management User Experience Team label Dec 4, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@mgiota mgiota self-assigned this Dec 4, 2023
@kdelemme
Copy link
Contributor

kdelemme commented Dec 4, 2023

The new query looks correct to me, but I cannot really understand why the extra bool makes it different, especially in your case with only one should condition 🤔

@mgiota
Copy link
Contributor Author

mgiota commented Dec 5, 2023

I have one should condition, but in alerts embeddable case, I can select multiple SLOs, so I end up having multiple queries in the should clause. In the SLO pages you fetch active alerts per one sloId & slo instanceId.

{  
   "query":{  
      "bool":{  
         "filter":[{  
            "bool":{  
               "should":[  
                  {  // Query 1 },
                  {  // Query 2 }
               ]
            }
         }]
      }
   }
}

I needed an or clause, and based on this stackoverflow suggestion I had to wrap my should query in a filter query.

@simianhacker
Copy link
Member

Fixed with #169910

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:SLO Team:obs-ux-management Observability Management User Experience Team
Projects
None yet
Development

No branches or pull requests

4 participants