AAD Adoption - Onboard remaining O11y rule types to use alerts-as-data #171793

Closed
6 tasks
mikecote opened this issue Nov 22, 2023 · 5 comments · Fixed by #174174
Labels
Feature:Alerting Team:obs-ux-management Observability Management User Experience Team Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability


@mikecote
Contributor

We should onboard the remaining O11y rule types that don't yet report alerts-as-data documents so that they use the framework alerts-as-data (FAAD) APIs and persist alerts-as-data documents with some additional data.

Remaining rule types (those that use neither the rule registry nor the new framework alerts-as-data APIs):

  • Uptime TLS (Legacy)
  • Infrastructure anomaly

We should take the following into consideration:

  • Collaborating with the relevant teams regarding this change
  • Determining which alerts table will display these alerts (if any)
  • Potentially re-using the same alerts index that is already in place for similar rules
  • Copying additional information from the context variables
@mikecote mikecote added Feature:Alerting Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:obs-ux-management Observability Management User Experience Team labels Nov 22, 2023
@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote
Contributor Author

Note for the Uptime and O11y UX Management teams: this is a tentative plan for ResponseOps in 8.13. We want to move towards a single architecture for all alerting rule types, including legacy ones. We can accommodate you if you don't want these alerts to show up in the O11y alerts tables, etc. We're still in the very early stages of this effort.

@heespi heespi changed the title Onboard remaining O11y rule types to use alerts-as-data AAD Adoption - Onboard remaining O11y rule types to use alerts-as-data Nov 27, 2023
@paulb-elastic
Contributor

The Uptime TLS (Legacy) rule can no longer be created, so we don't believe any changes are needed for this rule type.

@mikecote mikecote moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Nov 30, 2023
ymao1 added a commit that referenced this issue Jan 16, 2024
…type to write default alerts-as-data docs (#174174)

Towards elastic/response-ops-team#164
Resolves #171793

## Summary

* Switches the legacy uptime rule type to use `alertsClient` from the alerting
framework instead of the deprecated `alertFactory`
* Defines the `default` alert config for the rule type so that framework-level
fields are written to the `.alerts-default.alerts-default` index with no
rule-type-specific fields.
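The reporting pattern described above can be sketched roughly as follows. This is a self-contained illustration, not the alerting framework's code: `InMemoryAlertsClient`, `ReportedAlert`, `toDocs`, and `TLS_LEGACY_ID` are hypothetical names, and only the `report({ id, actionGroup })` call shape and the field names are taken from this thread.

```typescript
// Hypothetical stand-in for the framework's alerts client.
// Only the report({ id, actionGroup }) shape comes from the diff in this thread.
interface ReportedAlert {
  id: string;
  actionGroup: string;
}

// Assumed value, copied from "kibana.alert.instance.id" in the example docs.
const TLS_LEGACY_ID = 'xpack.uptime.alerts.actionGroups.tls';

class InMemoryAlertsClient {
  // Keyed by alert id, so reporting the same id twice keeps a single alert.
  private reported: Record<string, ReportedAlert> = {};

  report(alert: ReportedAlert): void {
    this.reported[alert.id] = alert;
  }

  // Emits framework-level fields only, with no rule type specific fields.
  toDocs(ruleName: string, ruleTypeId: string): Array<Record<string, unknown>> {
    return Object.keys(this.reported).map((id) => ({
      'kibana.alert.rule.name': ruleName,
      'kibana.alert.rule.rule_type_id': ruleTypeId,
      'kibana.alert.instance.id': id,
      'kibana.alert.action_group': this.reported[id].actionGroup,
      'kibana.alert.status': 'active',
    }));
  }
}

// Usage: report one alert and build its (simplified) document.
const exampleClient = new InMemoryAlertsClient();
exampleClient.report({ id: TLS_LEGACY_ID, actionGroup: TLS_LEGACY_ID });
const exampleDocs = exampleClient.toDocs('test_tls', 'xpack.uptime.alerts.tls');
```

The real framework writes many more fields (UUIDs, timestamps, flapping state), as the example documents show; the sketch keeps only enough to illustrate that the rule type supplies an id and action group and the framework fills in the rest.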

Example active alert doc
```
{
    "_index": ".internal.alerts-default.alerts-default-000001",
    "_id": "d79f1aa3-2a05-4dd6-9fce-4efa1f5ed596",
    "_score": 1,
    "_source": {
        "kibana.alert.rule.category": "Uptime TLS (Legacy)",
        "kibana.alert.rule.consumer": "alerts",
        "kibana.alert.rule.execution.uuid": "284b0ebd-cb4f-412a-b24f-bc4a211dd904",
        "kibana.alert.rule.name": "test_tls",
        "kibana.alert.rule.parameters": {},
        "kibana.alert.rule.producer": "uptime",
        "kibana.alert.rule.revision": 0,
        "kibana.alert.rule.rule_type_id": "xpack.uptime.alerts.tls",
        "kibana.alert.rule.tags": [],
        "kibana.alert.rule.uuid": "62b6756b-1a71-4a4e-a325-95ddd3ab66fb",
        "kibana.space_ids": [
            "default"
        ],
        "@timestamp": "2024-01-09T18:55:06.565Z",
        "event.action": "open",
        "event.kind": "signal",
        "kibana.alert.action_group": "xpack.uptime.alerts.actionGroups.tls",
        "kibana.alert.flapping": false,
        "kibana.alert.flapping_history": [
            true
        ],
        "kibana.alert.instance.id": "xpack.uptime.alerts.actionGroups.tls",
        "kibana.alert.maintenance_window_ids": [],
        "kibana.alert.status": "active",
        "kibana.alert.uuid": "d79f1aa3-2a05-4dd6-9fce-4efa1f5ed596",
        "kibana.alert.workflow_status": "open",
        "kibana.alert.duration.us": 0,
        "kibana.alert.start": "2024-01-09T18:55:06.565Z",
        "kibana.alert.time_range": {
            "gte": "2024-01-09T18:55:06.565Z"
        },
        "kibana.version": "8.13.0",
        "tags": []
    }
}
```

Example recovered alert doc
```
{
    "_index": ".internal.alerts-default.alerts-default-000001",
    "_id": "d79f1aa3-2a05-4dd6-9fce-4efa1f5ed596",
    "_score": 1,
    "_source": {
        "kibana.alert.rule.category": "Uptime TLS (Legacy)",
        "kibana.alert.rule.consumer": "alerts",
        "kibana.alert.rule.execution.uuid": "6d7e602b-65cb-43a5-a420-49b36bf28b45",
        "kibana.alert.rule.name": "test_tls",
        "kibana.alert.rule.producer": "uptime",
        "kibana.alert.rule.rule_type_id": "xpack.uptime.alerts.tls",
        "kibana.alert.rule.tags": [],
        "kibana.alert.rule.uuid": "62b6756b-1a71-4a4e-a325-95ddd3ab66fb",
        "kibana.space_ids": [
            "default"
        ],
        "@timestamp": "2024-01-09T19:01:41.756Z",
        "event.action": "close",
        "event.kind": "signal",
        "kibana.alert.action_group": "recovered",
        "kibana.alert.flapping_history": [
            true,
            true
        ],
        "kibana.alert.instance.id": "xpack.uptime.alerts.actionGroups.tls",
        "kibana.alert.maintenance_window_ids": [],
        "kibana.alert.status": "recovered",
        "kibana.alert.uuid": "d79f1aa3-2a05-4dd6-9fce-4efa1f5ed596",
        "kibana.alert.workflow_status": "open",
        "kibana.alert.start": "2024-01-09T18:55:06.565Z",
        "kibana.alert.time_range": {
            "gte": "2024-01-09T18:55:06.565Z",
            "lte": "2024-01-09T19:01:41.756Z"
        },
        "kibana.version": "8.13.0",
        "tags": [],
        "kibana.alert.rule.parameters": {},
        "kibana.alert.rule.revision": 0,
        "kibana.alert.flapping": false,
        "kibana.alert.duration.us": 395191000,
        "kibana.alert.end": "2024-01-09T19:01:41.756Z"
    }
}
```
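As a sanity check on the recovered document, `kibana.alert.duration.us` appears to be the gap between `kibana.alert.start` and `kibana.alert.end`, expressed in microseconds. A minimal sketch, assuming that interpretation (`durationMicros` is a hypothetical helper, not a framework function):

```typescript
// Hypothetical helper: difference between two ISO-8601 timestamps in microseconds.
function durationMicros(startIso: string, endIso: string): number {
  const millis = Date.parse(endIso) - Date.parse(startIso);
  return millis * 1000; // milliseconds to microseconds
}

// Timestamps copied from the recovered alert doc above.
const recoveredDurationUs = durationMicros(
  '2024-01-09T18:55:06.565Z', // kibana.alert.start
  '2024-01-09T19:01:41.756Z' // kibana.alert.end
);
// recoveredDurationUs is 395191000, matching "kibana.alert.duration.us" in the doc.
```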

## To Verify
This rule can no longer be created from the UI, but it can still be created
via the API. I looked back at 7.8, when the rule type was introduced, and the
parameters appear to be the same, so I was able to create it using the API in
Dev Tools:

```
POST kbn:/api/alerting/rule
{
    "name": "test_tls",
    "consumer": "alerts",
    "rule_type_id": "xpack.uptime.alerts.tls",
    "schedule": {
        "interval": "1d"
    },
    "actions": [],
    "tags": [],
    "enabled": true,
    "params": {}
}
```
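Outside Dev Tools, the same request could be sent from a script. The sketch below only builds the request; `buildCreateRuleRequest`, `CreateRuleBody`, and the `http://localhost:5601` base URL are assumptions for illustration, and authentication is deliberately left out. Kibana HTTP APIs called from outside the browser generally need a `kbn-xsrf` header, so one is included.

```typescript
// Shape of the request body used in the Dev Tools example above.
interface CreateRuleBody {
  name: string;
  consumer: string;
  rule_type_id: string;
  schedule: { interval: string };
  actions: unknown[];
  tags: string[];
  enabled: boolean;
  params: Record<string, unknown>;
}

// Hypothetical helper: returns the URL and fetch-style options without sending anything.
function buildCreateRuleRequest(kibanaUrl: string, body: CreateRuleBody) {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${kibanaUrl.replace(/\/+$/, '')}/api/alerting/rule`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'kbn-xsrf': 'true' },
      body: JSON.stringify(body),
    },
  };
}

// Same payload as the Dev Tools request; the base URL is an assumed local instance.
const createTlsRuleRequest = buildCreateRuleRequest('http://localhost:5601/', {
  name: 'test_tls',
  consumer: 'alerts',
  rule_type_id: 'xpack.uptime.alerts.tls',
  schedule: { interval: '1d' },
  actions: [],
  tags: [],
  enabled: true,
  params: {},
});
```

The result can be passed to `fetch(createTlsRuleRequest.url, createTlsRuleRequest.init)` once an `Authorization` header suitable for the deployment has been added.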

Then I changed the executor to force an active alert:

```
--- a/x-pack/plugins/uptime/server/legacy_uptime/lib/alerts/tls_legacy.ts
+++ b/x-pack/plugins/uptime/server/legacy_uptime/lib/alerts/tls_legacy.ts
@@ -151,6 +151,10 @@ export const tlsLegacyRuleFactory: LegacyUptimeRuleTypeFactory<ActionGroupIds> =
     });

     const foundCerts = total > 0;
+    alertsClient.report({
+      id: TLS_LEGACY.id,
+      actionGroup: TLS_LEGACY.id,
+    });

     if (foundCerts) {
       const absoluteExpirationThreshold = moment()
```
I then ran the rule and was able to see the alert doc in the
`.alerts-default.alerts-default` index and the active alert in the UI:
<img width="2269" alt="Screenshot 2024-01-09 at 2 01 36 PM"
src="https://github.com/elastic/kibana/assets/13104637/2bc90c0f-9151-44d7-8b71-03a652641698">

Then I reverted the above change and ran the rule again to see the
recovered alert doc in the index and in the UI:
<img width="2269" alt="Screenshot 2024-01-09 at 2 02 00 PM"
src="https://github.com/elastic/kibana/assets/13104637/24b4d6e4-a2ef-42fc-b4b2-42b099b12359">

Co-authored-by: Kibana Machine <[email protected]>
@ymao1 ymao1 self-assigned this Jan 16, 2024
4 participants