more readme
83bytes committed Jul 2, 2024
1 parent aebd6bd commit df97c36
Showing 2 changed files with 86 additions and 17 deletions.
6 changes: 6 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,6 @@
{
"cSpell.ignoreWords": [
"alertmanager",
"alertpipeline"
]
}
97 changes: 80 additions & 17 deletions README.md
@@ -7,7 +7,40 @@ The API is simple and extensible enough so as to enable users to extend the fram

We will refer to this as the `tam`

## Design
## Quick Start

Here we set up the tam with docker-compose and fire a test event at it using curl to see how it works. <br>
Once we have a basic example running, we can then dive into the details.

_Note:_ This assumes that you have Docker and docker-compose installed. You also need a Slack webhook URL that is configured to post to a channel.

1. Get a Slack webhook and copy the secret (the part of the webhook URL after `https://hooks.slack.com/services/`)
2. Put this secret in a `.env` file in the `alertmanager/` directory (`alertmanager/.env`) as follows

```
WEBHOOK_SECRET=secret we copied in step 1
```

3. Run `make docker-build`
4. Run `make sed`
5. Run `docker compose up -d`
6. Send `basicWebhookPayload.json` to the tam using curl

```
curl -v -H "Content-Type: application/json" -X POST localhost:8081/webhook -d @basicWebhookPayload.json
```

**NOTE** If everything is configured correctly, you should see a message in the Slack channel that you configured. If not, look at the logs. The tam in docker-compose has debug logs enabled, which are quite verbose.

Sample output

```
alert: NOOP_ALERT
action: SendToSlack
result of ENRICHMENT_STEP_1 enrichment(s): ARG1,ARG2
```
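The curl call from step 6 can also be scripted. The following is a minimal Python sketch, not part of the tam itself: it assumes the tam is listening on `localhost:8081/webhook` as in the curl example, and the function names `build_webhook_request` and `send_webhook` are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; adjust if your setup differs.
TAM_URL = "http://localhost:8081/webhook"


def build_webhook_request(payload: dict, url: str = TAM_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request, mirroring the curl flags."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_webhook(payload: dict, url: str = TAM_URL, timeout: float = 5.0) -> int:
    """Send the payload and return the HTTP status code of the response."""
    request = build_webhook_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

To use it, load `basicWebhookPayload.json` with `json.load` and pass the resulting dict to `send_webhook`. Splitting request construction from sending keeps the first part easy to inspect without a running tam.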

# Design

The tam is a simple webhook server.

@@ -27,7 +60,7 @@ Each alertpipeline is defined by
- A list of Enrichments
- A list of Actions.

For example, a typlcal config would look like this
For example, a typical config would look like this

```
alert_pipelines:
@@ -42,32 +75,58 @@ alert_pipelines:
action_args: "url"
```

We can use the `alertmanager` to generate a sample config. We can redirect this output to a file and then modify it to our needs.

```
$ ./alertmanager config generate-template
```

```
alert_pipelines:
- alert_name: NOOP_ALERT
enrichments:
- step_name: ENRICHMENT_STEP_1
enrichment_name: NOOP_ENRICHMENT
enrichment_args: ARG1,ARG2
actions:
- step_name: ACTION_STEP_1
action_name: NOOP_ACTION
action_args: ARG1,ARG2
```

We can use the built-in config validator to check whether the config file is up to spec

```
$ ./alertmanager config validate --config-file /path/to/file
```
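To make the shape of the config concrete, here is a rough Python sketch of a structural check over an already-parsed config. It is not the tam's own validator: `check_config` is a hypothetical helper, and the set of required keys is an assumption read off the generated template above.

```python
# Keys each step is assumed to carry, based on the generated template.
REQUIRED_STEP_KEYS = {
    "enrichments": ("step_name", "enrichment_name", "enrichment_args"),
    "actions": ("step_name", "action_name", "action_args"),
}

# The generated template from above, expressed as a Python dict.
SAMPLE_CONFIG = {
    "alert_pipelines": [
        {
            "alert_name": "NOOP_ALERT",
            "enrichments": [
                {
                    "step_name": "ENRICHMENT_STEP_1",
                    "enrichment_name": "NOOP_ENRICHMENT",
                    "enrichment_args": "ARG1,ARG2",
                }
            ],
            "actions": [
                {
                    "step_name": "ACTION_STEP_1",
                    "action_name": "NOOP_ACTION",
                    "action_args": "ARG1,ARG2",
                }
            ],
        }
    ]
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    pipelines = config.get("alert_pipelines")
    if not isinstance(pipelines, list):
        return ["alert_pipelines must be a list"]
    problems = []
    for i, pipeline in enumerate(pipelines):
        if not isinstance(pipeline, dict):
            problems.append(f"pipeline {i}: must be a mapping")
            continue
        if not pipeline.get("alert_name"):
            problems.append(f"pipeline {i}: missing alert_name")
        for section, keys in REQUIRED_STEP_KEYS.items():
            for j, step in enumerate(pipeline.get(section) or []):
                for key in keys:
                    if key not in step:
                        problems.append(f"pipeline {i}: {section}[{j}] missing {key}")
    return problems
```

Calling `check_config(SAMPLE_CONFIG)` returns an empty list. The actual `config validate` command may apply different or stricter rules.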

The available [enrichments](enrichment/README.md) and [actions](action/README.md) are listed in their respective docs.

## How does the tam work?

The tam accepts a JSON payload in the following format

```
{
"version": "4",
"groupKey": <string>, // key identifying the group of alerts (e.g. to deduplicate)
"truncatedAlerts": <int>, // how many alerts have been truncated due to "max_alerts"
"groupKey": <string>, // key identifying the group of alerts (e.g. to deduplicate)
"truncatedAlerts": <int>, // how many alerts have been truncated due to "max_alerts"
"status": "<resolved|firing>",
"receiver": <string>,
"groupLabels": <object>,
"commonLabels": <object>,
"commonAnnotations": <object>,
"externalURL": <string>, // backlink to the Alertmanager.
"externalURL": <string>, // backlink to the Alertmanager.
"alerts": [
{
"status": "<resolved|firing>",
"labels": <object>,
"annotations": <object>,
"startsAt": "<rfc3339>",
"endsAt": "<rfc3339>",
"generatorURL": <string>, // identifies the entity that caused the alert
"fingerprint": <string> // fingerprint to identify the alert
},
...
{
"status": "<resolved|firing>",
"labels": <object>,
"annotations": <object>,
"startsAt": "<rfc3339>",
"endsAt": "<rfc3339>",
"generatorURL": <string>, // identifies the entity that caused the alert
"fingerprint": <string> // fingerprint to identify the alert
}
]
}
```
@@ -77,6 +136,7 @@ note: This is detailed in the prometheus [webhook receiver docs](https://prometh
The `alerts` field is a list that can contain multiple `alert` objects. Each of them has the following format

```
{
"annotations": {
"description": "Pod customer is restarting 2.11 times / 10 minutes.",
@@ -101,6 +161,7 @@ The alerts object is a list that can contain multiple `alert`. Each of them are
"startsAt": "2022-03-02T07:31:57.339Z",
"status": "firing"
}
```

The tam uses `labels.alertname` as the primary identifier for an alert and looks up the pipelines configured for that name. Thus, the pipeline configured above for `KubePodCrashLooping` would match this alert, run its Enrichments first and then its Actions.
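The matching rule can be illustrated with a short Python sketch. This is not the tam's implementation: `pipelines_for_alert` and `plan_steps` are hypothetical names, and the sketch only lists the steps that would run (enrichments before actions) instead of executing anything.

```python
def pipelines_for_alert(alert: dict, config: dict) -> list[dict]:
    """Return the pipelines whose alert_name equals the alert's labels.alertname."""
    name = alert.get("labels", {}).get("alertname")
    if name is None:
        return []  # an alert without an alertname label cannot match any pipeline
    return [p for p in config.get("alert_pipelines", []) if p.get("alert_name") == name]


def plan_steps(payload: dict, config: dict) -> list[str]:
    """List the steps that would run for every alert in a webhook payload."""
    steps = []
    for alert in payload.get("alerts", []):
        for pipeline in pipelines_for_alert(alert, config):
            # Enrichments run first, then actions.
            steps += [f"enrich:{e['enrichment_name']}" for e in pipeline.get("enrichments", [])]
            steps += [f"act:{a['action_name']}" for a in pipeline.get("actions", [])]
    return steps


# Minimal example data, modelled on the generated template and payload format.
EXAMPLE_CONFIG = {
    "alert_pipelines": [
        {
            "alert_name": "NOOP_ALERT",
            "enrichments": [{"step_name": "ENRICHMENT_STEP_1", "enrichment_name": "NOOP_ENRICHMENT"}],
            "actions": [{"step_name": "ACTION_STEP_1", "action_name": "NOOP_ACTION"}],
        }
    ]
}
EXAMPLE_PAYLOAD = {
    "alerts": [
        {"status": "firing", "labels": {"alertname": "NOOP_ALERT"}},
        {"status": "firing", "labels": {"alertname": "UNCONFIGURED_ALERT"}},
    ]
}
```

With this data, `plan_steps(EXAMPLE_PAYLOAD, EXAMPLE_CONFIG)` yields steps only for `NOOP_ALERT`; the second alert has no configured pipeline and is skipped in this sketch.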
@@ -111,13 +172,15 @@ While the Enrichments and Actions can be built by the user using a certain frame

[Actions](./action/README.md) and [Enrichments](./enrichment/README.md) live in their own directories. Some sample actions and enrichments are pre-built for ease of use.

## SETUP
## SETUP on k8s (kind)

```
kind create cluster
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom-stack prometheus-community/kube-prometheus-stack -f values.yml
make sed
helm install prom-stack prometheus-community/kube-prometheus-stack -f deployment/kube-prometheus-stack.yml
kubectl apply -f deployment/toy_alert_manager.yml
```
