Toy Alert Manager

An alert manager that performs arbitrary enrichments on alerts and takes appropriate actions.

These enrichments and actions have to be preconfigured. The API is simple and extensible, so users can extend the framework themselves.

We will refer to this as the tam.

Quick Start

Here we set up the tam with docker-compose and fire a test event using curl to see how it works.
Once we have a basic example running, we can dive into the details.

Note: This assumes that you have docker and docker-compose installed. You also need a Slack webhook URL that is configured to post to a channel.

1. Get a Slack webhook and copy the secret (the part of the webhook URL after https://hooks.slack.com/services/).
2. Put this secret in a .env file (alertmanager/.env) as follows:

WEBHOOK_SECRET=secret we copied in step 1

3. Run make docker-build
4. Run make sed
5. Run docker compose up -d
6. Send basicWebhookPayload.json to the tam using curl:

curl -v -H "Content-Type: application/json" -X POST localhost:8081/webhook -d @basicWebhookPayload.json

NOTE: If everything is configured correctly, you should see a message in the channel that you have configured. If not, look at the logs. The tam in docker-compose has debug logs enabled, which are quite verbose.

Sample output

alert: NOOP_ALERT
action: SendToSlack
result of ENRICHMENT_STEP_1 enrichment(s):  ARG1,ARG2

Running the CLI

TAM is a CLI application with two operating modes. The first is the configuration mode, in which the binary can either generate a sample config or validate a config file. This gives users a way to validate a config before deploying it to an environment.

The config mode is accessed by using the config subcommand in the alertmanager cli.

$ ./alertmanager config --help
Use this command to validate an existing config-file or to generate a sample template

Usage:
  alertmanager config [command]

Available Commands:
  generate-template generate a sample config template
  validate          validate a config-file for errors

Flags:
  -h, --help   help for config

Use "alertmanager config [command] --help" for more information about a command.

The second is the server mode, in which the tam runs as a server and accepts webhooks at the url:port/webhook API endpoint.

$ ./alertmanager server --help
Start the AlertManager Webhook Server

Usage:
  alertmanager server [flags]

Flags:
      --config-file string   Path to alert config (default "./alert-manager-config.yml")
  -h, --help                 help for server
      --log-level string     log-level for alertmanager; options INFO|DEBUG|ERROR (default "INFO")
      --server-port int      Port to listen on (default 8081)

API Endpoint

/ping

Basic health-check endpoint. A GET request to /ping responds with a pong.

/webhook/

Accepts a JSON payload in a POST request.

Sending a request using curl

curl -v -H "Content-Type: application/json" -X POST localhost:8081/webhook -d @basicWebhookPayload.json

Sample Json Payload

Design

The tam is a simple webhook server.

We can configure the tam to enrich alerts by pulling data from external systems and to take actions.

The enrichments and actions that are possible or relevant for each alert are highly context dependent, and it is up to the user to build and configure them.

The collection of enrichments and actions for a given alert is called an alertPipeline. We will see how to configure such a pipeline below.

Configuring an Alert-Pipeline

The tam is configured using a config file (YAML format) that defines multiple alert pipelines.

Each alert pipeline is defined by

an AlertName
a list of Enrichments
a list of Actions

For example, a typical config would look like this:

alert_pipelines:
  - alert_name: KubePodCrashLooping
    enrichments:
      - step_name: enrichment_step_1
        enrichment_name: GET_DATA
        enrichment_args: "promql"
    actions:
      - step_name: action_step_1
        action_name: NotifySLack
        action_args: "url"

We can use the alertmanager to generate a sample config. We can redirect this output to a file and then modify it to our needs.

$ ./alertmanager config generate-template

alert_pipelines:
    - alert_name: NOOP_ALERT
      enrichments:
        - step_name: ENRICHMENT_STEP_1
          enrichment_name: NOOP_ENRICHMENT
          enrichment_args: ARG1,ARG2
      actions:
        - step_name: ACTION_STEP_1
          action_name: NOOP_ACTION
          action_args: ARG1,ARG2

We can use the built-in config validator to check whether the config file is up to spec:

$ ./alertmanager config validate --config-file /path/to/file
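The exact rules that the validator enforces are not listed here. As an illustration only, the Go sketch below shows the kind of checks such a validator might perform. All three rules (non-empty alert name, required step fields, unique step names within a pipeline) are assumptions, as are the type and function names.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a hypothetical, simplified view of an enrichment or action entry.
type Step struct {
	StepName string // step_name
	Name     string // enrichment_name or action_name
	Args     string // enrichment_args or action_args
}

// Pipeline is a hypothetical, simplified view of one alert pipeline.
type Pipeline struct {
	AlertName   string
	Enrichments []Step
	Actions     []Step
}

// validatePipeline applies the assumed rules and returns the first problem.
func validatePipeline(p Pipeline) error {
	if p.AlertName == "" {
		return errors.New("alert_name must not be empty")
	}
	seen := map[string]bool{} // step names already used in this pipeline
	check := func(kind string, steps []Step) error {
		for _, s := range steps {
			if s.StepName == "" || s.Name == "" {
				return fmt.Errorf("%s: every %s needs step_name and %s_name",
					p.AlertName, kind, kind)
			}
			if seen[s.StepName] {
				return fmt.Errorf("%s: duplicate step_name %q",
					p.AlertName, s.StepName)
			}
			seen[s.StepName] = true
		}
		return nil
	}
	if err := check("enrichment", p.Enrichments); err != nil {
		return err
	}
	return check("action", p.Actions)
}

func main() {
	ok := Pipeline{
		AlertName:   "NOOP_ALERT",
		Enrichments: []Step{{"ENRICHMENT_STEP_1", "NOOP_ENRICHMENT", "ARG1,ARG2"}},
		Actions:     []Step{{"ACTION_STEP_1", "NOOP_ACTION", "ARG1,ARG2"}},
	}
	fmt.Println(validatePipeline(ok)) // prints: <nil>
}
```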

The lists of available enrichments and actions are in the respective docs.

How does the TAM work?

The tam accepts a JSON payload in the following format

{
  "version": "4",
  "groupKey": <string>, // key identifying the group of alerts (e.g. to deduplicate)
  "truncatedAlerts": <int>, // how many alerts have been truncated due to "max_alerts"
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>, // backlink to the Alertmanager.
  "alerts": [
  {
    "status": "<resolved|firing>",
    "labels": <object>,
    "annotations": <object>,
    "startsAt": "<rfc3339>",
    "endsAt": "<rfc3339>",
    "generatorURL": <string>, // identifies the entity that caused the alert
    "fingerprint": <string> // fingerprint to identify the alert
  }
  ]
}

Note: This format is detailed in the Prometheus webhook receiver docs.
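For readers who want to handle this payload in Go, the following sketch decodes it with encoding/json. The struct names and the decodeWebhook helper are assumptions for illustration and are not the tam's own types; the JSON keys are the ones listed in the format above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Alert mirrors one element of the "alerts" list (hypothetical type).
type Alert struct {
	Status       string            `json:"status"`
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations"`
	StartsAt     time.Time         `json:"startsAt"` // RFC 3339 timestamp
	EndsAt       time.Time         `json:"endsAt"`
	GeneratorURL string            `json:"generatorURL"`
	Fingerprint  string            `json:"fingerprint"`
}

// WebhookMessage mirrors the top-level payload (hypothetical type).
type WebhookMessage struct {
	Version           string            `json:"version"`
	GroupKey          string            `json:"groupKey"`
	TruncatedAlerts   int               `json:"truncatedAlerts"`
	Status            string            `json:"status"`
	Receiver          string            `json:"receiver"`
	GroupLabels       map[string]string `json:"groupLabels"`
	CommonLabels      map[string]string `json:"commonLabels"`
	CommonAnnotations map[string]string `json:"commonAnnotations"`
	ExternalURL       string            `json:"externalURL"`
	Alerts            []Alert           `json:"alerts"`
}

// decodeWebhook parses a webhook body into a WebhookMessage.
func decodeWebhook(data []byte) (WebhookMessage, error) {
	var msg WebhookMessage
	err := json.Unmarshal(data, &msg)
	return msg, err
}

// sample is a shortened payload used only to exercise the decoder.
const sample = `{
  "version": "4",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "KubePodCrashLooping", "severity": "CRITICAL"},
      "annotations": {"summary": "Pod is crash looping."},
      "startsAt": "2022-03-02T07:31:57.339Z"
    }
  ]
}`

func main() {
	msg, err := decodeWebhook([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, a := range msg.Alerts {
		fmt.Println(a.Labels["alertname"], a.Status) // prints: KubePodCrashLooping firing
	}
}
```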

The alerts field is a list that can contain multiple alerts. Each of them has the following format:


{
  "annotations": {
    "description": "Pod customer is restarting 2.11 times / 10 minutes.",
    "runbook_url": "",
    "summary": "Pod is crash looping."
  },
  "labels": {
    "alertname": "KubePodCrashLooping",
    "cluster": "cluster-main",
    "container": "rs-transformer",
    "endpoint": "http",
    "job": "kube-state-metrics",
    "namespace": "customer",
    "pod": "customer",
    "priority": "P0",
    "prometheus": "monitoring/kube-prometheus-stack-prometheus",
    "region": "us-west-1",
    "replica": "0",
    "service": "kube-prometheus-stack-kube-state-metrics",
    "severity": "CRITICAL"
  },
  "startsAt": "2022-03-02T07:31:57.339Z",
  "status": "firing"
}

The tam uses labels.alertname as the primary identifier to match alerts to their configured pipelines. Thus, the pipeline configured above for KubePodCrashLooping would match this alert and execute first the Enrichments and then the Actions.
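The matching step can be pictured as a lookup keyed on the alert name. The Go sketch below is one way to express it and is not the tam's implementation: the types, the indexPipelines and findPipeline helpers, and the choice to simply report "no match" for unknown alerts are all assumptions.

```go
package main

import "fmt"

// Pipeline and Alert are hypothetical, reduced types for this example.
type Pipeline struct {
	AlertName string
}

type Alert struct {
	Labels map[string]string
}

// indexPipelines builds a lookup table from alert name to pipeline.
func indexPipelines(pipelines []Pipeline) map[string]Pipeline {
	index := make(map[string]Pipeline, len(pipelines))
	for _, p := range pipelines {
		index[p.AlertName] = p
	}
	return index
}

// findPipeline returns the pipeline whose name equals labels.alertname.
// The boolean is false when the label is missing or nothing is configured.
func findPipeline(index map[string]Pipeline, a Alert) (Pipeline, bool) {
	name, ok := a.Labels["alertname"]
	if !ok {
		return Pipeline{}, false
	}
	p, ok := index[name]
	return p, ok
}

func main() {
	index := indexPipelines([]Pipeline{{AlertName: "KubePodCrashLooping"}})
	alert := Alert{Labels: map[string]string{"alertname": "KubePodCrashLooping"}}
	if p, ok := findPipeline(index, alert); ok {
		fmt.Println("matched pipeline:", p.AlertName)
	}
	// prints: matched pipeline: KubePodCrashLooping
}
```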

While the Enrichments and Actions can be built by the user using a certain framework, note that the enrichment runtime has a full copy of the alert body it was configured for. Similarly, the action runtime has a full copy of the alert as well as the enrichments and their corresponding output. We shall see how to build our own enrichments and actions in a bit.

Building Actions and Enrichments

Actions and Enrichments live in their own directories. There are some sample actions and enrichments pre-built for ease of use.

SETUP on k8s (kind)

kind create cluster
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
make sed
helm install prom-stack prometheus-community/kube-prometheus-stack -f deployment/kube-prometheus-stack.yml
kubectl apply -f deployment/toy_alert_manager.yml

Caveats

It is designed to run in a secure environment, hence there is no support for authentication or authorization.

DO NOT EXPOSE THIS TO THE OPEN INTERNET
