An alertmanager that supports performing arbitrary enrichments on alerts and take appripriate action.
These enrichments and actions have to be preconfigured. The API is simple and extensible enough so as to enable users to extend the framework.
We will refer to this as the tam
Over here, we will quickly set up the tam on a docker-compose and fire a test-event using curl to see how it works.
Once we have a basic example running, we can then dive into the details.
Note: This assumes that you have docker and docker-compose installed. Furthermore, you need a slack webhook url that is configured to send data to a channel.
- Get a Slack Webhook and copy the secret (this is the part after the
https://hooks.slack.com/services/
in the webhook-url) - Put this secret in a (
alertmanager/.env
directory) as follows
WEBHOOK_SECRET=secret we copied in step 1
- Run
make docker-build
- Run
make sed
- Run
docker compose up -d
- Send the basicWebhookPayload.json to the tam using curl
curl -v -H "Content-Type: application/json" -X POST localhost:8081/webhook -d @basicWebhookPayload.json
NOTE If everthing is configured correctly, then you should see a message in the channel that you have configured. If not, please look at the logs. The tam in docker-compose has debug logs enabled which are quite verbose.
Sample output
alert: NOOP_ALERT
action: SendToSlack
result of ENRICHMENT_STEP_1 enrichment(s): ARG1,ARG2
TAM is a cli application that works in 2 operating mode. The first one is the configuration mode, where the binary can either generate a sample config or validate a config-file. This ensures that users have a way of validating a config before deploying it in an environment.
The config mode
is accessed by using the config
subcommand in the alertmanager cli.
$ ./alertmanager config --help
Use this command to validate an existing config-file or to generate a sample template
Usage:
alertmanager config [command]
Available Commands:
generate-template generate a sample config template
validate validate a config-file for errors
Flags:
-h, --help help for config
Use "alertmanager config [command] --help" for more information about a command.
The server mode
is when the tam is operating as a server and accepts webhook at url:port/webhook
api-endpoint
$ ./alertmanager server --help
Start the AlertManager Webhook Server
Usage:
alertmanager server [flags]
Flags:
--config-file string Path to alert config (default "./alert-manager-config.yml")
-h, --help help for server
--log-level string log-level for alertmanager; options INFO|DEBUG|ERROR (default "INFO")
--server-port int Port to listen on (default 8081)
Basic health-check endpoint. Get /ping
responds with a pong
Accepts json as a POST request.
Sending a request using curl
curl -v -H "Content-Type: application/json" -X POST localhost:8081/webhook -d @basicWebhookPayload.json
The tam is a simple webhook server.
We can configure the tam to enrich alerts by pulling data from external systems AND take actions.
The enrichments and the actions that are possible / relevant for each alert is highly context dependent and is upto the user to build and configure.
The collection of enrichments
and actions
for a given alert
is called an alertPipeline
. We will see how to configure such a pipeline below.
The tam is configured by using a config file (yaml format) which defines multiple alertpipelines.
Each alertpipeline is defined by
- an AlertName
- A list of Enrichments
- A list of Actions.
For example, a typical config would look like this
alert_pipelines:
- alert_name: KubePodCrashLooping
enrichments:
- step_name: enrichment_step_1
enrichment_name: GET_DATA
enrichment_args: "promql"
actions:
- step_name: action_step_1
action_name: NotifySLack
action_args: "url"
We can use the alertmanager
to generate a sample config. We can redirect this output to a file and then modify it to our needs.
$ ./alertmanager config generate-template
alert_pipelines:
- alert_name: NOOP_ALERT
enrichments:
- step_name: ENRICHMENT_STEP_1
enrichment_name: NOOP_ENRICHMENT
enrichment_args: ARG1,ARG2
actions:
- step_name: ACTION_STEP_1
action_name: NOOP_ACTION
action_args: ARG1,ARG2
We can use the in-built config-validator to check if the config-file is up-to-spec or not
$ ./alertmanager config validate --config-file /path/to/file
The list of available enrichments and actions are available in the respective docs.
The tam accepts a JSON payload in the following format
{
"version": "4",
"groupKey": <string>, // key identifying the group of alerts (e.g. to deduplicate)
"truncatedAlerts": <int>, // how many alerts have been truncated due to "max_alerts"
"status": "<resolved|firing>",
"receiver": <string>,
"groupLabels": <object>,
"commonLabels": <object>,
"commonAnnotations": <object>,
"externalURL": <string>, // backlink to the Alertmanager.
"alerts": [
{
"status": "<resolved|firing>",
"labels": <object>,
"annotations": <object>,
"startsAt": "<rfc3339>",
"endsAt": "<rfc3339>",
"generatorURL": <string>, // identifies the entity that caused the alert
"fingerprint": <string> // fingerprint to identify the alert
}
]
}
note: This is detailed in the prometheus webhook receiver docs
The alerts object is a list that can contain multiple alert
. Each of them are of the following format
{
"annotations": {
"description": "Pod customer is restarting 2.11 times / 10 minutes.",
"runbook_url": "",
"summary": "Pod is crash looping."
},
"labels": {
"alertname": "KubePodCrashLooping",
"cluster": "cluster-main",
"container": "rs-transformer",
"endpoint": "http",
"job": "kube-state-metrics",
"namespace": "customer",
"pod": "customer",
"priority": "P0",
"prometheus": "monitoring/kube-prometheus-stack-prometheus",
"region": "us-west-1",
"replica": "0",
"service": "kube-prometheus-stack-kube-state-metrics",
"severity": "CRITICAL"
},
"startsAt": "2022-03-02T07:31:57.339Z",
"status": "firing"
}
The tam uses the labels.alertname
as a primary identifier to identify alerts and identify configured pipelines for said alerts. Thus, the above configured pipeline for KubePodCrashLooping
would match this alert and then execute the enrichments and then the Actions.
While the Enrichments and Actions can be built by the user using a certain framework, it should be noted that the enrichment runtime has a full copy of the alert body it was configured for. Similarly the alert runtime as a full copy of the alert as well the enrichments and their corresponding output. We shall see how build our own enrichments and actions in a bit.
Actions and Enrichments live in their own directories. There are some sample alerts and enrichments pre-built for ease of use.
kind setup cluster
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
make sed
helm install prom-stack prometheus-community/kube-prometheus-stack -f deployment/kube-prometheus-stack.yml
kubectl apply -f deployment/toy_alert_manager.yml
It is designed to run in a secure enironment, hence there is no support for authentication and authorization.
DO NOT EXPOSE THIS TO THE OPEN INTERNET