Add Alertmanager config and templates in Helm chart #188

Open: wants to merge 28 commits into main

TheoBrigitte (Member) commented Dec 10, 2024

Towards: giantswarm/roadmap#3746

This PR does a couple of things to get the Alertmanager configuration into a Secret in the Helm chart:

  • Helm
    • Add a Secret resource embedding the raw and templated Alertmanager files
    • Expose the Alertmanager template values as Helm chart values
  • Alertmanager
    • Remove all Mimir-related conditions from the templates
    • Escape the Alertmanager templates embedded in the Helm templates (template-in-template)
    • Split the template into URL and notification templates, to reduce template-in-template escaping
    • Reuse the Slack actions through a YAML anchor, to reduce template-in-template escaping
    • Drop the templates directive, which is set dynamically by the operator
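Helm and Alertmanager both render Go text/template syntax, which is why the Alertmanager expressions need escaping. Here is a minimal Go sketch of the two passes using only the standard library (Helm's extra Sprig functions are not needed for this). It uses the Slack button style expression from the config: the first pass emits the {{` ... `}} raw-string constant verbatim, and the second pass, a simplified stand-in for Alertmanager with a hypothetical Status value, evaluates it.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// helmLine mirrors one escaped line of the chart's Alertmanager config. The
// Alertmanager expression sits inside a Go raw-string constant ({{` ... `}}),
// so the first pass prints it verbatim instead of evaluating it.
const helmLine = "style: {{`{{ if eq .Status \"firing\" }}primary{{ else }}default{{ end }}`}}"

// render parses src as a text/template and executes it against data.
func render(src string, data any) (string, error) {
	tmpl, err := template.New("snippet").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Pass 1 (Helm): only the raw-string constant is evaluated.
	amLine, err := render(helmLine, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(amLine) // style: {{ if eq .Status "firing" }}primary{{ else }}default{{ end }}

	// Pass 2 (Alertmanager, simplified): the unescaped expression now runs.
	for _, status := range []string{"firing", "resolved"} {
		out, err := render(amLine, struct{ Status string }{status})
		if err != nil {
			panic(err)
		}
		fmt.Println(out) // "style: primary", then "style: default"
	}
}
```

The same pattern applies to every {{` ... `}} line in the patch; splitting the URL helpers into their own template reduces how many of these wrappers are needed.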

How I generated the new Alertmanager config and notification template

Alertmanager config
wget https://raw.githubusercontent.com/giantswarm/prometheus-meta-operator/refs/heads/main/files/templates/alertmanager/alertmanager.yaml
sed -i -e 's/\[\[/{{/g' -e 's/\]\]/}}/g' alertmanager.yaml
patch alertmanager.yaml < <following patch>
13,15d12
< templates:
< - '/etc/alertmanager/config/*.tmpl'
<
24d20
<   {{- if .MimirEnabled }}
34d29
<   {{- end }}
173d167
< {{- if .MimirEnabled }}
187d180
< {{- end }}
206c199
<     actions:
---
>     actions: &slack-actions
209,210c202,203
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
---
>       url: {{`{{ template "__runbookurl" . }}`}}
>       style: {{`{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}`}}
213c206
<       url: '{{ template "__alert_linked_postmortems" . }}'
---
>       url: {{`{{ template "__alert_linked_postmortems" . }}`}}
216c209
<       url: '{{ template "__alerturl" . }}'
---
>       url: {{`{{ template "__alerturl" . }}`}}
219c212
<       url: '{{ template "__dashboardurl" . }}'
---
>       url: {{`{{ template "__dashboardurl" . }}`}}
222,223c215,216
<       url: '{{ template "__alert_silence_link" .}}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>       url: {{`{{ template "__alert_silence_link" .}}`}}
>       style: {{`{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}`}}
242,259c235
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" .}}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
278,295c254
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" . }}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
314,331c273
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" . }}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
350,367c292
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" . }}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
382,399c307
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" .}}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
418,435c326
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" . }}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
450,467c341
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" .}}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
482,499c356
<     actions:
<     - type: button
<       text: ':green_book: OpsRecipe'
<       url: '{{ template "__runbookurl" . }}'
<       style: '{{ if eq .Status "firing" }}primary{{ else }}default{{ end }}'
<     - type: button
<       text: ':coffin: Linked PMs'
<       url: '{{ template "__alert_linked_postmortems" . }}'
<     - type: button
<       text: ':mag: Query'
<       url: '{{ template "__alerturl" . }}'
<     - type: button
<       text: ':grafana: Dashboard'
<       url: '{{ template "__dashboardurl" . }}'
<     - type: button
<       text: ':no_bell: Silence'
<       url: '{{ template "__alert_silence_link" .}}'
<       style: '{{ if eq .Status "firing" }}danger{{ else }}default{{ end }}'
---
>     actions: *slack-actions
504c361
<     tags: "{{ (index .Alerts 0).Labels.alertname }},{{ (index .Alerts 0).Labels.cluster_type }},{{ (index .Alerts 0).Labels.severity }},{{ (index .Alerts 0).Labels.team }},{{ (index .Alerts 0).Labels.area }},{{ (index .Alerts 0).Labels.service_priority }},{{ (index .Alerts 0).Labels.provider }},{{ (index .Alerts 0).Labels.installation }},{{ (index .Alerts 0).Labels.pipeline }},{{ (index .Alerts 0).Labels.customer }}"
---
>     tags: {{`{{ (index .Alerts 0).Labels.alertname }},{{ (index .Alerts 0).Labels.cluster_type }},{{ (index .Alerts 0).Labels.severity }},{{ (index .Alerts 0).Labels.team }},{{ (index .Alerts 0).Labels.area }},{{ (index .Alerts 0).Labels.service_priority }},{{ (index .Alerts 0).Labels.provider }},{{ (index .Alerts 0).Labels.installation }},{{ (index .Alerts 0).Labels.pipeline }},{{ (index .Alerts 0).Labels.customer }}`}}
Notification template
wget https://raw.githubusercontent.com/giantswarm/prometheus-meta-operator/refs/heads/main/files/templates/alertmanager/notification-template.tmpl
patch notification-template.tmpl < <following patch>
3,21d2
< {{ define "__alerturl" }}
< [[- if .MimirEnabled -]]
< [[ .GrafanaAddress ]]/alerting/Mimir/{{ .CommonLabels.alertname }}/find
< [[- else -]]
< {{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}&silenced=false&inhibited=false&active=true&filter=%7Balertname%3D%22{{ .CommonLabels.alertname }}%22%7D
< [[- end -]]
< {{ end }}
<
< {{ define "__dashboardurl" -}}{{ if match "^https://.+" (index .Alerts 0).Annotations.dashboard }}{{ (index .Alerts 0).Annotations.dashboard }}{{ else }}[[ .GrafanaAddress ]]/d/{{ (index .Alerts 0).Annotations.dashboard }}{{ end }}{{- end }}
< {{ define "__runbookurl" -}}https://intranet.giantswarm.io/docs/support-and-ops/ops-recipes/{{ (index .Alerts 0).Annotations.opsrecipe }}{{- end }}
<
< {{ define "__queryurl" }}
< [[- if .MimirEnabled -]]
< [[ .GrafanaAddress ]]/alerting/Mimir/{{ .CommonLabels.alertname }}/find
< [[- else -]]
< {{ (index .Alerts 0).GeneratorURL }}
< [[- end -]]
< {{ end }}
<
59d39
< [[- if .MimirEnabled ]]
61,64d40
< [[- else ]]
< 🔔 Alertmanager {{ template "__alerturl" . }}
< 👀 Query: {{ template "__queryurl" . }}
< [[- end ]]

I would like some opinions before I continue in this direction, because I feel there are a lot of workarounds here to get this config into a Secret, and it could be easier to have it directly in code. Also, does anyone remember what the ProxyURL is used for? It seems to be OpsGenie-related, but I happily ignored it, and I have a feeling it could be important.

TheoBrigitte requested a review from a team as a code owner December 10, 2024 09:05
TheoBrigitte self-assigned this Dec 10, 2024
TheoBrigitte changed the base branch from main to alertmanager-config December 10, 2024 09:06
  http_config:
    proxy_url: {{ .alertmanager.proxyURL }}
{{- end }}
{{- if .alertmanager.slackAPIToken }}
Contributor

Is there a CAPA installation where we do not use slackAPIToken?

Member Author

I think not, but it doesn't hurt to keep it for now; I'd rather be safe.

send_resolved: true
actions: *slack-actions

- name: team_turtles_slack
Contributor

This should maybe go away.

Member Author

True

@@ -0,0 +1,18 @@
{{`
{{ define "__alerturl" }}
`}}{{ .alertmanager.grafanaAddress }}{{`/alerting/Mimir/{{ .CommonLabels.alertname }}/find
Contributor

I think we should be able to replace this with grafanaExploreURL or queryFromGeneratorURL instead (https://grafana.com/docs/mimir/latest/references/architecture/components/alertmanager/#templating), right?

Member Author

Yes, I'll look into this later.

Contributor

I'm trying to see if we can get rid of this whole file. That would make things simpler, IMO.
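The excerpt above splices a Helm value between two escaped raw strings. Here is a simplified Go sketch of that pattern, collapsed onto one line. The closing {{ end }}, the Grafana address and the KubeletDown alert name are assumptions, since the review excerpt is cut off. The chart values are modelled as nested maps, which is how the .alertmanager.grafanaAddress lookup resolves in text/template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// chartTemplate is a one-line, simplified version of the chart's URL template:
// escaped Alertmanager syntax in {{` ... `}}, with a chart value spliced in
// between. The trailing {{ end }} is assumed; the excerpt is truncated.
const chartTemplate = "{{`{{ define \"__alerturl\" }}`}}" +
	"{{ .alertmanager.grafanaAddress }}" +
	"{{`/alerting/Mimir/{{ .CommonLabels.alertname }}/find{{ end }}`}}"

// execute parses src as a text/template and executes it against data.
func execute(src string, data any) (string, error) {
	tmpl, err := template.New("snippet").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Pass 1 (Helm): the value is substituted, the escaped parts are copied.
	values := map[string]any{
		"alertmanager": map[string]any{"grafanaAddress": "https://grafana.example"},
	}
	amTemplate, err := execute(chartTemplate, values)
	if err != nil {
		panic(err)
	}
	fmt.Println(amTemplate)
	// {{ define "__alerturl" }}https://grafana.example/alerting/Mimir/{{ .CommonLabels.alertname }}/find{{ end }}

	// Pass 2 (Alertmanager, simplified): define the helper, then call it.
	alert := map[string]any{
		"CommonLabels": map[string]string{"alertname": "KubeletDown"},
	}
	url, err := execute(amTemplate+`{{ template "__alerturl" . }}`, alert)
	if err != nil {
		panic(err)
	}
	fmt.Println(url) // https://grafana.example/alerting/Mimir/KubeletDown/find
}
```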


# Link to related PMs
{{ define "__alert_linked_postmortems" -}}
https://github.com/giantswarm/giantswarm/issues?q=is%3Aissue+is%3Aopen+label%3Apostmortem+label%3Aalert%2F{{ .CommonLabels.alertname }}
Contributor

Same remark, but that's not as urgent.

Member Author

This is used as a link on Slack
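For reference, the hand-encoded q parameter in __alert_linked_postmortems is the GitHub search is:issue is:open label:postmortem label:alert/<alertname>. A small Go sketch with net/url's QueryEscape reproduces the same encoding. The function name is hypothetical, KubeletDown is only an example alert name, and unlike the template, QueryEscape would also escape any unusual characters in the alert name.

```go
package main

import (
	"fmt"
	"net/url"
)

// linkedPostmortemsURL rebuilds the "__alert_linked_postmortems" link for a
// given alert name. url.QueryEscape encodes ':' as %3A, '/' as %2F and spaces
// as '+', matching the hand-encoded query in the template.
func linkedPostmortemsURL(alertname string) string {
	q := "is:issue is:open label:postmortem label:alert/" + alertname
	return "https://github.com/giantswarm/giantswarm/issues?q=" + url.QueryEscape(q)
}

func main() {
	fmt.Println(linkedPostmortemsURL("KubeletDown"))
	// https://github.com/giantswarm/giantswarm/issues?q=is%3Aissue+is%3Aopen+label%3Apostmortem+label%3Aalert%2FKubeletDown
}
```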

TheoBrigitte force-pushed the alertmanager-config-helm branch from 4b4c161 to 11abd5f December 10, 2024 15:18
TheoBrigitte changed the base branch from alertmanager-config to main December 10, 2024 17:45
TheoBrigitte and others added 8 commits December 10, 2024 18:48
- Add secret resource, embedding raw and templated alertmanager files
- Expose alertmanager templates values as helm chart values
- Remove all Mimir related conditions
- Split template into url and notification templates
- Drop template directive, dynamically set by the operator
- Escape template in template
- Re-use slack actions
This fixes the infamous "error calling tpl: cannot retrieve Template.Basepath from values inside tpl function".

It uses .Values in templates to access values and passes the $ root context to tpl.
Co-authored-by: Quentin Bisson <[email protected]>
TheoBrigitte force-pushed the alertmanager-config-helm branch from 9d0c548 to 5d320b9 December 10, 2024 17:50
alerting:
  alertmanagerURL: ""
  grafanaAddress: ""
  proxyURL: ""
Contributor

The proxy URL should come from an environment variable, right?

  resolve_timeout: 5m
{{- if .Values.alerting.proxyURL }}
  http_config:
    proxy_url: {{ .Values.alerting.proxyURL }}
Contributor

QuentinBisson, Dec 16, 2024

I don't think this is useful anymore, now that the env variables are set automatically on all pods. Could you test without it on goat? I'm not sure we can get this value into the config anyway, as it is set by Kyverno policies.
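To check what dropping proxyURL changes in the rendered config, here is a Go sketch of the conditional block above. It assumes the block sits under global: with two-space indentation, models the Helm root object as a map holding Values, and uses a placeholder proxy address. With the chart's default empty proxyURL, the if is false and no http_config is emitted at all; whether Alertmanager then picks up the proxy from the pod environment is not shown here and would still need the test on goat.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// globalSection is an assumed reconstruction of the chart's global block.
// The {{- ... }} markers trim the preceding newline, as in the chart.
const globalSection = `global:
  resolve_timeout: 5m
{{- if .Values.alerting.proxyURL }}
  http_config:
    proxy_url: {{ .Values.alerting.proxyURL }}
{{- end }}
`

// renderGlobal renders globalSection with a hypothetical values tree.
func renderGlobal(proxyURL string) (string, error) {
	root := map[string]any{
		"Values": map[string]any{
			"alerting": map[string]any{"proxyURL": proxyURL},
		},
	}
	tmpl, err := template.New("global").Parse(globalSection)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, root); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, proxy := range []string{"", "http://proxy.example:3128"} {
		out, err := renderGlobal(proxy)
		if err != nil {
			panic(err)
		}
		fmt.Printf("proxyURL=%q:\n%s\n", proxy, out)
	}
}
```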

QuentinBisson (Contributor)

Apart from my proxy URL questions, I'm fine with this for now, but I'd rather we remove the URL mumbo jumbo later on.

* Team: {{ (index .Alerts 0).Labels.team }}
* Area: {{ (index .Alerts 0).Labels.area }} / {{ (index .Alerts 0).Labels.topic }}
* Instances:{{ range .Alerts.Firing }}
🔥 {{ if .Labels.instance }}{{ .Labels.instance }}: {{ end }}{{ .Annotations.description }}{{ end }}
{{- end }}
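The Instances line above ranges over .Alerts.Firing and only prints the instance label when it is set. A Go sketch with simplified, hypothetical stand-ins for Alertmanager's notification template data (the real types live in Alertmanager's template package) shows the resulting text; the addresses and descriptions are made up. The {{ if .Labels.instance }} guard works because a missing map key evaluates as false in text/template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// KV, Alert, Alerts and Data are simplified stand-ins for Alertmanager's
// notification template data, keeping only the fields used below.
type KV map[string]string

type Alert struct {
	Status      string
	Labels      KV
	Annotations KV
}

type Alerts []Alert

// Firing returns only the alerts whose status is "firing".
func (as Alerts) Firing() Alerts {
	var firing Alerts
	for _, a := range as {
		if a.Status == "firing" {
			firing = append(firing, a)
		}
	}
	return firing
}

type Data struct{ Alerts Alerts }

// instancesTemplate is the "Instances" part of the notification template.
const instancesTemplate = `* Instances:{{ range .Alerts.Firing }}
🔥 {{ if .Labels.instance }}{{ .Labels.instance }}: {{ end }}{{ .Annotations.description }}{{ end }}`

// renderInstances executes instancesTemplate against the given alerts.
func renderInstances(alerts Alerts) (string, error) {
	tmpl, err := template.New("instances").Parse(instancesTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, Data{Alerts: alerts}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleAlerts returns hypothetical alerts: one resolved, one without instance.
func sampleAlerts() Alerts {
	return Alerts{
		{Status: "firing", Labels: KV{"instance": "10.0.0.1:10250"}, Annotations: KV{"description": "Kubelet is down."}},
		{Status: "resolved", Labels: KV{"instance": "10.0.0.2:10250"}, Annotations: KV{"description": "Kubelet recovered."}},
		{Status: "firing", Labels: KV{}, Annotations: KV{"description": "Control plane unreachable."}},
	}
}

func main() {
	out, err := renderInstances(sampleAlerts())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// * Instances:
	// 🔥 10.0.0.1:10250: Kubelet is down.
	// 🔥 Control plane unreachable.
}
```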

# This builds the silence URL. We exclude the alertname in the range
Contributor

I think we need to remove this silence URL, right?
