System Monitoring with Prometheus and Grafana
Ekemini Udongwo edited this page Jul 29, 2024
·
4 revisions
This guide covers the complete setup of a monitoring system using Prometheus, Grafana, and Node Exporter.
- Install Prometheus
- Install Grafana
- Install and Configure Node Exporter
- Creating a Custom Dashboard
- Grafana Alerts
Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-*/
sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus
sudo mv prometheus.yml /etc/prometheus/
# The service unit below points at these console directories, so keep them
sudo mv consoles console_libraries /etc/prometheus/
cd ..
rm -rf prometheus-*
sudo tee /etc/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --storage.tsdb.retention.size 1GB \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=default.target
EOF
The flags:
--storage.tsdb.path /var/lib/prometheus/
--storage.tsdb.retention.size 1GB
set the directory where Prometheus stores its time-series data and cap that data at 1GB on disk. When the cap is reached, the oldest data is removed first.
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
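To confirm that the service actually came up, one option is to poll Prometheus's readiness endpoint. The sketch below assumes the default listen address `localhost:9090`; the helper name `wait_for_http` is hypothetical and not part of Prometheus.

```bash
#!/usr/bin/env bash
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a non-error HTTP response (status below 400).
# Returns 1 if every attempt fails.
wait_for_http() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes Prometheus listens on its default port 9090):
# wait_for_http http://localhost:9090/-/ready && echo "Prometheus is ready"
```

If the check keeps failing, `sudo systemctl status prometheus` and `journalctl -u prometheus` are the usual places to look for the reason.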
Install Grafana
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
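A quick way to check the Grafana install is its health endpoint, `/api/health`, which reports the database status as JSON. The sketch below assumes Grafana's default port 3000; `grafana_db_ok` is a hypothetical helper that does a plain text match rather than real JSON parsing.

```bash
#!/usr/bin/env bash
# Reads a Grafana /api/health response on stdin and succeeds only if the
# reported database status is "ok". Text match only, not a JSON parser.
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Example (assumes Grafana listens on its default port 3000):
# curl -fsS http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is healthy"
```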
Install and Configure Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
sudo mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.3.1.linux-amd64*
sudo tee /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
sudo useradd -rs /bin/false node_exporter
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
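Before pointing Prometheus at the exporter, it can help to confirm that Node Exporter is serving metrics on port 9100. The following sketch counts the samples of one metric in the exporter's text output; `count_samples` is a hypothetical helper, and `node_cpu_seconds_total` is used only as an example metric name.

```bash
#!/usr/bin/env bash
# count_samples METRIC
# Reads Prometheus text-format metrics on stdin and prints how many sample
# lines belong to METRIC. Comment lines (# HELP / # TYPE) are not counted.
count_samples() {
  local metric="$1"
  # grep -c exits non-zero when the count is 0, so keep the function's status at 0.
  grep -Ec "^${metric}(\{| )" || true
}

# Example (assumes Node Exporter listens on its default port 9100):
# curl -fsS http://localhost:9100/metrics | count_samples node_cpu_seconds_total
```

A count of 0 would suggest the exporter is not running or the metric name is misspelled.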
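Edit /etc/prometheus/prometheus.yml and add the node job to the scrape_configs section. The default file already contains a scrape_configs key, so append the job to it rather than adding a second key:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']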
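Since a YAML indentation mistake will keep Prometheus from starting, it may be worth validating the file before restarting. The sketch below wraps `promtool check config`, using the promtool binary moved to /usr/local/bin during installation; the wrapper name `validate_prometheus_config` is hypothetical.

```bash
#!/usr/bin/env bash
# validate_prometheus_config [FILE]
# Runs promtool's config check on FILE (default: /etc/prometheus/prometheus.yml).
# Returns 127 if promtool is not on PATH, otherwise promtool's own exit status.
validate_prometheus_config() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 127
  fi
  promtool check config "$config"
}

# Example:
# validate_prometheus_config && echo "config looks valid"
```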
Restart Prometheus:
sudo systemctl restart prometheus
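After the restart, the built-in `up` metric shows whether Prometheus can scrape the new target (1 means the last scrape succeeded). The sketch below checks an instant-query response from the Prometheus HTTP API; it assumes the default port 9090, and `has_up_sample` is a hypothetical helper that relies on a text match rather than JSON parsing.

```bash
#!/usr/bin/env bash
# Succeeds if a /api/v1/query response read from stdin contains at least one
# sample whose value is "1". Text match only, not a JSON parser.
has_up_sample() {
  grep -Eq '"value"[[:space:]]*:[[:space:]]*\[[0-9.]+,[[:space:]]*"1"\]'
}

# Example (assumes Prometheus on localhost:9090 and the 'node' job defined above):
# curl -fsS -G http://localhost:9090/api/v1/query \
#   --data-urlencode 'query=up{job="node"}' | has_up_sample && echo "node target is up"
```

The first scrape can take up to one scrape interval, so an immediate check right after the restart may still fail.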
Creating a Custom Dashboard
- Open Grafana in a browser on port 3000.
- Navigate to Dashboards and view the dashboard metrics collected by Node Exporter.
- A guest user account is set up for guests, with view-only permission for the dashboard and other metrics.
Grafana Alerts
- Log in to your Grafana instance as an admin.
- Navigate to Configuration > Alerting.
- Enable alerting if not already enabled.
- Go to the dashboard where you want to add an alert.
- Edit the panel for which you want to create an alert.
- Click on the "Alert" tab.
- Configure the alert conditions:
  - Set the evaluation frequency
  - Define the conditions that trigger the alert
  - Specify the duration for which the condition must be true
- Add a notification channel (we'll set up Slack in the next section).
- Save the alert rule.
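The alert conditions above boil down to a repeated check: at every evaluation, compute a value, compare it with a threshold, and fire once the comparison has held for the configured duration. The sketch below illustrates only the comparison step, outside Grafana. The query is a commonly used Node Exporter CPU expression, and the names `CPU_QUERY` and `exceeds` as well as the threshold of 90 are assumptions for illustration, not settings taken from this guide.

```bash
#!/usr/bin/env bash
# A common PromQL expression for CPU usage in percent per instance
# (assumption: paste something like this into the panel query you alert on).
CPU_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# exceeds VALUE THRESHOLD
# Succeeds when VALUE is strictly greater than THRESHOLD (floats allowed).
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Example: a reading of 93.5 against a threshold of 90
# exceeds 93.5 90 && echo "condition met: would fire after the configured duration"
```

Grafana performs this evaluation itself; the helper is only meant to make the condition concrete.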
- In Grafana, go to Configuration > Notification channels.
- Click "Add channel".
- Select "Slack" as the type.
- Set up a Slack webhook:
  - Go to https://api.slack.com/apps
  - Create a new app or select an existing one
  - Enable "Incoming Webhooks"
  - Create a new webhook URL for your desired Slack channel
- In Grafana, paste the webhook URL into the "Webhook URL" field.
- Configure other options as needed (channel, username, icon).
- Click "Send Test" to verify the connection.
- Save the notification channel.
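The webhook can also be exercised from the command line, independently of Grafana, by posting a small JSON body to it. The sketch below builds that body; `slack_payload` is a hypothetical helper, and `SLACK_WEBHOOK_URL` is an assumed environment variable holding the URL created in the steps above.

```bash
#!/usr/bin/env bash
# slack_payload TEXT
# Prints a minimal Slack incoming-webhook JSON body for TEXT.
# Escapes backslashes and double quotes only; newlines in TEXT are not handled.
slack_payload() {
  local text
  text=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}\n' "$text"
}

# Example (SLACK_WEBHOOK_URL is an assumed variable, not defined in this guide):
# curl -fsS -X POST -H 'Content-type: application/json' \
#   --data "$(slack_payload 'Test message from the monitoring host')" \
#   "$SLACK_WEBHOOK_URL"
```

Treat the webhook URL as a secret: anyone who has it can post to the channel.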
Creating Notification Templates
Grafana allows customization of alert notifications. Here's an example template:
{{ define "slack.print_alert" -}}
[{{.Status}}] {{ .Labels.alertname }}
{{ range .Labels.SortedPairs -}}
{{ if eq .Name "alertname" -}}
Title: {{ .Value }}
{{ end -}}
{{ end -}}
{{ if .Annotations -}}
{{ range .Annotations.SortedPairs -}}
{{ if eq .Name "summary" -}}
Message: {{ .Value }}
{{ end -}}
{{ end -}}
{{ end -}}
{{- end }}
{{ define "slack.message" -}}
{{ if .Alerts.Firing -}}
{{ len .Alerts.Firing }} firing alert(s):
{{ range .Alerts.Firing }}
{{ template "slack.print_alert" . }}
{{ end -}}
{{ end }}
{{ if .Alerts.Resolved -}}
{{ len .Alerts.Resolved }} resolved alert(s):
{{ range .Alerts.Resolved }}
{{ template "slack.print_alert" .}}
{{ end -}}
{{ end }}
{{- end }}
Example alert data that can be used to test the template:
[
{
"status": "firing",
"annotations": {
"summary": "Instance instance1 has been down for more than 5 minutes"
},
"labels": {
"alertname": "InstanceDown",
"instance": "instance1"
},
"startsAt": "2024-07-28T14:27:25.734Z",
"endsAt": "2024-07-29T14:32:25.734Z",
"fingerprint": "a5331f0d5a9d81d4",
"generatorURL": "http://grafana.com/alerting/grafana/cdeqmlhvflz40f/view"
},
{
"status": "resolved",
"annotations": {
"summary": "CPU usage above 90%"
},
"labels": {
"alertname": "CpuUsage",
"instance": "instance1"
},
"startsAt": "2024-07-29T10:27:25.734Z",
"endsAt": "2024-07-29T14:27:25.734Z",
"fingerprint": "b77d941310f9d381",
"generatorURL": "http://grafana.com/alerting/grafana/oZSMdGj7z/view"
}
]
- Alert status (firing or resolved)
- Alert name
- Summary and description
- Additional labels
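As a rough cross-check of the sample alert data above, the counts that the template prints via `len .Alerts.Firing` and `len .Alerts.Resolved` can be approximated from the command line. The sketch assumes the payload is saved pretty-printed (one field per line) in a hypothetical file `alerts.json`; `count_status` is a hypothetical helper that matches text rather than parsing JSON.

```bash
#!/usr/bin/env bash
# count_status STATUS
# Reads pretty-printed alert JSON on stdin and prints the number of lines
# carrying "status": "STATUS". Assumes one status field per line.
count_status() {
  # grep -c exits non-zero when the count is 0, so keep the function's status at 0.
  grep -Ec "\"status\"[[:space:]]*:[[:space:]]*\"$1\"" || true
}

# Example (alerts.json is an assumed file holding the sample data above):
# count_status firing < alerts.json
# count_status resolved < alerts.json
```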