System Monitoring with Prometheus and Grafana

Ekemini Udongwo edited this page Jul 29, 2024 · 4 revisions

This guide covers the complete setup of a monitoring system using Prometheus, Grafana, and Node Exporter.

Table of Contents

  1. Install Prometheus
  2. Install Grafana
  3. Install and Configure Node Exporter
  4. Creating a Custom Dashboard
  5. Grafana Alerts

INSTALLATION

1. Install Prometheus

1.1 Download and Install Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-*/
sudo mv prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus
sudo mv prometheus.yml /etc/prometheus/
# The systemd unit references these console directories under /etc/prometheus
sudo mv consoles console_libraries /etc/prometheus/
cd ..
rm -rf prometheus-*
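
Before extracting the tarball downloaded above, it can be worth checking its integrity. The following is a minimal sketch, assuming the release page also offers a sha256sums.txt file and that sha256sum is installed; verify_sha256 is a hypothetical helper name, not part of Prometheus.

```shell
# Compare a file's SHA-256 digest with an expected hex string.
# verify_sha256 is a hypothetical helper, not shipped with Prometheus.
verify_sha256() {
  file="$1"
  expected="$2"
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example (not run here; assumes sha256sums.txt was downloaded from the same release):
#   expected="$(grep 'prometheus-2.37.0.linux-amd64.tar.gz' sha256sums.txt | awk '{print $1}')"
#   verify_sha256 prometheus-2.37.0.linux-amd64.tar.gz "$expected"
```

The function returns a non-zero status on a mismatch, so it can guard the extraction step in a script.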

Create Prometheus systemd Service

sudo tee /etc/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --storage.tsdb.retention.size 1GB \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

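NOTE:

The flags:

  --storage.tsdb.path /var/lib/prometheus/
  --storage.tsdb.retention.size 1GB

control storage and data retention: the first sets the directory where Prometheus stores its time series data, and the second caps that storage at 1GB, after which the oldest data is removed first.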

Create Prometheus User and Directories

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Start Prometheus Service

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
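
To confirm that the service came up, one option is to poll Prometheus's health endpoint. This is a sketch under the assumption that Prometheus listens on its default port 9090 and that curl is installed; wait_for_url and its arguments are hypothetical names introduced for illustration.

```shell
# Poll a URL until it returns a success status or the attempts run out.
# wait_for_url is a hypothetical helper; arguments: URL, attempts, delay in seconds.
wait_for_url() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not reachable: $url" >&2
  return 1
}

# Example (not run here):
#   sudo systemctl is-active prometheus
#   wait_for_url http://localhost:9090/-/healthy
```

If the check fails, sudo journalctl -u prometheus usually shows why the service did not start.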

2. Install Grafana

2.1 Add Grafana Repository and Install

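sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list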
sudo apt-get update
sudo apt-get install -y grafana

2.2 Start Grafana Service

sudo systemctl start grafana-server
sudo systemctl enable grafana-server

3. Install and Configure Node Exporter

3.1 Download and Install Node Exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
sudo mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.3.1.linux-amd64*

3.2 Create Node Exporter systemd Service

sudo tee /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

3.3 Create Node Exporter User

sudo useradd -rs /bin/false node_exporter

3.4 Start Node Exporter Service

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
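
A quick way to see whether Node Exporter is serving data is to fetch its metrics page and count the metric families it exposes. The sketch below assumes the exporter listens on port 9100 (the port used in the scrape config in this guide); count_metric_families is a hypothetical helper that reads Prometheus text format from standard input.

```shell
# Count metric families in Prometheus text exposition format read from stdin.
# Each family is announced by a "# TYPE <name> <type>" line.
# An optional first argument restricts the count to names with that prefix.
count_metric_families() {
  prefix="${1:-}"
  # grep -c exits non-zero when the count is 0; "|| true" keeps the status clean
  grep -c "^# TYPE ${prefix}" || true
}

# Example (not run here):
#   curl -s http://localhost:9100/metrics | count_metric_families node_
```

A count of zero, or a curl connection error, suggests the service is not running or is bound to a different port.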

3.5 Configure Prometheus to Scrape Node Exporter

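Edit /etc/prometheus/prometheus.yml and add the node job to the scrape_configs section (if the file already contains a scrape_configs key, add only the job entry beneath it rather than a second key):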

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

Restart Prometheus:

sudo systemctl restart prometheus
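
A malformed prometheus.yml prevents Prometheus from starting, so it can help to validate the file before restarting. The sketch below wraps the restart behind promtool check config, using the promtool binary installed in section 1.1. The function name reload_prometheus and the PROMTOOL and RESTART_CMD override variables are assumptions made for illustration (they allow a dry run).

```shell
# Validate a Prometheus config file and restart the service only if it passes.
# PROMTOOL and RESTART_CMD are hypothetical overrides, useful for dry runs.
reload_prometheus() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  if "${PROMTOOL:-promtool}" check config "$config"; then
    # Left unquoted on purpose so the command string splits into words
    ${RESTART_CMD:-sudo systemctl restart prometheus}
  else
    echo "config check failed; not restarting" >&2
    return 1
  fi
}

# Example (not run here):
#   reload_prometheus /etc/prometheus/prometheus.yml
```

After a successful restart, the node job should appear on the Prometheus targets page at /targets.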

4. DASHBOARD CONFIG

  • Open Grafana in a browser on port 3000
  • Navigate to Dashboards and view the dashboard metrics collected by Node Exporter
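
When building custom panels, it helps to try a query against Prometheus before pasting it into Grafana. The expressions below are common Node Exporter queries offered as a sketch; they are not taken from the dashboard described here. The helper names and the PROM_URL variable are assumptions, as is the default Prometheus address.

```shell
# Base URL of the Prometheus server (assumed default; override as needed).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# PromQL: CPU utilisation in percent per instance, averaged over 5 minutes.
cpu_usage_query() {
  printf '%s\n' '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
}

# PromQL: percentage of memory in use per instance.
memory_usage_query() {
  printf '%s\n' '(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'
}

# Run an instant query through the Prometheus HTTP API and print the JSON reply.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example (not run here):
#   prom_query "$(cpu_usage_query)"
```

The same expressions can be entered in a Grafana panel's query editor once Prometheus is selected as the data source.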

ACCESS CONTROL

  • A guest user account gives guests view-only access to the dashboard and other metrics.

5. GRAFANA ALERTS

Configuring Grafana Alerting

  1. Log in to your Grafana instance as an admin.
  2. Navigate to Configuration > Alerting.
  3. Enable alerting if not already enabled.

Creating Alert Rules

  1. Go to the dashboard where you want to add an alert.
  2. Edit the panel for which you want to create an alert.
  3. Click on the "Alert" tab.
  4. Configure the alert conditions:
    • Set the evaluation frequency
    • Define the conditions that trigger the alert
    • Specify the duration for which the condition must be true
  5. Add a notification channel (we'll set up Slack in the next section).
  6. Save the alert rule.

Setting Up Slack Notifications

  • In Grafana, go to Configuration > Notification channels.
  • Click "Add channel".
  • Select "Slack" as the type.
  • Set up a Slack webhook:
    • Go to https://api.slack.com/apps
    • Create a new app or select an existing one
    • Enable "Incoming Webhooks"
    • Create a new webhook URL for your desired Slack channel
  • In Grafana, paste the webhook URL into the "Webhook URL" field.
  • Configure other options as needed (channel, username, icon).
  • Click "Send Test" to verify the connection.
  • Save the notification channel.
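
The webhook can also be checked from the command line, independently of Grafana. This is a minimal sketch: slack_payload and send_slack_test are hypothetical helper names, and the webhook URL is whatever Slack generated for your app. The payload builder escapes backslashes and double quotes only, so it assumes single-line messages.

```shell
# Build a minimal Slack message payload of the form {"text":"..."}.
# Escapes backslashes and double quotes; multi-line messages are not handled.
slack_payload() {
  escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"text":"%s"}\n' "$escaped"
}

# Post a test message to a Slack incoming webhook.
# Arguments: webhook URL, optional message text.
send_slack_test() {
  webhook_url="$1"
  message="${2:-Test message from the monitoring host}"
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$message")" "$webhook_url"
}

# Example (not run here; SLACK_WEBHOOK_URL holds the URL created in Slack):
#   send_slack_test "$SLACK_WEBHOOK_URL" "Grafana alerting test"
```

If the message shows up in the channel but Grafana's "Send Test" does not, the problem most likely lies in the Grafana notification settings.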

Creating Notification Templates

Grafana allows customization of alert notifications. Here's an example template:

{{ define "slack.print_alert" -}}
[{{.Status}}] {{ .Labels.alertname }}
{{ range .Labels.SortedPairs -}}
  {{ if eq .Name "alertname" -}}
    Title: {{ .Value }}
  {{ end -}}
{{ end -}}

{{ if .Annotations -}}
{{ range .Annotations.SortedPairs -}}
  {{ if eq .Name "summary" -}}
    Message: {{ .Value }}
  {{ end -}}
{{ end -}}
{{ end -}}

{{- end }}

{{ define "slack.message" -}}
{{ if .Alerts.Firing -}}
{{ len .Alerts.Firing }} firing alert(s):
{{ range .Alerts.Firing }}
{{ template "slack.print_alert" . }}
{{ end -}}
{{ end }}
{{ if .Alerts.Resolved -}}
{{ len .Alerts.Resolved }} resolved alert(s):
{{ range .Alerts.Resolved }}
{{ template "slack.print_alert" .}}
{{ end -}}
{{ end }}
{{- end }}

The template can be tested against the following example payload:

[
  {
    "status": "firing",
    "annotations": {
      "summary": "Instance instance1 has been down for more than 5 minutes"
    },
    "labels": {
      "alertname": "InstanceDown",
      "instance": "instance1"
    },
    "startsAt": "2024-07-28T14:27:25.734Z",
    "endsAt": "2024-07-29T14:32:25.734Z",
    "fingerprint": "a5331f0d5a9d81d4",
    "generatorURL": "http://grafana.com/alerting/grafana/cdeqmlhvflz40f/view"
  },
  {
    "status": "resolved",
    "annotations": {
      "summary": "CPU usage above 90%"
    },
    "labels": {
      "alertname": "CpuUsage",
      "instance": "instance1"
    },
    "startsAt": "2024-07-29T10:27:25.734Z",
    "endsAt": "2024-07-29T14:27:25.734Z",
    "fingerprint": "b77d941310f9d381",
    "generatorURL": "http://grafana.com/alerting/grafana/oZSMdGj7z/view"
  }
]

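For each firing or resolved alert, this template prints:

  • Alert status (firing or resolved)
  • Alert name
  • The summary annotation, shown as the message

The Slack message also opens each group with the number of firing or resolved alerts.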