Grafana and Prometheus Monitoring Setup

Jessica Chioma edited this page Jul 30, 2024 · 17 revisions

Overview

This documentation provides a comprehensive guide on setting up Grafana and Prometheus for monitoring. It covers installation steps, configuration details, instructions for accessing and managing dashboards, and troubleshooting tips for maintaining the monitoring setup.

Objectives

  • Install and Configure Prometheus and Grafana:

    Set up Prometheus for metric collection and Grafana for data visualization.

  • Create and Configure Grafana Dashboards:

    Develop dashboards to visualize metrics.

  • Set Up Alerts Based on Collected Metrics:

    Configure alerting to notify the team of potential issues.

  • Ensure Proper Data Retention and Access Control:

    Manage data storage and user access.

Table of Contents

  1. Installing Prometheus and Grafana
  2. Configuring the Monitoring Dashboards
  3. Configuring Alerting
  4. Data Retention and Access Control

Installing Prometheus and Grafana

Prometheus Installation

  1. Download Prometheus: Obtain the latest version from the Prometheus website.
  2. Extract the Archive:
tar xvfz prometheus-*.tar.gz
  3. Move to Installation Directory (the trailing slash makes the glob match only the extracted directory, not the archive):
sudo mv prometheus-*/ /usr/local/prometheus
  4. Create a Prometheus User:
sudo useradd --no-create-home --shell /bin/false prometheus
  5. Set Up Directories and Permissions:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
  6. Copy Configuration Files (the sample prometheus.yml ships in the extracted directory):
sudo cp /usr/local/prometheus/prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
  7. Create a Systemd Service File (/etc/systemd/system/prometheus.service):
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/

[Install]
WantedBy=multi-user.target
  8. Enable and Start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
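
Once the service is started, it can be worth confirming that Prometheus is actually serving requests. The sketch below polls Prometheus's /-/ready endpoint on the default port 9090 until it responds. The function name wait_for_ready and the retry defaults are illustrative choices, not part of this setup.

```shell
# wait_for_ready URL [TRIES] [DELAY_SECONDS]
# Polls URL with curl until it returns a successful HTTP status.
# Returns 0 on success, 1 if every attempt fails.
wait_for_ready() {
  wr_url="$1"
  wr_tries="${2:-10}"
  wr_delay="${3:-2}"
  wr_i=1
  while [ "$wr_i" -le "$wr_tries" ]; do
    # -f: treat HTTP errors as failures, -s: silent, -o: discard the body
    if curl -fs -o /dev/null "$wr_url"; then
      echo "ready after $wr_i attempt(s): $wr_url"
      return 0
    fi
    sleep "$wr_delay"
    wr_i=$((wr_i + 1))
  done
  echo "not ready after $wr_tries attempt(s): $wr_url" >&2
  return 1
}
```

For example, wait_for_ready http://localhost:9090/-/ready waits up to roughly 20 seconds. If it fails, sudo journalctl -u prometheus usually shows why the service did not come up.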

Grafana Installation

Download and Install Grafana:

Add the signing key before the repository so that the package index refresh can verify it:

sudo apt-get install -y apt-transport-https software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

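Create a Grafana User:

The Debian package creates the grafana system user and group during installation, so this step is only needed for a manual (non-package) install:

sudo useradd --no-create-home --shell /bin/false grafana

Set Up Directories and Permissions:

For a manual install under /usr/local/grafana, give the grafana user ownership of that directory. The apt package manages its own directories and does not use this path:

sudo chown -R grafana:grafana /usr/local/grafana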

Create a Systemd Service File:

The Grafana package installs its own systemd unit, grafana-server.service, so no unit file needs to be written by hand. The script that creates the Prometheus, Node Exporter, cAdvisor, and Alertmanager unit files appears in the Server Metrics Dashboard section.

Enable and Start Grafana:

sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
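
This setup moves Grafana off its default port 3000 because another application already uses it. One way to do that is to set http_port in the [server] section of /etc/grafana/grafana.ini (the path used by the Debian package) and restart the service. The helper below is a sketch: set_grafana_port is a hypothetical name, and it assumes the file contains a single http_port line, commented out or not.

```shell
# set_grafana_port FILE PORT
# Rewrites the http_port line in a Grafana ini file, uncommenting it if needed.
set_grafana_port() {
  gp_file="$1"
  gp_port="$2"
  gp_tmp="$(mktemp)" || return 1
  # Match "http_port = ..." with an optional leading ';' or '#' comment marker.
  sed -E "s/^[;#]?[[:space:]]*http_port[[:space:]]*=.*/http_port = ${gp_port}/" \
    "$gp_file" > "$gp_tmp" || { rm -f "$gp_tmp"; return 1; }
  # Write back into the existing file so it keeps its owner and permissions.
  gp_rc=0
  cat "$gp_tmp" > "$gp_file" || gp_rc=$?
  rm -f "$gp_tmp"
  return "$gp_rc"
}
```

Run as root, set_grafana_port /etc/grafana/grafana.ini 3050 followed by systemctl restart grafana-server would make Grafana listen on port 3050. Setting the GF_SERVER_HTTP_PORT environment variable is an alternative that leaves the file untouched.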

Configuring the Monitoring Dashboards

Setting up a Server Metrics Dashboard

Install Node Exporter:

Node Exporter is essential for collecting server metrics such as CPU usage, memory usage, disk I/O, and network traffic.

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
cd node_exporter-*
sudo mv node_exporter /usr/local/bin/

Create a Systemd Service File (/etc/systemd/system/node_exporter.service):

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Enable and Start Node Exporter:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Install cAdvisor:

VERSION=v0.47.0  # Pinned release; check the cAdvisor releases page for the current version
wget https://github.com/google/cadvisor/releases/download/$VERSION/cadvisor-$VERSION-linux-amd64
chmod +x cadvisor-$VERSION-linux-amd64
sudo mv cadvisor-$VERSION-linux-amd64 /usr/local/bin/cadvisor

Create a Systemd Service File (/etc/systemd/system/cadvisor.service):

[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target

[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor

[Install]
WantedBy=multi-user.target

After creating these files, run the following commands:

sudo systemctl daemon-reload
sudo systemctl enable prometheus node_exporter cadvisor alertmanager
sudo systemctl start prometheus node_exporter cadvisor alertmanager

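Note: You'll need to create the appropriate users and groups (prometheus, node_exporter, cadvisor, alertmanager) and ensure proper permissions for directories and files. Alertmanager's installation is not covered above; its unit file expects the binary at /usr/local/bin/alertmanager, the configuration at /etc/alertmanager/alertmanager.yml, and a data directory at /var/lib/alertmanager.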
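
The accounts named in the note can be created the same way as the prometheus user earlier. As a sketch, the function below prints one useradd command per service so the list can be reviewed before anything is run. The name service_user_cmds is illustrative.

```shell
# service_user_cmds NAME...
# Prints a useradd command for each service account, using the same flags
# as the prometheus user created earlier (no home directory, no login shell).
service_user_cmds() {
  for su_name in "$@"; do
    printf 'useradd --no-create-home --shell /bin/false %s\n' "$su_name"
  done
}

service_user_cmds node_exporter cadvisor alertmanager
```

Piping the output to sudo sh executes the commands once they look right.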

The following script creates all four unit files and then enables and starts the services:

#!/bin/bash
# Check if script is run as root
if [ "$EUID" -ne 0 ]; then
  echo "Please run as root"
  exit 1
fi
# Create systemd service file for Prometheus
cat > /etc/systemd/system/prometheus.service <<EOL
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Node Exporter
cat > /etc/systemd/system/node_exporter.service <<EOL
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for cAdvisor
cat > /etc/systemd/system/cadvisor.service <<EOL
[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target
[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Alertmanager
cat > /etc/systemd/system/alertmanager.service <<EOL
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
[Install]
WantedBy=multi-user.target
EOL
# Reload systemd to recognize new service files
systemctl daemon-reload
# Enable and start services
systemctl enable prometheus node_exporter cadvisor alertmanager
systemctl start prometheus node_exporter cadvisor alertmanager
echo "Systemd service files have been created and services have been enabled and started."
echo "Please ensure that you have created the necessary users and groups,"
echo "and that the binary files and directories exist with proper permissions."
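
With the exporters running, Prometheus still needs scrape jobs that point at them. The sketch below prints a minimal prometheus.yml, assuming every service runs on the same host on its default port (Prometheus 9090, Node Exporter 9100, cAdvisor 8080, Alertmanager 9093). The job names and the 15-second interval are illustrative choices.

```shell
# print_prometheus_config
# Emits a minimal prometheus.yml for a single-host setup.
print_prometheus_config() {
  cat <<'EOF'
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']
  - job_name: cadvisor
    static_configs:
      - targets: ['localhost:8080']
EOF
}

print_prometheus_config
```

To apply it, write the output to /etc/prometheus/prometheus.yml (for example by piping it to sudo tee), validate it with promtool check config /etc/prometheus/prometheus.yml, and restart Prometheus.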

Configure Grafana to Use Prometheus as a Data Source:

  • Access Grafana in your web browser. The default address is http://localhost:3000/; because another application already uses port 3000 in our environment, we configured Grafana to listen on port 3050 instead (http://localhost:3050/).
  • Navigate to Configuration > Data Sources > Add data source.
  • Select Prometheus and set the URL to http://localhost:9090. Click Save & Test.
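
The same data source can also be defined in a provisioning file instead of through the UI, which makes the setup repeatable. This sketch prints such a file. The data source name Prometheus and the suggested target path /etc/grafana/provisioning/datasources/prometheus.yaml (under the package's default provisioning directory) are assumptions for illustration.

```shell
# print_datasource_provisioning [URL]
# Emits a Grafana data source provisioning file for a Prometheus server.
print_datasource_provisioning() {
  ds_url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${ds_url}
    isDefault: true
EOF
}

print_datasource_provisioning http://localhost:9090
```

Grafana reads provisioning files at startup, so restart grafana-server after saving the output.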

Import Node Exporter Dashboard:

  • In Grafana, go to the Dashboards page.
  • Click the “+” icon and select “Import”.
  • Enter Dashboard ID 1860 and click “Load”.
  • Choose the Prometheus data source and click “Import”.

Setting up an Application Metrics Dashboard

Using cAdvisor:

  • In Grafana, click the “+” icon and select “Dashboard”.
  • Click “Import” and then add the dashboard JSON or ID.

image (1)

Configuring Dynamic Variables:

  • Open the dashboard where you want to add dynamic variables, click the settings icon, open the Variables tab, and create the variable.
  • We created a dynamic variable based on the environments of our containerized applications (e.g., dev, staging, prod).

image (4)

image (5)

Testing the Dynamic Variable:

  • As the image below shows, the dynamic variable lists the containers by environment.

image (6)

Configuring Alerting

Configuring Grafana Alerts

Create and Configure Alerts in Grafana:

  • Open the desired panel and click the Edit button.
  • Navigate to the Alert tab and click “Create Alert”.
  • Define conditions based on Prometheus queries (e.g., rate(myapp_request_count[1m]) > 100).

Configure Alert Evaluation and Frequency:

  • Set the evaluation interval (e.g., every minute) and the duration for which the alert condition must be met (e.g., 5 minutes).
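
The same condition and duration can also be expressed as a Prometheus alerting rule, which is the form Alertmanager receives alerts from. This is a sketch of that alternative, not the Grafana UI flow described above: the alert name HighRequestRate, the group name, and the severity label are hypothetical, and myapp_request_count is the example metric from the query above.

```shell
# print_alert_rule
# Emits a Prometheus rule file: fire when the request rate stays above 100/s
# for 5 minutes.
print_alert_rule() {
  cat <<'EOF'
groups:
  - name: myapp
    rules:
      - alert: HighRequestRate
        expr: rate(myapp_request_count[1m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Request rate above 100/s for 5 minutes"
EOF
}

print_alert_rule
```

To use it, save the output to a file listed under rule_files in prometheus.yml and validate it with promtool check rules before reloading Prometheus.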

Set Up Notification Channels:

  • Navigate to Alerting > Notification channels > New Channel.
  • Configure the notification channel (e.g., Slack) with your webhook URL.
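
Before wiring the channel into Grafana, the webhook itself can be tested from the command line. Slack incoming webhooks accept a JSON body with a text field. In the sketch below, slack_payload and send_slack_test are hypothetical helper names, the webhook URL is whatever Slack issued for your channel, and only single-line messages are handled.

```shell
# slack_payload MESSAGE
# Builds a minimal Slack webhook JSON body, escaping backslashes and double quotes.
slack_payload() {
  sp_msg=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}\n' "$sp_msg"
}

# send_slack_test WEBHOOK_URL MESSAGE
# Posts the payload to the webhook; curl exits non-zero on an HTTP error.
send_slack_test() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$2")" "$1"
}
```

Calling send_slack_test with your webhook URL and a short message should make that message appear in the target channel if the webhook is valid.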

Link Alerts to Notification Channels:

  • In the alert configuration, associate the alert with the notification channel created.

Data Retention and Access Control

Configuring Data Retention

Set Up Data Retention Policy

Set Data Retention in Prometheus:

  • Specify the data retention period in the Prometheus configuration or startup parameters (e.g., --storage.tsdb.retention.time=15d).
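
As a sketch of passing the flag as a startup parameter, the helper below prints an ExecStart line that appends the retention flag to the command from the service file in the installation section. The function name print_execstart is illustrative, and 15d is the example value above.

```shell
# print_execstart [RETENTION]
# Emits a systemd ExecStart line for Prometheus with a retention period.
print_execstart() {
  pe_retention="${1:-15d}"
  printf 'ExecStart=/usr/local/prometheus/prometheus \\\n'
  printf '    --config.file=/etc/prometheus/prometheus.yml \\\n'
  printf '    --storage.tsdb.path=/var/lib/prometheus/ \\\n'
  printf '    --storage.tsdb.retention.time=%s\n' "$pe_retention"
}

print_execstart 15d
```

After replacing the ExecStart line in /etc/systemd/system/prometheus.service with this output, run sudo systemctl daemon-reload and sudo systemctl restart prometheus for the change to take effect.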

Update Prometheus Service File:

  • Ensure that the retention flag is included in the ExecStart line of the Prometheus service file, then reload systemd and restart Prometheus so the change takes effect.

Configuring Access Control

Manage User Roles and Permissions in Grafana:

  • Log in to Grafana and navigate