Grafana and Prometheus Monitoring Setup

Jessica Chioma edited this page Jul 30, 2024 · 17 revisions

Overview

This documentation provides a comprehensive guide on setting up Grafana and Prometheus for monitoring. It covers installation steps, configuration details, instructions for accessing and managing dashboards, and troubleshooting tips for maintaining the monitoring setup.

Objectives

  • Install and Configure Prometheus and Grafana: set up Prometheus for metric collection and Grafana for data visualization.
  • Create and Configure Grafana Dashboards: develop dashboards to visualize metrics.
  • Set Up Alerts Based on Collected Metrics: configure alerting to notify the team of potential issues.
  • Ensure Proper Data Retention and Access Control: manage data storage and user access.

Table of Contents

  1. Installing Prometheus and Grafana
  2. Configuring the Monitoring Dashboards
  3. Configuring Alerting
  4. Set Up Data Retention Policy
  5. Manage User Roles and Permissions in Grafana

Installing Prometheus and Grafana

Prometheus Installation

  1. Download Prometheus: Obtain the latest release archive for your platform from the Prometheus website.
  2. Extract the Archive:
tar xvfz prometheus-*.tar.gz
  3. Move to Installation Directory (the trailing slash makes the glob match only the extracted directory, not the archive):
sudo mv prometheus-*/ /usr/local/prometheus
  4. Create a Prometheus User:
sudo useradd --no-create-home --shell /bin/false prometheus
  5. Set Up Directories and Permissions:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
  6. Copy Configuration Files:
sudo cp /usr/local/prometheus/prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
  7. Create a Systemd Service File (/etc/systemd/system/prometheus.service):
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/

[Install]
WantedBy=multi-user.target
  8. Enable and Start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
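
Once the service is started, it helps to confirm that Prometheus is actually answering before moving on. The following is a minimal sketch, not part of the original setup: the helper name wait_for_http is hypothetical, and it assumes curl is installed and Prometheus listens on its default port 9090. It polls Prometheus's /-/healthy endpoint until it responds.

```shell
#!/bin/sh
# Poll an HTTP endpoint until it answers with a success status
# or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP error statuses
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            echo "ready: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not ready after $attempts attempts: $url" >&2
    return 1
}

# Example (assumes the default Prometheus port):
# wait_for_http http://localhost:9090/-/healthy
```

If the check keeps failing, sudo journalctl -u prometheus usually shows why the service did not start.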

Grafana Installation

Download and Install Grafana:

sudo apt-get install -y apt-transport-https software-properties-common
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

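Create a Grafana User (only needed for a manual install; the apt package creates the grafana user itself):

sudo useradd --no-create-home --shell /bin/false grafana

Set Up Directories and Permissions (only if Grafana was unpacked manually into /usr/local/grafana):

sudo chown grafana:grafana /usr/local/grafana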

Create a Systemd Service File:

The apt package already installs a grafana-server systemd unit, so Grafana needs no hand-written service file.

Enable and Start Grafana:

sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Configuring the Monitoring Dashboards

Setting up a Server Metrics Dashboard

Install Node Exporter:

Node Exporter is essential for collecting server metrics such as CPU usage, memory usage, disk I/O, and network traffic.

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
cd node_exporter-*/
sudo mv node_exporter /usr/local/bin/

Create a Systemd Service File (/etc/systemd/system/node_exporter.service):

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Enable and Start Node Exporter:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Install cAdvisor:

VERSION=v0.47.0  # cAdvisor release to install; check the releases page for newer versions
wget https://github.com/google/cadvisor/releases/download/$VERSION/cadvisor-$VERSION-linux-amd64
chmod +x cadvisor-$VERSION-linux-amd64
sudo mv cadvisor-$VERSION-linux-amd64 /usr/local/bin/cadvisor

Create a Systemd Service File for cAdvisor (/etc/systemd/system/cadvisor.service):

[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target

[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor

[Install]
WantedBy=multi-user.target

After creating these files, run the following commands:

sudo systemctl daemon-reload
sudo systemctl enable prometheus node_exporter cadvisor alertmanager
sudo systemctl start prometheus node_exporter cadvisor alertmanager

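Note: Create the users and groups these units run as (prometheus, node_exporter, cadvisor, alertmanager) before starting the services, and make sure each account can read its configuration and write to its data directory. Alertmanager must also be installed at /usr/local/bin/alertmanager; its installation is not covered here. The setup script in this section writes all four service files in one step and overwrites any prometheus.service created earlier. It expects the binaries under /usr/local/bin, so copy or symlink /usr/local/prometheus/prometheus there, or adjust ExecStart to match.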
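
The account creation mentioned in the note can be sketched as follows. This is an illustrative script rather than the one we ran: the helper names run and create_service_user and the DRY_RUN switch are hypothetical. The useradd flags mirror the Prometheus user step, and the Alertmanager directories are the ones referenced in its service file. By default the script only prints what it would do.

```shell
#!/bin/sh
# Create one locked-down account per monitoring service.
# DRY_RUN=1 (the default here) only prints the commands;
# run as root with DRY_RUN=0 to apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_service_user() {
    user=$1
    # Skip accounts that already exist so the script can be re-run safely
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
        return 0
    fi
    run useradd --no-create-home --shell /bin/false "$user"
}

for svc in prometheus node_exporter cadvisor alertmanager; do
    create_service_user "$svc"
done

# Directories referenced by the Alertmanager unit file
run mkdir -p /etc/alertmanager /var/lib/alertmanager
run chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```

Reviewing the printed commands first and then re-running with DRY_RUN=0 keeps the step repeatable without surprises.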

#!/bin/bash
# Check if script is run as root
if [ "$EUID" -ne 0 ]; then
  echo "Please run as root"
  exit 1
fi
# Create systemd service file for Prometheus
cat > /etc/systemd/system/prometheus.service <<'EOL'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Node Exporter
cat > /etc/systemd/system/node_exporter.service <<'EOL'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for cAdvisor
cat > /etc/systemd/system/cadvisor.service <<'EOL'
[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target
[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Alertmanager
cat > /etc/systemd/system/alertmanager.service <<'EOL'
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
[Install]
WantedBy=multi-user.target
EOL
# Reload systemd to recognize new service files
systemctl daemon-reload
# Enable and start services
systemctl enable prometheus node_exporter cadvisor alertmanager
systemctl start prometheus node_exporter cadvisor alertmanager
echo "Systemd service files have been created and services have been enabled and started."
echo "Please ensure that you have created the necessary users and groups,"
echo "and that the binary files and directories exist with proper permissions."
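
Prometheus only collects from the exporters once they are listed as scrape targets in prometheus.yml. The sketch below shows one possible configuration under stated assumptions: the helper name write_scrape_config, the job names, and the 15s interval are illustrative, and the ports are the defaults for Prometheus (9090), Node Exporter (9100), and cAdvisor (8080). Adjust them if your services listen elsewhere.

```shell
#!/bin/sh
# Write a minimal Prometheus scrape configuration to the given path.
# Targets assume every service runs on this host with default ports.
write_scrape_config() {
    target=$1
    cat > "$target" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']

  - job_name: cadvisor
    static_configs:
      - targets: ['localhost:8080']
EOF
}

# Example: generate into a scratch file, validate, then install.
# write_scrape_config /tmp/prometheus.yml
# promtool check config /tmp/prometheus.yml
# sudo cp /tmp/prometheus.yml /etc/prometheus/prometheus.yml
# sudo systemctl restart prometheus
```

Validating with promtool, which ships in the Prometheus archive, before restarting avoids taking the service down with a malformed file.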

Configure Grafana to Use Prometheus as a Data Source:

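  • Access Grafana in your web browser. The default address is http://localhost:3000/. Because another application already uses port 3000 on our server, we configured Grafana to listen on 3050 instead (for example, via http_port in the [server] section of /etc/grafana/grafana.ini), so our instance is at http://localhost:3050/.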
  • Navigate to Configuration > Data Sources > Add data source.
  • Select Prometheus and set the URL to http://localhost:9090. Click Save & Test.
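
The same data source can also be declared in a provisioning file instead of through the UI, which makes the setup reproducible. The sketch below is an alternative to the steps above, not what we did: the helper name write_datasource and the file name prometheus.yaml are hypothetical, and it assumes the default provisioning directory of the Debian package, /etc/grafana/provisioning/datasources.

```shell
#!/bin/sh
# Write a Grafana data source provisioning file for Prometheus.
# Usage: write_datasource DIRECTORY [PROMETHEUS_URL]
write_datasource() {
    dir=$1
    url=${2:-http://localhost:9090}
    mkdir -p "$dir"
    cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Example (run as root, then restart Grafana to load the file):
# write_datasource /etc/grafana/provisioning/datasources
# systemctl restart grafana-server
```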

Import Node Exporter Dashboard:

  • In Grafana, go to the Dashboards page.
  • Click the “+” icon and select “Import”.
  • Enter Dashboard ID 1860 and click “Load”.
  • Choose the Prometheus data source and click “Import”.

Setting up an Application Metrics Dashboard

Using cAdvisor:

  • In Grafana, click the “+” icon and select “Dashboard”.
  • Click “Import”, then paste the dashboard JSON or enter its ID.

Configuring Dynamic Variables:

  • Navigate to the desired dashboard, click on the settings icon, select the variables tab, and create a new dynamic variable.
  • In this case, we are creating a dynamic variable to reflect the environments of our containerized applications, such as dev, staging, and prod.

Testing the Dynamic Variable:

  • Once saved, the dynamic variable lists the containers by environment.

Configuring Alerting

Configuring Grafana Alerts

Create and Configure Alerts in Grafana:

  • Open the home menu, navigate to the Alert tab, and click "Create Alert."
  • Define conditions based on Prometheus queries (e.g., rate(myapp_request_count[1m]) > 100).
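
Before saving an alert condition, it can be useful to run the expression against Prometheus directly and see what it returns. This is a minimal sketch using the Prometheus HTTP API query endpoint. The helper name prom_query is hypothetical, curl is assumed to be installed, and myapp_request_count is simply the example metric from the step above.

```shell
#!/bin/sh
# Evaluate a PromQL expression through the Prometheus HTTP API.
# Usage: prom_query EXPRESSION [PROMETHEUS_BASE_URL]
prom_query() {
    expr=$1
    base=${2:-http://localhost:9090}
    # -G sends the URL-encoded data as a GET query string
    curl -fsS -G "$base/api/v1/query" --data-urlencode "query=$expr"
}

# Example:
# prom_query 'rate(myapp_request_count[1m]) > 100'
```

An empty result list in the JSON response means no series currently satisfies the condition, so the alert would not fire.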

Configure Alert Evaluation and Frequency:

  • Set the evaluation interval (e.g., every minute) and the duration for which the alert condition must be met (e.g., 5 minutes).

Set Up Notification Channels:

  • Navigate to Alerting > Contact points and click Add contact point.
  • Configure the notification channel (e.g., Slack) with your webhook URL and other details.
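
A quick way to confirm the webhook URL works before wiring it into Grafana is to post a test message to it. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a text field. The helper names slack_payload and send_slack_test and the SLACK_WEBHOOK_URL variable are hypothetical placeholders. It handles single-line messages only.

```shell
#!/bin/sh
# Build a minimal Slack incoming-webhook payload for a single-line message.
slack_payload() {
    # Escape backslashes and double quotes so the text is a valid JSON string
    msg=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text":"%s"}' "$msg"
}

# Post a test message to the given webhook URL.
# Usage: send_slack_test WEBHOOK_URL MESSAGE
send_slack_test() {
    webhook_url=$1
    curl -fsS -X POST -H 'Content-Type: application/json' \
        --data "$(slack_payload "$2")" "$webhook_url"
}

# Example (SLACK_WEBHOOK_URL is a placeholder for your own webhook):
# send_slack_test "$SLACK_WEBHOOK_URL" "Test alert from the monitoring host"
```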

Link Alerts to Notification Channels:

  • In the alert configuration, associate the alert with the notification channel created.

Managed Alerts and Notifications:

  • We configured Grafana managed alerts to notify the team via our Slack configuration webhook URL, which sends the alerts to the #devops-alerts channel.
  • Alerts were created for Low Disk, High CPU, Container Down, and High Memory.

Set Up Data Retention Policy

Set Data Retention in Prometheus:

  • By default, Prometheus retains data for 15 days. We extended retention to 30 days and capped storage at 5GB by adding the flags --storage.tsdb.retention.time=30d and --storage.tsdb.retention.size=5GB to ExecStart. When both are set, Prometheus removes old data as soon as either limit is reached. The updated service file looks like this:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=5GB
Restart=always

[Install]
WantedBy=multi-user.target
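
After editing the unit, a small check can confirm that the retention flags are really present in the file systemd will load. This is a sketch only: the helper name retention_flags is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Print the retention flags found in a systemd unit file, one per line.
# Usage: retention_flags UNIT_FILE
retention_flags() {
    unit_file=$1
    # -o prints only the matched text; -- ends option parsing because
    # the pattern itself starts with dashes
    grep -oE -- '--storage\.tsdb\.retention\.(time|size)=[^[:space:]\\]+' "$unit_file"
}

# Example:
# retention_flags /etc/systemd/system/prometheus.service
# After changing the unit:
# sudo systemctl daemon-reload && sudo systemctl restart prometheus
# The running server also reports its flags at
# http://localhost:9090/api/v1/status/flags
```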

Manage User Roles and Permissions in Grafana

User Roles and Permissions in Grafana:

  • Log in to Grafana and navigate to the home menu bar. Expand the Administration section to create a team or add users.
  • We created a team called FE DevOps and added our team members to the team.
  • Team leads were given admin access, while other members were assigned viewer access.
