
Grafana and Prometheus Monitoring Setup

Ravencodess edited this page Jul 31, 2024 · 17 revisions


Overview

This page describes how we set up Prometheus and Grafana for monitoring. It covers installation, connecting Grafana to Prometheus, importing and customizing dashboards, alerting, data retention, and user access control.

Objectives

  • Install and Configure Prometheus and Grafana: set up Prometheus for metric collection and Grafana for data visualization.

  • Create and Configure Grafana Dashboards: develop dashboards to visualize metrics.

  • Set Up Alerts Based on Collected Metrics: configure alerting to notify the team of potential issues.

  • Ensure Proper Data Retention and Access Control: manage data storage and user access.

Table of Contents

  1. Installing Prometheus and Grafana
  2. Configuring the Monitoring Dashboards
  3. Configuring Alerting
  4. Set Up Data Retention Policy
  5. Manage User Roles and Permissions in Grafana

Installing Prometheus and Grafana

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Update and install dependencies
sudo apt-get update
sudo apt-get install -y wget curl tar adduser libfontconfig1 gnupg

# Create users
sudo adduser --system --group --no-create-home prometheus
sudo adduser --system --group --no-create-home node_exporter
sudo adduser --system --group --home /var/lib/grafana grafana

# Install Prometheus
PROMETHEUS_VERSION="2.37.0"
wget https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
tar xvf prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROMETHEUS_VERSION}.linux-amd64/
sudo mv prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo mv consoles/ console_libraries/ prometheus.yml /etc/prometheus/
cd ..
rm -rf prometheus-${PROMETHEUS_VERSION}.linux-amd64*

# Set Prometheus ownership
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

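# Install Grafana from the apt.grafana.com repository
# (apt-key is deprecated, so the signing key goes into a dedicated keyring)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana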

# Install Node Exporter
NODE_EXPORTER_VERSION="1.3.1"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64*

# Install cAdvisor (pinned release; check https://github.com/google/cadvisor/releases for newer versions)
CADVISOR_VERSION="v0.47.0"
sudo apt-get install -y libseccomp2
wget https://github.com/google/cadvisor/releases/download/${CADVISOR_VERSION}/cadvisor-${CADVISOR_VERSION}-linux-amd64
sudo mv cadvisor-${CADVISOR_VERSION}-linux-amd64 /usr/local/bin/cadvisor
sudo chmod +x /usr/local/bin/cadvisor

# Create systemd service files

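# Prometheus
# The quoted 'EOF' stops the shell from joining the backslash-continued
# ExecStart lines, so the unit file is written exactly as shown.
cat << 'EOF' | sudo tee /etc/systemd/system/prometheus.service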
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

# Node Exporter
cat << EOF | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# cAdvisor
cat << EOF | sudo tee /etc/systemd/system/cadvisor.service
[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/cadvisor

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd and start services
sudo systemctl daemon-reload
sudo systemctl enable prometheus node_exporter cadvisor grafana-server
sudo systemctl start prometheus node_exporter cadvisor grafana-server

echo "Installation complete. Please check the status of the services to ensure they are running correctly."
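The script installs Node Exporter and cAdvisor but, as shown, leaves the prometheus.yml copied from the release tarball unchanged, and that default file only scrapes Prometheus itself. Until scrape jobs are added, dashboards such as Node Exporter Full have no data to show. The sketch below prints a config with one job per target, assuming the upstream default ports (Prometheus 9090, Node Exporter 9100, cAdvisor 8080); the job names and 15s intervals are our own choices.

```shell
#!/usr/bin/env bash
set -euo pipefail

# print_prometheus_config: print a prometheus.yml that scrapes Prometheus
# itself, Node Exporter and cAdvisor on their default ports (assumption:
# nothing in this setup changes those ports).
print_prometheus_config() {
  cat << 'YAML'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node_exporter
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: cadvisor
    static_configs:
      - targets: ["localhost:8080"]
YAML
}

# Usage (validate with promtool before restarting):
#   print_prometheus_config | sudo tee /etc/prometheus/prometheus.yml > /dev/null
#   promtool check config /etc/prometheus/prometheus.yml
#   sudo systemctl restart prometheus
```

Printing to stdout rather than writing the file directly lets you review or diff the config before it replaces the one in /etc/prometheus.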

Configuring the Monitoring Dashboards

Configure Grafana to Use Prometheus as a Data Source:

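  • Open Grafana in your web browser at http://localhost:3000/ (Grafana's default port). Because another application already uses port 3000 on our server, we configured Grafana to listen on port 3050 instead, so our instance is at http://localhost:3050/.
  • Navigate to Configuration > Data Sources > Add data source (in newer Grafana versions: Connections > Data sources > Add data source).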
  • Select Prometheus and set the URL to http://localhost:9090. Click Save & Test.
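The port change and the data source can also be set from the shell. This sketch assumes the Debian package layout (/etc/grafana/grafana.ini and /etc/grafana/provisioning): one function sets http_port in grafana.ini, the other prints a data source provisioning file that Grafana loads at startup, as an alternative to adding Prometheus through the UI. The file name prometheus.yaml in the usage comments is an arbitrary choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# set_grafana_port FILE PORT
# Sets http_port in grafana.ini. The stock file ships the setting commented
# out as ";http_port = 3000", so an optional leading ";" is matched and
# dropped. Assumes http_port appears only once in FILE (GNU sed).
set_grafana_port() {
  local ini="$1" port="$2"
  sed -i -E "s/^;?[[:space:]]*http_port[[:space:]]*=.*/http_port = ${port}/" "$ini"
}

# print_datasource_provisioning [URL]
# Prints a provisioning file that registers Prometheus as the default
# data source (URL defaults to the local Prometheus on port 9090).
print_datasource_provisioning() {
  local url="${1:-http://localhost:9090}"
  cat << YAML
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
YAML
}

# Usage (run as root):
#   set_grafana_port /etc/grafana/grafana.ini 3050
#   print_datasource_provisioning > /etc/grafana/provisioning/datasources/prometheus.yaml
#   systemctl restart grafana-server
```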

Import Node Exporter Dashboard:

  • In Grafana, go to the Dashboards page.
  • Click the “+” icon and select “Import”.
  • Enter Dashboard ID 1860 and click “Load”.
  • Choose the Prometheus data source and click “Import”.
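If you want imported dashboards such as 1860 kept under version control rather than only in Grafana's database, Grafana can also load dashboard JSON files from disk through a file-based dashboard provider. A minimal sketch, assuming a hypothetical directory /var/lib/grafana/dashboards and folder name Monitoring: export the dashboard JSON from the Grafana UI (without the "export for sharing externally" option, which replaces the data source with an input placeholder), save it in that directory, and install the provider file printed below.

```shell
#!/usr/bin/env bash
set -euo pipefail

# print_dashboard_provider [DIR]
# Prints a dashboard provider definition that makes Grafana load every
# dashboard JSON file in DIR at startup. DIR and the folder name are
# hypothetical choices for this sketch.
print_dashboard_provider() {
  local dir="${1:-/var/lib/grafana/dashboards}"
  cat << YAML
apiVersion: 1
providers:
  - name: default
    folder: Monitoring
    type: file
    options:
      path: ${dir}
YAML
}

# Usage (run as root):
#   print_dashboard_provider > /etc/grafana/provisioning/dashboards/default.yaml
#   systemctl restart grafana-server
```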

Setting up an Application Metrics Dashboard

Using cAdvisor:

  • In Grafana, click the “+” icon and select “Dashboard”.
  • Click “Import”, then paste the dashboard JSON or enter a dashboard ID from grafana.com, and select the Prometheus data source.
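Before importing a cAdvisor dashboard, it can help to confirm that cAdvisor is actually exposing the container_* series such dashboards query. The helper below lists the distinct metric names in Prometheus text exposition output; the usage comment assumes cAdvisor serves metrics at its default address, http://localhost:8080/metrics.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_metric_names: read Prometheus text exposition format on stdin and
# print each distinct metric name once. Comment lines (# HELP / # TYPE) and
# blank lines are skipped; a name ends at the first "{" or space.
list_metric_names() {
  awk '!/^#/ && NF { sub(/[{ ].*/, ""); print }' | sort -u
}

# Usage:
#   curl -s http://localhost:8080/metrics | list_metric_names | grep '^container_'
```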


Configuring Dynamic Variables:

  • Navigate to the desired dashboard, click on the settings icon, select the variables tab, and create a new dynamic variable.
  • In our case, we created a dynamic variable that reflects the environments of our containerized applications: dev, staging, and prod.
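The page does not say how environments are encoded, so this sketch assumes a hypothetical naming convention in which container names end in -dev, -staging or -prod (for example api-dev). Under that assumption, env could be a Custom variable with the values dev,staging,prod, and a second Query variable could list containers with label_values(container_last_seen{name=~".*-$env"}, name). The shell function applies the same filter so you can preview which containers each environment would show; the usage comment additionally assumes jq is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_containers_by_env ENV
# Reads container names (one per line) on stdin and prints those ending in
# "-ENV". The "<app>-<env>" convention is a hypothetical example; adapt the
# pattern to however your container names or labels encode the environment.
filter_containers_by_env() {
  local env="$1"
  grep -E -- "-${env}\$" || true   # no match is not an error here
}

# Usage (cAdvisor exports the container name in the "name" label):
#   curl -s http://localhost:9090/api/v1/label/name/values \
#     | jq -r '.data[]' | filter_containers_by_env staging
```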

Testing the Dynamic Variable:

  • With the variable in place, selecting an environment in the dashboard's variable drop-down lists only the containers that belong to that environment.

Configuring Alerting

Configuring Grafana Alerts

Create and Configure Alerts in Grafana:

  • Open the home menu and navigate to the Alerting section.
  • Go to Alert rules and click “New alert rule” to create an alert.
  • Define the condition with a Prometheus query, e.g., rate(myapp_request_count[1m]) > 100, which is true when the per-second request rate averaged over the last minute exceeds 100.

Configure Alert Evaluation and Frequency:

  • Set the evaluation interval (e.g., every minute) and the pending period, i.e., how long the condition must keep holding before the alert fires (e.g., 5 minutes).
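To see how the evaluation interval and the pending period interact, the following simplified model replays a series of evaluation results: the rule turns Pending on the first breaching evaluation, Firing once the condition has held for the whole pending period, and back to Normal on any non-breaching evaluation. It ignores Grafana's no-data and error states, and the exact boundary behavior may differ slightly between versions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# simulate_alert THRESHOLD INTERVAL_S PENDING_S VALUE...
# Each VALUE is one evaluation result (integers only, since bash arithmetic
# is integer-only), taken INTERVAL_S seconds apart. The rule is Pending while
# VALUE > THRESHOLD has held for less than PENDING_S seconds, Firing once it
# has held that long, and Normal (timer reset) on a non-breaching value.
simulate_alert() {
  local threshold="$1" interval="$2" pending="$3"
  shift 3
  local t=0 since=-1 v
  for v in "$@"; do
    if (( v > threshold )); then
      if (( since < 0 )); then
        since=$t   # start of the current breach
      fi
      if (( t - since >= pending )); then
        echo "t=${t}s Firing"
      else
        echo "t=${t}s Pending"
      fi
    else
      since=-1
      echo "t=${t}s Normal"
    fi
    t=$(( t + interval ))
  done
}

# Usage: threshold 100, evaluated every 60 s, 5-minute pending period.
#   simulate_alert 100 60 300 150 150 150 150 150 150 50
# prints Pending for t=0s..240s, Firing at t=300s, Normal at t=360s.
```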

Set Up Notification Channels:

  • Navigate to Alerting > Contact points and click Add contact point.
  • Configure the notification channel (e.g., Slack) with your webhook URL and other details.
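Grafana's contact point form has a Test button; to check the Slack webhook independently of Grafana, you can post a message to it directly, since Slack incoming webhooks accept a JSON body with a text field. SLACK_WEBHOOK_URL is a hypothetical environment variable name used here to keep the secret URL out of scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# slack_payload MESSAGE
# Builds the minimal JSON body Slack incoming webhooks accept: {"text":"..."}.
# Backslashes and double quotes are escaped; multi-line messages are not
# handled by this sketch.
slack_payload() {
  local escaped
  escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"text":"%s"}\n' "$escaped"
}

# send_slack_test MESSAGE
# Posts MESSAGE to the webhook in $SLACK_WEBHOOK_URL (hypothetical variable
# name; keep the real URL out of version control).
send_slack_test() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$1")" \
    "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL first}"
}

# Usage:
#   SLACK_WEBHOOK_URL='https://hooks.slack.com/services/...' send_slack_test 'Test from the monitoring host'
```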

Link Alerts to Notification Channels:

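  • In the alert rule configuration, route the alert to the contact point created above, either by selecting it directly in the rule or through a notification policy, depending on your Grafana version.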

Managed Alerts and Notifications:

  • We configured Grafana-managed alerts to notify the team through our Slack incoming-webhook URL, which posts the alerts to the #devops-alerts channel.
  • Alerts were created for Low Disk, High CPU, Container Down, and High Memory.
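The page lists the four alerts but not their queries. The expressions below are one way to write them with standard Node Exporter and cAdvisor metrics; the thresholds (85% CPU, 90% memory, 10% free disk, 60 seconds unseen) and the filesystem filter are illustrative placeholders. preview_alert evaluates an expression once through the Prometheus HTTP API (/api/v1/query), which is handy for checking what a rule would match before creating it in Grafana.

```shell
#!/usr/bin/env bash
set -euo pipefail

# alert_expr NAME
# Example PromQL conditions for the four alerts; thresholds are placeholders.
alert_expr() {
  case "$1" in
    high_cpu)
      echo '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 85' ;;
    high_memory)
      echo '100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 90' ;;
    low_disk)
      echo '100 * node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 10' ;;
    container_down)
      # Only sees containers cAdvisor still reports; series of removed
      # containers eventually go stale, so this is not a complete check.
      echo 'time() - container_last_seen{name!=""} > 60' ;;
    *)
      echo "unknown alert: $1" >&2
      return 1 ;;
  esac
}

# preview_alert NAME [PROM_URL]
# Runs the expression once as an instant query and prints the JSON result.
preview_alert() {
  local url="${2:-http://localhost:9090}"
  curl -fsS -G "${url}/api/v1/query" --data-urlencode "query=$(alert_expr "$1")"
}

# Usage:
#   preview_alert low_disk
```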

Set Up Data Retention Policy

Set Data Retention in Prometheus:

  • By default, Prometheus keeps data for 15 days. We extended the retention period to 30 days and capped storage at 5GB by adding the flags --storage.tsdb.retention.time=30d and --storage.tsdb.retention.size=5GB to ExecStart; when both are set, whichever limit is reached first triggers deletion of the oldest data. After editing the unit file, run sudo systemctl daemon-reload and sudo systemctl restart prometheus. The updated service file looks like this:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=5GB
Restart=always

[Install]
WantedBy=multi-user.target
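To check whether 30 days of data is likely to fit under the 5GB cap, the Prometheus storage documentation gives a rule of thumb: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with samples taking roughly 1 to 2 bytes each. The sketch below applies it; the ingestion rate is an input you would measure yourself, for example with rate(prometheus_tsdb_head_samples_appended_total[5m]) in Prometheus.

```shell
#!/usr/bin/env bash
set -euo pipefail

# estimate_tsdb_bytes RETENTION_DAYS SAMPLES_PER_SECOND [BYTES_PER_SAMPLE]
# Rule of thumb from the Prometheus storage documentation:
#   disk = retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample defaults to 2, the pessimistic end of the 1-2 byte range.
estimate_tsdb_bytes() {
  local days="$1" rate="$2" bps="${3:-2}"
  echo $(( days * 86400 * rate * bps ))
}

# Usage: 30 days at 1000 samples/s and 2 bytes/sample
#   estimate_tsdb_bytes 30 1000   # prints 5184000000 (about 4.8 GiB)
# After changing retention flags, the active values can be checked with:
#   curl -s http://localhost:9090/api/v1/status/flags
```

At that example rate the estimate lands close to the 5GB cap, so the size limit rather than the 30-day limit could end up being the one that triggers deletion.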


Manage User Roles and Permissions in Grafana

User Roles and Permissions in Grafana:

  • Log in to Grafana as an administrator, open the home menu, and expand the Administration section to create a team or add users.
  • We created a team called FE DevOps and added our team members to the team.
  • Team leads were given admin access, while other members were assigned viewer access.
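The same team and role setup can be scripted against Grafana's HTTP API (POST /api/teams, POST /api/teams/:teamId/members, PATCH /api/org/users/:userId). This is a sketch: GRAFANA_URL and GRAFANA_AUTH are hypothetical variable names, the default URL assumes our port 3050, and because the page does not say whether "admin access" means the Admin organization role or team-level admin permission, set_org_role only covers organization roles.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings: GRAFANA_URL and GRAFANA_AUTH ("user:password" of a
# Grafana admin, used for basic auth).
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3050}"

# grafana_api METHOD PATH [JSON_BODY]: call the Grafana HTTP API.
grafana_api() {
  local method="$1" path="$2" body="${3:-}"
  if [[ -n "$body" ]]; then
    curl -fsS -X "$method" -u "${GRAFANA_AUTH:?set GRAFANA_AUTH}" \
      -H 'Content-Type: application/json' --data "$body" "${GRAFANA_URL}${path}"
  else
    curl -fsS -X "$method" -u "${GRAFANA_AUTH:?set GRAFANA_AUTH}" "${GRAFANA_URL}${path}"
  fi
}

# is_org_role ROLE: succeeds for the organization roles discussed here.
is_org_role() {
  case "$1" in
    Viewer|Editor|Admin) return 0 ;;
    *) return 1 ;;
  esac
}

# set_org_role USER_ID ROLE: change a user's role in the current organization.
set_org_role() {
  local user_id="$1" role="$2"
  if ! is_org_role "$role"; then
    echo "invalid role: $role" >&2
    return 1
  fi
  grafana_api PATCH "/api/org/users/${user_id}" "{\"role\":\"${role}\"}"
}

# Usage sketch (<teamId> and <userId> come from earlier API responses):
#   grafana_api POST /api/teams '{"name":"FE DevOps"}'
#   grafana_api POST /api/teams/<teamId>/members '{"userId":<userId>}'
#   set_org_role <userId> Viewer
```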
