Grafana and Prometheus Monitoring Setup
This documentation provides a comprehensive guide on setting up Grafana and Prometheus for monitoring. It covers installation steps, configuration details, instructions for accessing and managing dashboards, and troubleshooting tips for maintaining the monitoring setup.
- Install and Configure Prometheus and Grafana: Set up Prometheus for metric collection and Grafana for data visualization.
- Create and Configure Grafana Dashboards: Develop dashboards to visualize metrics.
- Set Up Alerts Based on Collected Metrics: Configure alerting to notify the team of potential issues.
- Ensure Proper Data Retention and Access Control: Manage data storage and user access.
- Installing Prometheus and Grafana
- Configuring the Monitoring Dashboards
- Configuring Alerting
- Data Retention and Access Control
- Manage Users and Roles in Grafana
- Download Prometheus: Obtain the latest version from the Prometheus website.
- Extract the Archive:
tar xvfz prometheus-*.tar.gz
- Move to Installation Directory:
# The trailing slash matches only the extracted directory, not the .tar.gz archive
sudo mv prometheus-*/ /usr/local/prometheus
- Create a Prometheus User:
sudo useradd --no-create-home --shell /bin/false prometheus
- Set Up Directories and Permissions:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
- Copy Configuration Files:
sudo cp /usr/local/prometheus/prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
- Create a Systemd Service File (/etc/systemd/system/prometheus.service):
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/
[Install]
WantedBy=multi-user.target
- Enable and Start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
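Once the service has started, it is worth confirming that Prometheus actually answers requests. The sketch below probes the /-/healthy and /-/ready management endpoints. The PROM_URL variable and the check_prometheus function are illustrative names, and the URL assumes the default listen address of localhost:9090.

```shell
# Assumption: Prometheus listens on its default address. Override PROM_URL if
# it is started with a different --web.listen-address.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Returns 0 only if both management endpoints answer successfully.
check_prometheus() {
  curl --silent --fail --max-time 5 "$PROM_URL/-/healthy" >/dev/null &&
    curl --silent --fail --max-time 5 "$PROM_URL/-/ready" >/dev/null
}

if check_prometheus; then
  echo "Prometheus is healthy and ready at $PROM_URL"
else
  echo "Prometheus is not answering at $PROM_URL; inspect it with: systemctl status prometheus"
fi
```

If the check fails, the service log (journalctl -u prometheus) usually shows whether the cause is a configuration error or a permission problem on /var/lib/prometheus.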
Install Grafana:
sudo apt-get install -y apt-transport-https software-properties-common
# Import the signing key before adding the repository so the package lists can be verified
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
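Create a Grafana User (manual installs only; the apt package above already creates the grafana user and group):
sudo useradd --no-create-home --shell /bin/false grafana
Set Up Directories and Permissions (manual installs only, where Grafana was unpacked to /usr/local/grafana):
sudo chown grafana:grafana /usr/local/grafana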
Create a Systemd Service File:
The grafana package installed with apt-get ships its own unit, grafana-server.service, so no service file has to be written by hand for Grafana. The script that creates the Prometheus, Node Exporter, cAdvisor, and Alertmanager service files appears in full further below.
Enable and Start Grafana:
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
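A quick way to confirm that Grafana is up is its /api/health endpoint, which needs no login. The sketch below is an illustration only. GRAFANA_URL and check_grafana are names chosen here, and the default assumes Grafana's standard port 3000. The setup in this guide later moves Grafana to port 3050 (for example with http_port = 3050 in the [server] section of /etc/grafana/grafana.ini), in which case the variable would be set accordingly.

```shell
# Assumption: Grafana listens on its default port 3000. For the port used later
# in this guide, run with GRAFANA_URL=http://localhost:3050.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# /api/health needs no authentication and reports the database state as JSON.
check_grafana() {
  curl --silent --fail --max-time 5 "$GRAFANA_URL/api/health"
}

if health="$(check_grafana)"; then
  echo "Grafana answered: $health"
else
  echo "Grafana is not answering at $GRAFANA_URL; inspect it with: systemctl status grafana-server"
fi
```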
Node Exporter is essential for collecting server metrics such as CPU usage, memory usage, disk I/O, and network traffic.
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
cd node_exporter-*
sudo mv node_exporter /usr/local/bin/
Create a Systemd Service File:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Enable and Start Node Exporter:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Install cAdvisor:
VERSION=v0.47.0 # cAdvisor release to install; check the releases page for a newer version
wget https://github.com/google/cadvisor/releases/download/$VERSION/cadvisor-$VERSION-linux-amd64
chmod +x cadvisor-$VERSION-linux-amd64
sudo mv cadvisor-$VERSION-linux-amd64 /usr/local/bin/cadvisor
cAdvisor (/etc/systemd/system/cadvisor.service):
[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target
[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor
[Install]
WantedBy=multi-user.target
After creating these files, run the following commands:
sudo systemctl daemon-reload
sudo systemctl enable prometheus node_exporter cadvisor alertmanager
sudo systemctl start prometheus node_exporter cadvisor alertmanager
Note: You'll need to create the appropriate users and groups (prometheus, node_exporter, cadvisor, alertmanager) and ensure proper permissions for directories and files.
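The note above can be sketched as a small script. This is a minimal example under stated assumptions: it reuses the useradd options shown earlier for the prometheus user, the run helper and the DRY_RUN switch are illustrative additions, and the Alertmanager directories are taken from the paths in its service file. On Debian and Ubuntu, useradd also creates a group with the same name by default.

```shell
# DRY_RUN=1 (the default here) only prints the commands. Set DRY_RUN=0 and run
# the script as root to apply them.
DRY_RUN="${DRY_RUN:-1}"

# Prints the command in dry-run mode, executes it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

for svc in prometheus node_exporter cadvisor alertmanager; do
  # Skip accounts that already exist, such as the prometheus user created earlier.
  if id "$svc" >/dev/null 2>&1; then
    echo "user $svc already exists"
  else
    run useradd --no-create-home --shell /bin/false "$svc"
  fi
done

# Alertmanager needs the config and storage directories named in its unit file.
run mkdir -p /etc/alertmanager /var/lib/alertmanager
run chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```

Running it once in dry-run mode shows exactly what would change before anything is applied.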
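The following script creates all four service files and starts the services in one step. It expects every binary in /usr/local/bin, whereas the manual steps above place Prometheus in /usr/local/prometheus, so adjust the ExecStart paths or copy the binaries before running it:
#!/bin/bash
# Check if script is run as root
if [ "$EUID" -ne 0 ]
then echo "Please run as root"
exit 1
fi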
# Create systemd service file for Prometheus
cat > /etc/systemd/system/prometheus.service <<EOL
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Node Exporter
cat > /etc/systemd/system/node_exporter.service <<EOL
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for cAdvisor
cat > /etc/systemd/system/cadvisor.service <<EOL
[Unit]
Description=cAdvisor
Wants=network-online.target
After=network-online.target
[Service]
User=cadvisor
Group=cadvisor
Type=simple
ExecStart=/usr/local/bin/cadvisor
[Install]
WantedBy=multi-user.target
EOL
# Create systemd service file for Alertmanager
cat > /etc/systemd/system/alertmanager.service <<EOL
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager
[Install]
WantedBy=multi-user.target
EOL
# Reload systemd to recognize new service files
systemctl daemon-reload
# Enable and start services
systemctl enable prometheus node_exporter cadvisor alertmanager
systemctl start prometheus node_exporter cadvisor alertmanager
echo "Systemd service files have been created and services have been enabled and started."
echo "Please ensure that you have created the necessary users and groups,"
echo "and that the binary files and directories exist with proper permissions."
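With the exporters running, Prometheus still needs scrape jobs that point at them. The sketch below writes a minimal prometheus.yml to a local file for review. It assumes everything runs on one host with default ports (9090 for Prometheus, 9100 for Node Exporter, 8080 for cAdvisor); the job names, the 15s interval, and the CONFIG_FILE variable are illustrative choices.

```shell
# Written to a local file first; copy it to /etc/prometheus/prometheus.yml once reviewed.
CONFIG_FILE="${CONFIG_FILE:-./prometheus.yml}"

cat > "$CONFIG_FILE" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node_exporter
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: cadvisor
    static_configs:
      - targets: ["localhost:8080"]
EOF

# promtool ships in the Prometheus archive; validate the file if it is on PATH.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$CONFIG_FILE"
fi

echo "Wrote $CONFIG_FILE"
```

After copying the file into place, restart Prometheus (sudo systemctl restart prometheus) and check Status > Targets in the Prometheus web UI to confirm that all three jobs are up.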
Configure Grafana to Use Prometheus as a Data Source:
- Access Grafana via your web browser. Grafana listens on port 3000 by default (http://localhost:3000/). Because another application already uses port 3000 on our server, we configured Grafana to listen on port 3050 instead (http://localhost:3050/).
- Navigate to Configuration > Data Sources > Add data source.
- Select Prometheus and set the URL to http://localhost:9090. Click Save & Test.
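The same data source can also be defined as a file, which makes the setup repeatable. This is a sketch of Grafana's data source provisioning format, not the method the team used. The file name prometheus.yaml and the PROVISIONING_DIR variable are assumptions, and the URL matches the one entered in the UI steps above.

```shell
# Written to a local directory for review. On package installs Grafana reads
# data source provisioning files from /etc/grafana/provisioning/datasources/.
PROVISIONING_DIR="${PROVISIONING_DIR:-./provisioning/datasources}"
mkdir -p "$PROVISIONING_DIR"

cat > "$PROVISIONING_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

echo "Wrote $PROVISIONING_DIR/prometheus.yaml"
```

Copy the file into the provisioning directory and restart grafana-server for Grafana to pick it up.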
Import Node Exporter Dashboard:
- In Grafana, go to the Dashboards page.
- Click the “+” icon and select “Import”.
- Enter Dashboard ID 1860 and click “Load”.
- Choose the Prometheus data source and click “Import”.
Using cAdvisor:
- In Grafana, click the “+” icon and select “Dashboard”.
- Click “Import” and then add the dashboard JSON or ID.
Configuring Dynamic Variables:
- Navigate to the desired dashboard, click on the settings icon, select the variables tab, and create a new dynamic variable.
- In this case, we are creating a dynamic variable to reflect the environments of our containerized applications, such as dev, staging, and prod.
Testing the Dynamic Variable:
- Once saved, the dynamic variable appears as a drop-down at the top of the dashboard and lists the containers by environment.
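Before wiring a variable into a dashboard, it can help to see which values Prometheus actually holds for the label behind it. The sketch below calls the Prometheus label values API. The label name env is hypothetical, since this guide does not state which label carries the environment; in Grafana the matching variable query would have the form label_values(env).

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
# Hypothetical label name; use whichever label carries the environment in your metrics.
ENV_LABEL="${ENV_LABEL:-env}"

# Builds the URL of the Prometheus label values API for one label name.
label_values_url() {
  echo "$PROM_URL/api/v1/label/$1/values"
}

# Prints the JSON list of values, or a hint if Prometheus cannot be reached.
curl --silent --fail --max-time 5 "$(label_values_url "$ENV_LABEL")" ||
  echo "could not query $(label_values_url "$ENV_LABEL")"
```

An empty data list in the reply means no scraped metric carries that label yet.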
Create and Configure Alerts in Grafana:
- Open the home menu bar, navigate to the Alert tab, and click "Create Alert."
- Define conditions based on Prometheus queries (e.g., rate(myapp_request_count[1m]) > 100).
Configure Alert Evaluation and Frequency:
- Set the evaluation interval (e.g., every minute) and the duration for which the alert condition must be met (e.g., 5 minutes).
Set Up Notification Channels:
- Navigate to Alerting > Contact Points and click the Add contact point button.
- Configure the notification channel (e.g., Slack) with your webhook URL and other details.
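Before relying on the contact point, the webhook itself can be tested from the command line. The sketch below posts a plain text message to a Slack incoming webhook. The function name and the message text are illustrative, and the URL is read from a SLACK_WEBHOOK_URL environment variable (an assumption made here so the secret is never written into a file).

```shell
# The webhook URL is a secret, so it is taken from the environment, not hard-coded.
send_test_alert() {
  if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
    echo "SLACK_WEBHOOK_URL is not set; nothing sent" >&2
    return 1
  fi
  curl --silent --fail --max-time 10 \
    -X POST -H 'Content-Type: application/json' \
    --data '{"text": "Test message from the monitoring setup"}' \
    "$SLACK_WEBHOOK_URL"
}

send_test_alert || echo "test message was not delivered"
```

If the message arrives in the channel, the same URL can be pasted into the Grafana contact point.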
Link Alerts to Notification Channels:
- In the alert configuration, associate the alert with the notification channel created.
Managed Alerts and Notifications:
- We configured Grafana managed alerts to notify the team via our Slack configuration webhook URL, which sends the alerts to the #devops-alerts channel.
- Alerts were created for Low Disk, High CPU, Container Down, and High Memory.
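The four alerts can be expressed as PromQL conditions over Node Exporter and cAdvisor metrics. The expressions below are a sketch, not the team's actual rules. The thresholds (80%, 10%, 60 seconds), the root mountpoint filter, and the promql helper are assumptions. Each query is sent to the Prometheus HTTP API so it can be tried out before being pasted into a Grafana alert rule.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Runs one instant query against the Prometheus HTTP API and prints the JSON reply.
promql() {
  curl --silent --fail --max-time 5 --get \
    --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Thresholds are illustrative assumptions, not the values used by the team.
HIGH_CPU='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80'
HIGH_MEMORY='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 80'
LOW_DISK='(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10'
CONTAINER_DOWN='time() - container_last_seen{name!=""} > 60'

for expr in "$HIGH_CPU" "$HIGH_MEMORY" "$LOW_DISK" "$CONTAINER_DOWN"; do
  echo "query: $expr"
  promql "$expr" || echo "  (no answer from $PROM_URL)"
  echo
done
```

An empty result list means the condition is not met at the moment, which is the expected state on a healthy host.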
Set Data Retention in Prometheus:
- By default, Prometheus retains data for 15 days. We raised the retention period to 30 days and capped storage at 5GB by adding the following flags: --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=5GB. Whichever limit is reached first causes the oldest data to be removed. The updated service file looks like this:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--storage.tsdb.retention.time=30d \
--storage.tsdb.retention.size=5GB
Restart=always
[Install]
WantedBy=multi-user.target
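After editing the service file, reload systemd and restart Prometheus (sudo systemctl daemon-reload, then sudo systemctl restart prometheus) so the new flags take effect. The sketch below then checks what the running process reports, using the /api/v1/status/flags endpoint of the Prometheus HTTP API. The function name and the PROM_URL and DATA_DIR variables are illustrative, with defaults taken from this guide.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
DATA_DIR="${DATA_DIR:-/var/lib/prometheus}"

# /api/v1/status/flags returns every command-line flag the running server uses.
# The JSON is split on commas so only the retention entries are printed.
show_retention_flags() {
  curl --silent --fail --max-time 5 "$PROM_URL/api/v1/status/flags" |
    tr ',' '\n' | grep 'storage.tsdb.retention'
}

show_retention_flags || echo "could not read flags from $PROM_URL"

# Current on-disk size of the TSDB, to compare against the 5GB limit.
if [ -d "$DATA_DIR" ]; then
  du -sh "$DATA_DIR" 2>/dev/null || echo "cannot read $DATA_DIR (try sudo)"
fi
```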
User Roles and Permissions in Grafana:
- Log in to Grafana and navigate to the home menu bar. Expand the Administration section to create a team or add users.
- We created a team called FE DevOps and added our team members to the team.
- Team leads were given admin access, while other members were assigned viewer access.
Made with ❤️ by Ravencodes | AugustHottie | CodeReaper0 | bySegunMoses | Suesue | DrInTech22 courtesy of @HNG-Internship