
7. Monitoring with Prometheus and Grafana

Sudo Bro edited this page Aug 1, 2024 · 15 revisions

Installation

Grafana and Prometheus are critical, complementary tools for modern infrastructure and software monitoring. Prometheus, an open-source system, collects and stores metrics from various sources in a time-series database, with a default 15-day retention period. While Prometheus offers basic querying capabilities, its primary function is data collection and storage. Grafana, on the other hand, provides a user-friendly web interface for visualizing this data. It connects to Prometheus, uses PromQL (Prometheus Query Language) to query the stored metrics, and presents the information in customizable dashboards. Together, they form a powerful solution for comprehensive monitoring and data visualization in software environments.

This documentation covers installation, configuration, dashboard creation, alerting, and best practices for maintaining this monitoring setup.

Objectives

  • Install and configure Prometheus and Grafana.
  • Create and configure Grafana dashboards.
  • Set up alerts.
  • Ensure proper data retention and access control.

Installation of Prometheus

  • Step 1: Download Prometheus
PROMETHEUS_VERSION="2.53.1"
wget https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
  • Check for the latest version on the Prometheus website.

  • Sets the Prometheus version to install.

  • Downloads the Prometheus binary for Linux AMD64 architecture.
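The download step can be wrapped in a small helper so that the version is set in one place. This is only a sketch: the function names prometheus_release_url and download_prometheus are hypothetical, and it assumes the release URL layout used in Step 1.

```shell
# Build the release tarball URL for a given Prometheus version.
# The architecture defaults to amd64, as in Step 1.
prometheus_release_url() {
    version="$1"
    arch="${2:-amd64}"
    printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.linux-%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Download the tarball into the current directory (hypothetical helper).
download_prometheus() {
    wget "$(prometheus_release_url "$1" "${2:-amd64}")"
}
```

For example, download_prometheus "2.53.1" fetches the same file as the wget command in Step 1.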

  • Step 2: Extract the archive

tar xvfz prometheus*.tar.gz
  • Step 3: Setup Prometheus directories and user
sudo mkdir -p /opt/prometheus /etc/prometheus /var/lib/prometheus
sudo useradd --no-create-home --shell /bin/false prometheus || true
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus || true
  • Step 4: Move Prometheus files
sudo mv prometheus-${PROMETHEUS_VERSION}.linux-amd64/* /opt/prometheus/
  • Step 5: Copy Prometheus binaries
sudo cp /opt/prometheus/prometheus /usr/local/bin/
sudo cp /opt/prometheus/promtool /usr/local/bin/
sudo cp -r /opt/prometheus/consoles /etc/prometheus
sudo cp -r /opt/prometheus/console_libraries /etc/prometheus
sudo cp /opt/prometheus/prometheus.yml /etc/prometheus
  • Step 6: Set ownership for Prometheus binaries
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo chown prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
sudo chown -R prometheus:prometheus /var/lib/prometheus
  • Step 7: Create Prometheus configuration
cat << EOF | sudo tee /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["91.229.239.213:9090"]
  - job_name: "node exporter"
    static_configs:
      - targets: ["91.229.239.213:9100"]
  - job_name: "postgres-exporter"
    static_configs:
      - targets: ["91.229.239.213:9187"]
EOF
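Before the service is started, the configuration written in Step 7 can be checked with promtool, which was copied to /usr/local/bin in Step 5. A minimal sketch, assuming promtool is on the PATH; the wrapper name check_prometheus_config is hypothetical.

```shell
# Validate a Prometheus configuration file before (re)starting the service.
# Returns 127 if promtool is not installed, otherwise promtool's own status.
check_prometheus_config() {
    config_file="${1:-/etc/prometheus/prometheus.yml}"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    promtool check config "$config_file"
}
```

Running check_prometheus_config /etc/prometheus/prometheus.yml reports syntax problems such as bad indentation or an unknown field.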
  • Step 8: Create a Prometheus systemd service file
cat << EOF | sudo tee /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF
  • Step 9: Enable and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
  • Step 10: Allow port 9090 on your firewall for Prometheus
sudo ufw allow 9090/tcp

This allows incoming TCP traffic on port 9090, Prometheus's default port.
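Once the service is running and the port is open, Prometheus can be probed through its /-/healthy and /-/ready management endpoints. The sketch below assumes curl is installed and that Prometheus listens on localhost:9090; the function name check_prometheus is hypothetical.

```shell
# Probe the Prometheus health and readiness endpoints.
# Prints one line per endpoint and returns 1 if any probe fails.
check_prometheus() {
    base_url="http://${1:-localhost}:${2:-9090}"
    status=0
    for path in /-/healthy /-/ready; do
        if curl -fsS -o /dev/null "${base_url}${path}"; then
            echo "${path} ok"
        else
            echo "${path} FAILED"
            status=1
        fi
    done
    return "$status"
}
```

Call check_prometheus with no arguments for the local instance, or pass a host and port to probe it from another machine through the firewall.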

Installation of Grafana

Before installing Grafana, install the dependencies it needs on your Ubuntu system. Without these packages, later steps of the installation may fail.

  • Step 1: Install dependencies
sudo apt-get install -y apt-transport-https software-properties-common wget
  • wget is a utility for non-interactive download of files from the web. It's used in the next steps to download the Grafana GPG key, which is essential for verifying the authenticity of the Grafana packages.

  • apt-transport-https allows the apt package manager to retrieve packages over HTTPS. It's crucial for security, ensuring that package downloads are encrypted and protected from tampering.

  • software-properties-common provides scripts for managing software repositories. It's particularly important for adding PPAs (Personal Package Archives) and other third-party repositories, which might be necessary for some Grafana configurations or plugins.

  • Step 2: Add Grafana GPG key

sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

Adding the Grafana GPG key lets apt verify that the packages you install are genuine and unaltered. It is a basic safeguard for the integrity of your system whenever you install third-party software such as Grafana.
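A quick sanity check on the keyring written in Step 2 is to confirm that the file exists and is not empty before adding the repository. A minimal sketch; the function name keyring_ready is hypothetical, and the default path is the one used in Step 2.

```shell
# Succeeds only if the keyring file exists, is readable, and is non-empty.
# An empty file usually means the download or the gpg --dearmor step failed.
keyring_ready() {
    keyring="${1:-/etc/apt/keyrings/grafana.gpg}"
    [ -s "$keyring" ] && [ -r "$keyring" ]
}
```

If keyring_ready succeeds, gpg --show-keys /etc/apt/keyrings/grafana.gpg (available in recent GnuPG 2.x releases) prints the key's fingerprint so it can be compared with the one Grafana publishes.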

  • Step 3: Add Grafana repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com/ stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
  • Step 4: Update the package lists
sudo apt-get update
  • Step 5: Install Grafana
sudo apt-get install -y grafana
echo "You can access Grafana at http://your_server_ip:3000/"
echo "Default login: admin / admin"
echo "Grafana is set to start automatically on system boot."

This step downloads and installs all necessary Grafana components, including binaries, configuration files, and service scripts. The echo statements show how to access Grafana: the URL format and the default login credentials.

  • Step 6: Start the Grafana server

To start and enable the Grafana server, run the commands below.
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
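Grafana can take a few seconds to come up after systemctl start. The sketch below polls Grafana's /api/health endpoint until it answers; it assumes curl is installed and that Grafana listens on port 3000 as stated in Step 5. The function name wait_for_url and its defaults are hypothetical.

```shell
# Poll a URL until it responds successfully or the attempts run out.
# Arguments: URL, number of attempts (default 10), delay in seconds (default 2).
wait_for_url() {
    url="$1"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_for_url http://localhost:3000/api/health && echo "Grafana is up" waits up to roughly 20 seconds before giving up.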

Installation of Node Exporter

  • Step 1: Download Node Exporter
NODE_EXPORTER_VERSION="1.8.2"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
  • Step 2: Extract Node Exporter
tar xvfz node_exporter-*.tar.gz
  • Step 3: Move Node Exporter binary
sudo mv node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter /usr/local/bin/
  • Step 4: Create Node Exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter || true
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter || true
  • Step 5: Create systemd service for Node Exporter
cat << EOF | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF
  • Step 6: Start and enable Node Exporter service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
echo "Node Exporter installation completed"
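To confirm that Node Exporter is serving metrics, its /metrics endpoint on port 9100 (the port used in the Prometheus scrape configuration) can be fetched and searched for a known metric. A minimal sketch; the helper name has_metric is hypothetical.

```shell
# Reads Prometheus text-format metrics on stdin and succeeds if a sample
# line for the named metric is present. Comment lines (# HELP, # TYPE) are skipped.
has_metric() {
    grep -v '^#' | grep -q "^${1}[{ ]"
}
```

For example, curl -s http://127.0.0.1:9100/metrics | has_metric node_exporter_build_info && echo "Node Exporter is serving metrics".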

Installation of Postgres Exporter

  • Step 1: Create a System User for Postgres Exporter
echo "Creating system user for Postgres Exporter"
sudo groupadd --system postgres_exporter || true
sudo useradd -s /sbin/nologin --system -g postgres_exporter postgres_exporter || true
  • Step 2: Download Postgres Exporter
POSTGRES_EXPORTER_VERSION="0.15.0"
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v${POSTGRES_EXPORTER_VERSION}/postgres_exporter-${POSTGRES_EXPORTER_VERSION}.linux-amd64.tar.gz
  • Step 3: Extract Postgres Exporter
tar xvfz postgres_exporter-*.linux-amd64.tar.gz
  • Step 4: Set up the Postgres Exporter directory
sudo mkdir -p /opt/postgres_exporter
  • Step 5: Move the Postgres Exporter binary and set ownership
sudo mv postgres_exporter-${POSTGRES_EXPORTER_VERSION}.linux-amd64/* /opt/postgres_exporter/
sudo chown -R postgres_exporter:postgres_exporter /opt/postgres_exporter/
sudo chmod 755 /opt/postgres_exporter/postgres_exporter
  • Step 6: Set up the environment file
cat << EOF | sudo tee /opt/postgres_exporter/.env
DATA_SOURCE_NAME="postgresql://admin:<password>@<database_host>:5432/postgres?sslmode=disable"
EOF
  • Step 7: Set ownership and permissions for the environment file
sudo chown postgres_exporter:postgres_exporter /opt/postgres_exporter/.env
sudo chmod 600 /opt/postgres_exporter/.env
  • Step 8: Create a systemd service for Postgres Exporter
cat << EOF | sudo tee /etc/systemd/system/postgres_exporter.service
[Unit]
Description=Prometheus exporter for PostgreSQL
Wants=network-online.target
After=network-online.target

[Service]
User=postgres_exporter
Group=postgres_exporter
Type=simple
WorkingDirectory=/opt/postgres_exporter
EnvironmentFile=/opt/postgres_exporter/.env
ExecStart=/opt/postgres_exporter/postgres_exporter --web.listen-address=:9187 --web.telemetry-path=/metrics
Restart=always

[Install]
WantedBy=multi-user.target
EOF
echo "Postgres Exporter service file created"
  • Step 9: Enable and start Postgres Exporter
sudo systemctl daemon-reload
sudo systemctl start postgres_exporter
sudo systemctl enable postgres_exporter
  • Step 10: Database Check
echo "Checking metrics..."
curl -s http://127.0.0.1:9187/metrics | grep pg_up

This checks the health of the PostgreSQL database by querying the exporter running on the local machine and looking for the pg_up metric: a value of 1 means the exporter's last scrape could connect to the database, and 0 means it could not.
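The grep in Step 10 also prints the HELP and TYPE comment lines. To act on the value itself, for example in a script, the sample line can be parsed. A minimal sketch, assuming the exporter's text output on stdin; the function names pg_up_value and report_pg_up are hypothetical.

```shell
# Print the value of the pg_up sample from metrics text on stdin.
# Exits non-zero if no pg_up sample line is found.
pg_up_value() {
    awk '$1 ~ /^pg_up([{]|$)/ { print $NF; found = 1; exit }
         END { if (!found) exit 1 }'
}

# Turn the pg_up value into a readable status line.
report_pg_up() {
    value="$(pg_up_value)" || { echo "pg_up metric not found"; return 2; }
    if [ "$value" = "1" ]; then
        echo "PostgreSQL reachable"
    else
        echo "PostgreSQL NOT reachable (pg_up=$value)"
        return 1
    fi
}
```

For example, curl -s http://127.0.0.1:9187/metrics | report_pg_up.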

  • Step 11: Set-up firewall
sudo ufw allow 9187/tcp
sudo ufw reload

Dashboard Setup

Java Application Dashboard

To set up a Java monitoring dashboard, follow the steps below:

  • You can configure a dashboard from scratch or use a preconfigured one. For a preconfigured dashboard, copy the dashboard ID 4701.
  • In Grafana's Dashboards tab, click New > New Dashboard.
  • Click the Import dashboard card.
  • Paste the dashboard ID in the "Find and import dashboards" box and click the Load button.
  • Give the dashboard a name, select or create a folder for it, select Prometheus as the data source, then load the dashboard.

Node Exporter Dashboard

To set up a Node Exporter dashboard to monitor your server, follow the steps below:

  • You can configure a dashboard from scratch or use a preconfigured one. For a preconfigured dashboard, copy the dashboard ID 1860.
  • In Grafana's Dashboards tab, click New > New Dashboard.
  • Click the Import dashboard card.
  • Paste the dashboard ID in the "Find and import dashboards" box and click the Load button.
  • Give the dashboard a name, select or create a folder for it, select Prometheus as the data source, then load the dashboard.

Postgres Exporter Dashboard

To set up a Postgres Exporter monitoring dashboard, follow the steps below:

  • You can configure a dashboard from scratch or use a preconfigured one. For a preconfigured dashboard, copy the dashboard ID 12485.
  • In Grafana's Dashboards tab, click New > New Dashboard.
  • Click the Import dashboard card.
  • Paste the dashboard ID in the "Find and import dashboards" box and click the Load button.
  • Give the dashboard a name, select or create a folder for it, select Prometheus as the data source, then load the dashboard.

Alerting Setup with Grafana

This guide walks you through setting up alerts in Grafana, using an instance disk space alert as the example.

  • Step 1: Log in and open the Alerting section

Log in to Grafana, click New dashboard to create a dashboard, and navigate to the Alerting section.

  • Step 2: Click Alert to set an alert for the metrics you are monitoring.

  • Step 3: Create a New Alert Rule

Click on "Add new alert rule" to begin setting up your alert.


  • Step 4: Switch to Code View

In the upper right corner, switch to the "Code" view for more detailed configuration options.


  • Step 5: Enter the Query

Use the following PromQL query to monitor disk space usage:

max(100 - ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes)) by (instance)


  • Step 6: Review Query Results

This query returns the percentage of disk space used on each instance, taking the fullest filesystem when an instance has several. You should see one result per instance in your environment.

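The arithmetic inside the query can be reproduced by hand, which helps when choosing a threshold. The sketch below applies the same formula, 100 - (avail * 100 / size), to two byte counts and compares the result against a limit; the function names disk_used_percent and over_threshold are hypothetical, and awk is assumed to be available.

```shell
# Used-space percentage from available bytes and total bytes,
# mirroring 100 - ((avail * 100) / size) from the PromQL query.
disk_used_percent() {
    awk -v avail="$1" -v size="$2" 'BEGIN {
        if (size <= 0) exit 1
        printf "%.1f\n", 100 - (avail * 100) / size
    }'
}

# Succeeds when the used percentage is at or above the limit.
over_threshold() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used >= limit) }'
}
```

For example, a 100 GiB filesystem with 20 GiB available gives disk_used_percent 21474836480 107374182400, which prints 80.0; with an alert threshold of 80, over_threshold 80.0 80 succeeds and the alert would fire.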

  • Step 7: Set Alert Threshold

Specify the disk space usage percentage at which you want to receive an alert.


  • Step 8: Save and Configure Notifications

Save the rule and set up the contact point where you would like to receive notifications for alerts (e.g., Slack).

Note

Adjust the threshold percentage based on your specific needs and infrastructure requirements.

  • Step 9: Set up a contact point for Slack

Fill in a name, the integration (in this case, Slack), and the webhook URL from Slack.

After that, test and save.

  • Step 10: Set the alert rule to use your Slack contact point

On the alert rule's configuration page, choose the contact point. Save and exit.

How to Configure Contact Point

  • Click Manage contact points to configure who receives notifications and how they are sent.

  • Click Add contact point to set up a new one.

  • Fill in the contact point details:

    • Enter a name for the contact point
    • Select the channel to receive the alert
    • For Slack, paste the webhook URL: https://hooks.slack.com/services/AAAAAAAA/BBBBBBBBB/CCCCCCCCCCCCCCCCCCCCCCCC

  • Click Test to confirm that the notification is delivered.

  • Save the contact point.
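If the test notification does not arrive, the webhook itself can be checked from the command line, independently of Grafana. Slack incoming webhooks accept a JSON body with a text field. A minimal sketch, assuming curl is installed; the function names slack_payload and send_slack_test are hypothetical, and only single-line messages are handled.

```shell
# Build the JSON body for a Slack incoming webhook.
# Backslashes and double quotes are escaped; newlines are not handled.
slack_payload() {
    escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
    printf '{"text":"%s"}\n' "$escaped"
}

# Post a test message. Arguments: webhook URL, message text.
send_slack_test() {
    curl -fsS -X POST -H 'Content-type: application/json' \
        --data "$(slack_payload "$2")" "$1"
}
```

For example, send_slack_test "https://hooks.slack.com/services/AAAAAAAA/BBBBBBBBB/CCCCCCCCCCCCCCCCCCCCCCCC" "Grafana contact point test", with the placeholder replaced by your real webhook URL.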

Data Retention in Prometheus

To access historical data for capacity planning, compliance, auditing, and enhanced debugging, it is best to set up data retention for Prometheus.

  • Step 1: In the earlier Prometheus installation, Prometheus was configured as a service. Open the prometheus.service file:
sudo nano /etc/systemd/system/prometheus.service
  • Step 2: If the prometheus.service file does not already contain the flag below, add it; otherwise, skip this step.
--storage.tsdb.path /var/lib/prometheus/

This flag specifies the directory where Prometheus will store its time-series database (TSDB).

Breakdown:

--storage.tsdb.path: This indicates that we are setting the path for the TSDB storage.

/var/lib/prometheus/: This is the actual directory path where Prometheus will create the necessary files and folders to store your time-series data.

  • Step 3: For data retention, add the flag below as the last argument of the ExecStart command:
--storage.tsdb.retention.time=60d

This overrides Prometheus's default retention of 15 days and tells it to retain data for 60 days.

Note: Set the retention time based on how long you need the data stored. The duration should match your project requirements and compliance needs.

The file should look like this:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=60d
[Install]
WantedBy=multi-user.target
  • Step 4: Reload the daemon to apply the changes
sudo systemctl daemon-reload
  • Step 5: Restart Prometheus for the changes to be applied.
sudo systemctl restart prometheus

In your Prometheus UI, under the Status tab, select 'Command-line Flags' and search for --storage.tsdb.retention.time to verify the retention time is set to 60d.

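A longer retention period needs more disk. The Prometheus storage documentation gives a rough sizing formula: retention time in seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample on average. The sketch below applies that formula; the function name estimate_tsdb_bytes is hypothetical, and the 2-byte default is the upper end of that range, not a measured value.

```shell
# Rough TSDB disk estimate in bytes.
# Arguments: retention in days, ingested samples per second,
# bytes per sample (default 2).
estimate_tsdb_bytes() {
    retention_days="$1"
    samples_per_second="$2"
    bytes_per_sample="${3:-2}"
    echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}
```

For example, estimate_tsdb_bytes 60 1000 prints 10368000000, about 10.4 GB for 60 days at 1,000 samples per second. The actual ingestion rate can be read in the Prometheus UI with rate(prometheus_tsdb_head_samples_appended_total[5m]).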

Access Control for Grafana Dashboards

To ensure only specific users can access your Grafana dashboards, it is important to configure user roles and permissions.

  • Step 1: Log in to Grafana as an admin user. Navigate to the "Administration" tab (gear icon) > "Users and access".
  • Step 2: On the "Users and access" page, click "Users" to add a user.

Then click the "New User" button.

  • Step 3: Set the user's name, email, username, and password. Then click the "Create user" button.

Verify the user details and assign the user to an organization with a role (Viewer, Editor, or Admin).

After the user has been created, permissions can be set on what dashboards the user can access and what the user can do.

  • Step 4: Navigate to the Dashboards tab. Select the dashboard you want to give the user access to.
  • Step 5: In the top right of the dashboard, click the gear icon to open the dashboard settings.
  • Step 6: On the dashboard settings page, click the "Add a permission" button, choose User, select the username, and set the role for that user.

Troubleshooting Tips and Best Practices

Troubleshooting

  1. To check the status of all services, including the services being monitored
sudo systemctl status prometheus.service
sudo systemctl status grafana-server
sudo systemctl status java_app.service
sudo systemctl status postgresql
sudo systemctl status postgres_exporter.service
sudo systemctl status node_exporter.service
  2. To check the logs of services in case of failures
sudo journalctl -u java_app.service -n 50 --no-pager
sudo journalctl -u postgresql -n 50 --no-pager
sudo journalctl -u prometheus.service -n 50 --no-pager
sudo journalctl -u postgres_exporter.service -n 50 --no-pager
sudo journalctl -u node_exporter.service -n 50 --no-pager
  3. To verify data retention was successfully set for Prometheus
ps -eo args | grep -- '--storage.tsdb.retention.time'
  4. To verify Prometheus is scraping metrics from your targets
  • Visit the Prometheus UI you set up.
  • Click the 'Status' drop-down and select 'Targets'.
  • On the Targets page you can see the services actively being monitored.
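The per-service status checks in item 1 can be condensed into a single loop that prints one line per unit. A minimal sketch using systemctl is-active; the function name report_units is hypothetical, and the unit names in the example are the ones used on this page.

```shell
# Print one status line per systemd unit.
# Returns 1 if any unit is not active.
report_units() {
    failed=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            state="active"
        else
            state="inactive or failed"
            failed=1
        fi
        printf '%-28s %s\n' "$unit" "$state"
    done
    return "$failed"
}
```

For example, report_units prometheus.service grafana-server java_app.service postgresql postgres_exporter.service node_exporter.service. For any unit reported as inactive or failed, the journalctl commands in item 2 show the reason.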

Best Practices

  1. Set up monitoring and alerting for your critical services.
  2. Enforce the principle of least privilege by ensuring only certain users can access the monitoring dashboards.
  3. Keep Prometheus, Grafana, and the exporters on current stable releases to pick up bug and security fixes.