DORA Metrics and Pipeline Workflow Configuration Changes

ekedonald edited this page Aug 9, 2024 · 12 revisions

DORA Metrics and Workflow Configuration Changes

Overview

Recent updates to our CI/CD pipeline add support for DORA metrics and streamline the deployment workflow. Initially, the pipeline automated the build, test, and deployment process for the Go backend application in the golang boilerplate repository. The new setup builds and compresses a Docker artifact for each branch (dev, staging, and main) in the golang boilerplate repository, then decompresses it and brings up the updated containers on the Telex app's remote server.

The old pipeline was triggered by pushes or pull requests to the dev branch and consisted of two main jobs:

  1. Build and Test Job:

    • Runs on ubuntu-latest.
    • Sets up PostgreSQL and Redis services for testing.
    • Builds the application and runs tests using the Go environment.
    • Ensures the application is properly started before running tests.
  2. Deploy Job:

    • Executes only on pull requests, after the build and test job succeeds.
    • Uses SSH to deploy the application to a remote server.
    • Clones or pulls the latest code from the dev branch.
    • Runs a deployment script with environment-specific parameters.
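
As a rough illustration, the two jobs described above could be laid out as follows. This is a sketch, not the original workflow file: the test credentials, the secret names (SSH_HOST, SSH_USERNAME, SSH_PORT, SSH_PRIVATE_KEY), the ~/golang-boilerplate checkout path, and the scripts/deploy.sh script are all assumptions made for the example.

```yaml
# Sketch of the old dev pipeline; values marked "assumed" are illustrative only.
name: Build, Test and Deploy (dev)

on:
  push:
    branches: [dev]
  pull_request:
    branches: [dev]

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test_user          # assumed test credentials
          POSTGRES_PASSWORD: password
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        # Wait until PostgreSQL accepts connections before the steps run.
        options: >-
          --health-cmd "pg_isready -U test_user -d test_db"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:latest
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Build
        run: go build ./...
      - name: Test
        run: go test ./...

  deploy:
    needs: build_and_test                   # runs only if build and test succeed
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    env:
      SSH_HOST: ${{ secrets.SSH_HOST }}
      SSH_USERNAME: ${{ secrets.SSH_USERNAME }}
      SSH_PORT: ${{ secrets.SSH_PORT }}
      SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
    steps:
      - name: Pull the latest dev code and run the deployment script
        run: |
          mkdir -p ~/.ssh
          printf '%s\n' "$SSH_PRIVATE_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -p "$SSH_PORT" "$SSH_HOST" >> ~/.ssh/known_hosts
          # Checkout path and script name are assumed.
          ssh -i ~/.ssh/deploy_key -p "$SSH_PORT" "$SSH_USERNAME@$SSH_HOST" \
            "cd ~/golang-boilerplate && git pull origin dev && ./scripts/deploy.sh dev"
```

The step that waits for the application to start before the tests run is omitted here, since this page does not describe how it was implemented.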

Previous Development Workflow

In the previous workflow, the Docker image for the Go application was built using a Dockerfile. The pipeline was designed to handle deployments to several environments and involved the following steps:

  1. Build and Test: The Go application was built and tested locally, using PostgreSQL and Redis as services within the CI pipeline.
  2. Deployment: The application was deployed to a remote server by pulling the latest code from the dev branch and running a deployment script.

This setup was streamlined for multiple environments, with the Dockerfile providing a uniform build process.

New Development Workflow

The new workflow uses Docker Compose to manage environment-specific configuration, with a separate Docker Compose file tailored to each environment. Here's an overview of the key changes:

  1. Multi-Environment Support:

    • Separate Docker Compose files are used for Development, Staging, and Production, each configured with environment-specific settings.
    • Docker images are now built from the golang boilerplate server and pushed to the main telex app remote server, ensuring consistency across environments.
  2. CI/CD Pipeline:

    • Build Docker Image: The pipeline builds the Docker image using the Docker Compose file for the specified environment. The image is then compressed and uploaded as an artifact.
    • Upload and Deploy: The image is transferred to the main telex app remote server, where it is decompressed and loaded. Docker Compose is then used to spin up the application in the appropriate environment.

Example of Workflow Changes

Old Development CI/CD Workflow:

The previous workflow built, tested, and deployed the Go application from a single workflow, with a build-and-test job followed by a deploy job. It relied on environment variables and on services such as PostgreSQL and Redis within the GitHub Actions workflow.

New Development CI/CD Workflow:

The new workflow is divided into multiple jobs:

  1. Build Docker Image: Builds the Docker image for the Go application and saves it as a compressed file.
  2. Upload Docker Image: Transfers the Docker image to the remote server.
  3. Deploy Application: Runs the Docker Compose file on the remote server to deploy the application in various environments.
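
The three jobs above can be sketched as a GitHub Actions workflow for the dev branch. This is an illustrative outline under several assumptions: the Compose file name docker-compose.dev.yml, the secret names, the /tmp upload location, and the ~/golang-boilerplate directory on the server (which is assumed to already hold the Compose file and app.env) are not taken from the actual pipeline.

```yaml
# Sketch only: file names, paths and secret names are assumed.
name: Deploy dev

on:
  push:
    branches: [dev]

env:
  IMAGE: golang_dev                    # project name from the dev Compose file
  STACK_FILE: docker-compose.dev.yml   # assumed Compose file name

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes app.env (referenced by the Compose file) exists at this point.
      - name: Build the image
        run: docker compose -f "$STACK_FILE" build backend
      - name: Save and compress the image
        run: docker save "$IMAGE" | gzip > "$IMAGE.tar.gz"
      - uses: actions/upload-artifact@v4
        with:
          name: docker-image
          path: ${{ env.IMAGE }}.tar.gz

  upload:
    needs: build
    runs-on: ubuntu-latest
    env:
      SSH_HOST: ${{ secrets.SSH_HOST }}
      SSH_USERNAME: ${{ secrets.SSH_USERNAME }}
      SSH_PORT: ${{ secrets.SSH_PORT }}
      SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: docker-image
      - name: Copy the archive to the remote server
        run: |
          mkdir -p ~/.ssh
          printf '%s\n' "$SSH_PRIVATE_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -p "$SSH_PORT" "$SSH_HOST" >> ~/.ssh/known_hosts
          scp -i ~/.ssh/deploy_key -P "$SSH_PORT" "$IMAGE.tar.gz" \
            "$SSH_USERNAME@$SSH_HOST:/tmp/"

  deploy:
    needs: upload
    runs-on: ubuntu-latest
    env:
      SSH_HOST: ${{ secrets.SSH_HOST }}
      SSH_USERNAME: ${{ secrets.SSH_USERNAME }}
      SSH_PORT: ${{ secrets.SSH_PORT }}
      SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
    steps:
      - name: Load the image and start the stack
        run: |
          mkdir -p ~/.ssh
          printf '%s\n' "$SSH_PRIVATE_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -p "$SSH_PORT" "$SSH_HOST" >> ~/.ssh/known_hosts
          # Decompress and load the image, then start it without rebuilding.
          ssh -i ~/.ssh/deploy_key -p "$SSH_PORT" "$SSH_USERNAME@$SSH_HOST" \
            "gunzip -c /tmp/$IMAGE.tar.gz | docker load && \
             cd ~/golang-boilerplate && \
             docker compose -f $STACK_FILE up -d --no-build"
```

Passing --no-build on the server keeps the deploy step from rebuilding the image there, so the container runs exactly the image that the build job produced.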

Docker Compose Example for Development

The following Docker Compose file is used to configure the Development environment:

name: golang_dev

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: development_user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: development_db
      POSTGRES_PORT: 5432
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U development_user -d development_db"]
      interval: 10s
      timeout: 10s
      retries: 2

  redis:
    image: redis:latest

  backend:
    image: ${COMPOSE_PROJECT_NAME}
    build:
      context: .
    depends_on:
      - db
      - redis
    ports:
      - "8000:7000"
    env_file:
      - app.env

volumes:
  db_data:

This setup provides a standardized and scalable approach to managing deployments across different environments, ensuring that each environment is configured according to its specific requirements.
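
To illustrate how the per-environment files can differ, here is a sketch of what a staging file might look like. The project name golang_staging, the credentials, the host port 8001, and the staging.env file are assumptions, not the project's actual values. The sketch also shows one optional refinement: the long form of depends_on, so that the backend starts only after the database healthcheck passes.

```yaml
name: golang_staging                       # assumed project name

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: staging_user          # assumed credentials
      POSTGRES_PASSWORD: password
      POSTGRES_DB: staging_db
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U staging_user -d staging_db"]
      interval: 10s
      timeout: 10s
      retries: 2

  redis:
    image: redis:latest

  backend:
    image: ${COMPOSE_PROJECT_NAME}
    build:
      context: .
    depends_on:
      db:
        condition: service_healthy         # wait for the healthcheck above
      redis:
        condition: service_started
    ports:
      - "8001:7000"                        # assumed host port; container port as in dev
    env_file:
      - staging.env                        # assumed file name

volumes:
  db_data:
```

Giving each file its own project name keeps the containers, networks, and volumes of one environment separate from the others when they share a server.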

PR Deploy GitHub Action

The PR Deploy GitHub Action is a tool designed to automate the deployment of pull requests into Docker containers. This allows us to test changes in isolated environments before merging them into the main codebase.

1. Overview

The PR Deploy GitHub Action automates the process of deploying pull requests to a server by building Docker images and running them in isolated Docker containers. This action is particularly useful for staging environments where one needs to verify the functionality of new code before it is merged.

2. Installation and Setup

To set up the PR Deploy GitHub Action, follow these steps:

  1. Add PR Deploy to Your Repository:

    • Create a .github/workflows/pr-deploy.yml file in your repository.
    • This file will define the GitHub Actions workflow that will be triggered by pull requests.
  2. Example Workflow File: Below is a basic example of how to configure the workflow file to use PR Deploy:

name: PR Deploy
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
  workflow_dispatch:

jobs:
  deploy-pr:
    environment: 
      name: production
    #   url: ${{ steps.deploy.outputs.preview-url }}
    runs-on: ubuntu-latest
    env:
      SSH_USERNAME: ${{ secrets.SSH_USERNAME }}
      SSH_HOST: ${{ secrets.SSH_HOST }}
      SSH_PORT: ${{ secrets.SSH_PORT }}
      SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}

    steps:
      - name: Checkout to branch
        uses: actions/checkout@v4
      - id: deploy
        name: Pull Request Deploy
        uses: hngprojects/pr-deploy@main
        with:
          server_host: ${{ env.SSH_HOST }}
          server_username: ${{ env.SSH_USERNAME }}
          server_password: ${{ secrets.SERVER_PASSWORD }}
          server_port: ${{ env.SSH_PORT }}
          comment: true
          context: '.'
          dockerfile: 'Dockerfile'
          exposed_port: '8019'
          # host_volume_path: '/var/'
          # container_volume_path: '/var/'
          github_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Print Preview Url
        run: |
          echo "Preview Url: ${{ steps.deploy.outputs.preview-url }}"

3. Configuration Details:

Inputs:

  • server_host: The IP address or hostname of the server where the Docker container will be deployed.
  • server_username: The SSH username for accessing the server.
  • server_password: The SSH password for accessing the server.
  • server_port: The SSH port on the server (default is 22).
  • context: The build context for the Docker image. This should be the path where the Dockerfile is located.
  • dockerfile: The name of the Dockerfile to use.
  • exposed_port: The port on which the application listens inside the Docker container.
  • host_volume_path: The path on the host machine where the Docker container should mount volumes.
  • container_volume_path: The path inside the Docker container where volumes should be mounted.
  • github_token: The GitHub token. GitHub Actions generates it automatically for each run and exposes it as secrets.GITHUB_TOKEN, so it does not need to be created manually.

Outputs:

  • preview-url: A URL where the deployed pull request can be previewed.

Secrets:

  • The SSH credentials should be stored securely in GitHub Secrets. The example workflow above reads them from SSH_HOST, SSH_USERNAME, SSH_PORT, and SERVER_PASSWORD.

4. Deploying Pull Requests:

When a pull request is opened, synchronized, or reopened, this workflow will:

  • Check out the pull request branch.
  • Build the Docker image using the specified Dockerfile and context.
  • Save the Docker image as a tar file and upload it to the server using SCP.
  • Load the Docker image on the server and deploy it by running the container.
  • Optionally post a comment on the pull request with the status of the deployment and the preview URL.

5. Troubleshooting

  • Permission Denied (Publickey):

    • Ensure that the SSH key is correctly set up on the server and that the corresponding private key is added as a GitHub Secret.
  • Docker Image Issues:

    • Make sure that the context and dockerfile paths are correct and that all necessary files are included in the Docker build context.
  • Deployment Failures:

    • Check the logs in GitHub Actions and on the server for more detailed error messages. Ensure that the Docker container is correctly configured to handle the environment variables and volumes.

DORA Metrics Setup

DORA metrics are a set of metrics proposed by the DevOps Research and Assessment (DORA) group to help DevOps teams find areas in their processes that could be improved. DORA defines four metrics:

  1. Deployment frequency
  2. Lead time for changes
  3. Change failure rate
  4. Mean time to recovery

Why Are DORA Metrics Important?

When running CI/CD pipelines, it's vital to assess their performance. Key aspects include how quickly changes reach production, how efficiently outages are managed, and how often new changes lead to failures. DORA metrics have emerged as a standard for evaluating these factors, offering a comprehensive view of software delivery effectiveness. These metrics are divided into three key areas:

Velocity

  1. Deployment Frequency: This metric gauges how often deployments happen. Frequent deployments suggest rapid delivery of features and fixes to users, indicating smaller and more manageable changes that typically lead to fewer issues.

  2. Mean Lead Time for Changes: This measures the average time it takes for a change (commit) to go through the pipeline to production. Shorter lead times mean faster delivery of features and fixes, which is crucial for maintaining a competitive edge.

Stability

  1. Change Failure Rate: This metric shows the percentage of deployments to production that cause failures. A lower failure rate indicates effective validation processes before deployment, leading to a smoother user experience and continuous improvement.

  2. Mean Time to Recovery (MTTR): This tracks the time it takes to recover from a production failure. A shorter MTTR minimizes the impact on users and reduces financial losses, demonstrating the team's ability to manage and resolve issues swiftly.

Reliability

Reliability encompasses various operational aspects of a service beyond deployment. Reliability measures are linked to service level objectives (SLOs) to ensure consistent performance. Due to its extensive scope, reliability was not included in our project.

Setting up DORA Metrics

You can set up DORA metrics using an existing package or write one from scratch. The idea is to get information about deployments from your source control platform, which in this case is GitHub. Using an existing implementation from mprokopov, we set it up like so:

git clone https://github.com/mprokopov/dora-exporter.git
cd dora-exporter

To build the application:

go build -o dora_exporter cmd/main.go

To run the app in the background:

nohup ./dora_exporter &

This implementation runs by default on port 8090 and exposes a webhook endpoint at /api/github. This endpoint can then be registered in GitHub as a webhook.

For our implementation, we reverse proxied the API through Nginx at the /dora/api/github endpoint and added this endpoint to GitHub.
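
The queries in the next section filter on job="golang-dora-exporter", which implies a Prometheus scrape job with that name. A minimal sketch of such a job is shown here. It assumes the exporter serves its metrics at /metrics on the same port as the webhook and that Prometheus runs on the same host; neither detail is confirmed on this page.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: golang-dora-exporter       # becomes the job="..." label used in the queries
    metrics_path: /metrics               # assumed; this is Prometheus' default path
    scrape_interval: 30s                 # assumed
    static_configs:
      - targets: ["localhost:8090"]      # exporter's default port; host is assumed
```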

Prometheus Queries for the Metrics

  1. Deployment Frequency

This query gets the total deployments for the DORA exporter job in the past 24 hours.

sum(increase(github_deployments_total{job="golang-dora-exporter"}[24h]))

  2. Lead Time for Changes

This query divides the total time spent on deployments by the number of deployments over the past hour, giving the average duration per deployment.

sum(rate(github_deployments_duration_sum{job="golang-dora-exporter"}[1h])) /
sum(rate(github_deployments_total{job="golang-dora-exporter"}[1h]))

  3. Change Failure Rate

This query is the ratio of the rate of failed deployments to the rate of all deployments over the past 24 hours.

sum(rate(github_deployments_duration{job="golang-dora-exporter", status="failure"}[24h]))/sum(rate(github_deployments_total{job="golang-dora-exporter"}[24h]))

  4. Mean Time to Recovery

This query approximates the mean time to recovery as the inverse of the deployment rate, that is, the average number of seconds between deployments over the past 24 hours.

1 / (sum(rate(github_deployments_total{job="golang-dora-exporter"}[24h])))
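
If these queries are used on several dashboards, one option is to precompute them as Prometheus recording rules. The sketch below copies the expressions from the queries above unchanged; the file name, group name, and record names are hypothetical and only follow the usual level:metric:operation naming pattern.

```yaml
# dora_rules.yml (hypothetical file, listed under rule_files in prometheus.yml)
groups:
  - name: dora
    rules:
      - record: dora:deployment_frequency:24h
        expr: |
          sum(increase(github_deployments_total{job="golang-dora-exporter"}[24h]))

      - record: dora:lead_time_seconds:1h
        expr: |
          sum(rate(github_deployments_duration_sum{job="golang-dora-exporter"}[1h]))
          /
          sum(rate(github_deployments_total{job="golang-dora-exporter"}[1h]))

      - record: dora:change_failure_rate:24h
        expr: |
          sum(rate(github_deployments_duration{job="golang-dora-exporter", status="failure"}[24h]))
          /
          sum(rate(github_deployments_total{job="golang-dora-exporter"}[24h]))

      - record: dora:mean_time_to_recovery_seconds:24h
        expr: |
          1 / (sum(rate(github_deployments_total{job="golang-dora-exporter"}[24h])))
```

Dashboards can then query the short record names instead of repeating the full expressions.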