Optimize backend docker image #1103

Merged 10 commits on Oct 19, 2024

beastoin (Collaborator) commented Oct 18, 2024

Issue: #1064

Key changes:

  • Use the Docker Buildx registry cache in GitHub Actions to reduce build time
    -> Build time drops from 1x minutes to ~3 minutes: 400% faster build
  • Fix the deprecation error from google-github-actions/auth by upgrading it from v0 to v2
  • Fix the deprecation error from google-github-actions/deploy-cloudrun by upgrading it from v0 to v2
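
As a rough check of the "400% faster" figure, the small shell sketch below computes the speed-up percentage from two durations. The PR only gives the old build time as "1x minutes", so the value 15 used here is purely an illustrative assumption, and `percent_faster` is a hypothetical helper name.

```shell
# percent_faster BEFORE AFTER
# Prints how much faster (in percent) a build becomes when its duration
# drops from BEFORE to AFTER (same unit, positive integers).
percent_faster() {
  echo $(( ($1 - $2) * 100 / $2 ))
}

# Assumed durations in minutes; 15 is only a stand-in for "1x".
percent_faster 15 3   # prints 400
```

Under that assumption the build runs at five times the previous speed, which is one way to read "400% faster".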

Before: [screenshot]

After: [screenshot]

Summary by CodeRabbit

  • New Features

    • Enhanced deployment workflow for the backend to Google Cloud Run, allowing manual execution with input parameters.
    • Optimized Docker image build process with a multi-stage approach, resulting in a smaller final image.
  • Bug Fixes

    • Addressed "no space left on device" issue during deployment by cleaning up temporary directories.
  • Refactor

    • Updated action versions for Google authentication and Cloud Run deployment to improve reliability.
    • Streamlined Dockerfile for better dependency management and reduced image size.

coderabbitai bot commented Oct 18, 2024

Walkthrough

The pull request introduces updates to the workflow configuration for deploying the backend to Google Cloud Run, enabling manual execution with input parameters. It includes updates to several action versions and adds a step to delete the /opt/hostedtoolcache directory. The Dockerfile has been modified to implement a multi-stage build process, changing the base images and introducing a virtual environment for dependency management. Key changes include optimizations in the image build process and the installation of essential packages.

Changes

File: .github/workflows/gcp_backend.yml
Change summary:
  • Trigger updated to workflow_dispatch.
  • Action versions updated for Google authentication and deployment.
  • New step added to delete /opt/hostedtoolcache.
  • Docker image build process refactored to use docker/build-push-action@v6.

File: backend/Dockerfile
Change summary:
  • Base image changed to use multi-stage build (python:3.11 AS builder and python:3.11-slim).
  • Virtual environment created and utilized.
  • Consolidated package installations into a single command.
  • Command to start the application remains unchanged.

Poem

In the cloud where bunnies play,
Our backend hops a brand new way.
With Docker's dance and GitHub's cheer,
A streamlined build is finally here!
So let us leap and celebrate,
For every change, we elevate! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bfdc8c4 and 764c73a.

📒 Files selected for processing (2)
  • .github/workflows/gcp_backend.yml (1 hunks)
  • backend/Dockerfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/gcp_backend.yml
  • backend/Dockerfile

beastoin (Collaborator, Author) commented:

@thainguyensunya fyi ~

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
.github/workflows/gcp_backend.yml (3)

Line range hint 1-41: Approve workflow trigger changes with a minor suggestion.

The transition to a manually triggered workflow with input parameters for environment and branch selection is a good improvement. It provides more control over deployments and allows for greater flexibility.

The input validation step is a nice addition to ensure only valid environments are used. Consider adding a similar validation for the branch input to ensure it exists in the repository.

Here's a suggested addition to validate the branch:

- name: Validate Branch Input
  run: |
    git fetch origin ${{ github.event.inputs.branch }}
    if ! git rev-parse --verify origin/${{ github.event.inputs.branch }} &>/dev/null; then
      echo "Invalid branch: ${{ github.event.inputs.branch }}. Branch does not exist in the repository."
      exit 1
    fi
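
A variant of the same check can be sketched without fetching anything. GitHub Actions runs `run:` steps with `bash -e` by default, so in the step above a failing `git fetch` would likely end the step before the custom message is printed. The sketch below relies on `git ls-remote --exit-code`, which returns a non-zero status when no ref matches. The function name `branch_exists` and the `BRANCH` environment variable are illustrative assumptions, not part of this PR.

```shell
# branch_exists REMOTE BRANCH
# Succeeds only if refs/heads/BRANCH exists on REMOTE.
# `git ls-remote --exit-code` exits with status 2 when nothing matches,
# so no local fetch is required.
branch_exists() {
  git ls-remote --exit-code --heads "$1" "refs/heads/$2" >/dev/null 2>&1
}

# Hypothetical use inside a workflow step, with the input passed in through
# an environment variable named BRANCH (an assumed name):
#   branch_exists origin "$BRANCH" || { echo "Invalid branch: $BRANCH"; exit 1; }
```

Passing the input through an environment variable, rather than interpolating `${{ ... }}` directly into the script, also keeps the branch name from being interpreted as shell code.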
🧰 Tools
🪛 actionlint

44-44: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 yamllint

[error] 51-51: trailing spaces

(trailing-spaces)


[warning] 79-79: too many blank lines

(1 > 0) (empty-lines)


55-67: Approve Docker build and push improvements with a suggestion.

The refactoring of the Docker build and push process using docker/build-push-action@v6 is an excellent improvement. The explicit definition of build context and Dockerfile path enhances clarity, while the addition of caching options can significantly improve build performance.

To further improve the workflow, consider adding a step to prune old images from the registry to manage storage and costs.

Here's a suggested addition after the build and push step:

- name: Prune old images
  run: |
    gcloud container images list-tags gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }} \
      --format='get(digest)' --filter='NOT tags:latest' | \
    xargs -I {} gcloud container images delete gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}@{} --quiet

This step will remove all images that are not tagged as 'latest', helping to manage storage in your container registry.


51-51: Minor formatting improvements.

There are a couple of minor formatting issues in the file:

  1. Remove the trailing space on line 51.
  2. Remove the extra blank line at the end of the file (line 79).

These changes will improve the overall cleanliness of the code without affecting its functionality.

Also applies to: 79-79

backend/Dockerfile (1)

14-14: Optimize image size by cleaning up apt caches

After installing packages with apt-get, it's good practice to clean up the local repository of retrieved package files to reduce the image size. This can be done by removing the contents of /var/lib/apt/lists.

Apply this diff to clean up apt caches:

- RUN apt-get update && apt-get -y install ffmpeg curl unzip
+ RUN apt-get update && apt-get -y install ffmpeg curl unzip && rm -rf /var/lib/apt/lists/*
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b6edad1 and b4f56b9.

📒 Files selected for processing (2)
  • .github/workflows/gcp_backend.yml (1 hunks)
  • backend/Dockerfile (1 hunks)
🔇 Additional comments (8)
.github/workflows/gcp_backend.yml (2)

46-51: Approve Google Auth action update.

Updating the Google Auth action from v0 to v2 is a good practice. This ensures you're using the latest features and security improvements provided by the action.


52-54: Approve addition of GCR login step.

The addition of the GCR login step is necessary and correctly implemented. This ensures that the workflow can push and pull Docker images from Google Container Registry.

backend/Dockerfile (6)

3-4: Correct setup of virtual environment and PATH

The creation of the virtual environment and updating the PATH environment variable are correctly implemented, ensuring that the application uses the isolated environment for dependencies.


7-7: Efficient installation of dependencies with pip

Using the --no-cache-dir and --upgrade flags keeps pip's download cache out of the image layer and upgrades each dependency to the newest version the requirements file allows.


9-9: Switching to slim image for reduced final image size

Using python:3.11-slim in the final stage is a good choice to minimize the size of the Docker image, which can lead to faster deployments and reduced resource usage.


11-12: Setting working directory and PATH in the final stage

Setting the WORKDIR to /app and updating the PATH ensures that the application runs in the correct directory with access to the virtual environment's binaries.


16-17: Copying virtual environment and application code

Copying the virtual environment and the application code into the final image is correctly handled, ensuring that all dependencies and application files are included for runtime.


20-20: Application entrypoint command remains appropriate

The CMD instruction properly starts the application using uvicorn, specifying the host and port, which aligns with the previous configuration.

Comment on lines 67 to 79

  - name: Deploy to Cloud Run
    id: deploy
-   uses: google-github-actions/deploy-cloudrun@v0
+   uses: google-github-actions/deploy-cloudrun@v2
    with:
      service: ${{ env.SERVICE }}
      region: ${{ env.REGION }}
      image: gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}

  # If required, use the Cloud Run url output in later steps
  - name: Show Output
-   run: echo ${{ steps.deploy.outputs.url }}   (no newline at end of file)
+   run: echo ${{ steps.deploy.outputs.url }}

⚠️ Potential issue

Approve Cloud Run deployment action update and suggest checkout action update.

Updating the Cloud Run deployment action from v0 to v2 is a good practice. This ensures you're using the latest features and security improvements provided by the action.

Additionally, the static analysis tool has flagged that the actions/checkout@v2 action is outdated. To address this:

Update the checkout action to the latest version:

- uses: actions/checkout@v2
+ uses: actions/checkout@v4

This will ensure you're using the most up-to-date version of the checkout action, which may include important bug fixes and performance improvements.

Committable suggestion was skipped due to low confidence.

@@ -1,16 +1,21 @@
-FROM tiangolo/uvicorn-gunicorn:python3.11
+FROM python:3.11 AS builder
🛠️ Refactor suggestion

Consider using 'python:3.11-slim' in the builder stage for consistency and a smaller build-time download

Since you're using python:3.11-slim in the final stage, you might consider using it in the builder stage as well, for consistency and a smaller base image to pull during the build. The builder stage is not shipped in the final image, so this mainly affects build time rather than final image size. Ensure that all necessary build tools and libraries are available in the slim image for successful dependency installation.

Apply this diff to implement the suggestion:

-FROM python:3.11 AS builder
+FROM python:3.11-slim AS builder
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
backend/Dockerfile (4)

6-9: Good practices for dependency management, with a minor suggestion

The approach for handling dependencies is well-structured:

  1. Installing git for potential VCS-based dependencies.
  2. Copying only the requirements file for better layer caching.
  3. Using the --upgrade flag to ensure up-to-date packages.

Consider combining the git installation with other necessary build tools in a single RUN command to reduce image layers:

-RUN apt-get update && apt-get -y install git
+RUN apt-get update && apt-get -y install git build-essential && rm -rf /var/lib/apt/lists/*

This change would also clean up the apt cache, further reducing the image size.


16-16: Efficient package installation with a suggestion for documentation

The installation of additional packages is done efficiently in a single RUN command, which is a good practice for minimizing image layers. Cleaning up the apt cache also helps reduce the final image size.

Consider adding comments to explain why each of these packages (ffmpeg, curl, unzip) is necessary for the application. This will help future maintainers understand the dependencies and potentially optimize the image further if any of these become unnecessary.

Example:

# Install required packages:
# - ffmpeg: for audio/video processing
# - curl: for downloading resources
# - unzip: for extracting downloaded archives
RUN apt-get update && apt-get -y install ffmpeg curl unzip && rm -rf /var/lib/apt/lists/*

18-22: Efficient final setup with a minor optimization suggestion

The final steps of the Dockerfile are well-structured:

  1. Copying the virtual environment from the builder stage is an efficient way to transfer dependencies.
  2. Copying the backend code leverages Docker's layer caching.
  3. Exposing port 8080 and using uvicorn to run the application remain unchanged and appropriate.

Consider optimizing the COPY command for the backend code:

-COPY backend/ .
+COPY backend/ ./

Both forms copy the contents of the backend directory into the current working directory (/app); the trailing slash only makes it explicit that the destination is a directory.


1-22: Excellent optimization of the backend Docker image

This updated Dockerfile significantly improves the build process and final image:

  1. Implements a multi-stage build, separating the build environment from the runtime environment.
  2. Uses slim images to reduce the final image size.
  3. Leverages virtual environments for better dependency isolation.
  4. Optimizes layer caching through strategic ordering of commands.
  5. Combines RUN commands to reduce the number of layers.

These changes will likely result in a smaller, more efficient Docker image and faster build times, aligning perfectly with the PR objective of optimizing the backend Docker image.

To further improve the CI/CD pipeline:

  1. Consider implementing Docker layer caching in your CI/CD process to speed up builds even more.
  2. Regularly update the base images and dependencies to ensure you're using the latest security patches.
  3. Implement a vulnerability scanning step in your pipeline to catch any security issues in the dependencies or base images.
.github/workflows/gcp_backend.yml (1)

Line range hint 31-37: Great addition of environment validation!

The new step to validate the environment input is an excellent safety measure. It prevents accidental deployments to incorrect environments, which aligns well with best practices for CI/CD pipelines.

A minor suggestion to improve readability:

Consider using a case statement for better readability and easier expansion in the future:

- if [[ "${{ github.event.inputs.environment }}" != "development" && "${{ github.event.inputs.environment }}" != "prod" ]]; then
-   echo "Invalid environment: ${{ github.event.inputs.environment }}. Must be 'development' or 'prod'."
-   exit 1
- fi
+ case "${{ github.event.inputs.environment }}" in
+   development|prod)
+     ;;
+   *)
+     echo "Invalid environment: ${{ github.event.inputs.environment }}. Must be 'development' or 'prod'."
+     exit 1
+     ;;
+ esac

Also applies to: 38-38
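
To try the case-based validation outside the workflow, it can be wrapped in a small function. This is a minimal sketch: the name `validate_environment` is hypothetical, and the value is taken as an argument instead of being interpolated from `${{ github.event.inputs.environment }}`.

```shell
# validate_environment ENV
# Returns 0 for the two accepted environments, otherwise prints an error
# to stderr and returns 1.
validate_environment() {
  case "$1" in
    development|prod)
      return 0
      ;;
    *)
      echo "Invalid environment: $1. Must be 'development' or 'prod'." >&2
      return 1
      ;;
  esac
}

# Example call; in a workflow step the argument would come from the input.
validate_environment prod
```

Adding another accepted environment then only means extending the `development|prod` pattern.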

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b4f56b9 and bfdc8c4.

📒 Files selected for processing (2)
  • .github/workflows/gcp_backend.yml (1 hunks)
  • backend/Dockerfile (1 hunks)
🧰 Additional context used
🔇 Additional comments (7)
backend/Dockerfile (2)

1-4: Excellent implementation of multi-stage build with slim images!

Great job on implementing the multi-stage build process and using slim images. This change addresses the previous suggestion and will help reduce the final image size. The creation of a virtual environment is a best practice for dependency isolation.


11-15: Well-structured final stage setup

The final stage is set up efficiently:

  1. Using a slim Python image reduces the final image size.
  2. Setting the working directory to /app is a good practice.
  3. Updating the PATH ensures the virtual environment is correctly used.

These changes contribute to a more optimized and maintainable Docker image.

.github/workflows/gcp_backend.yml (5)

Line range hint 3-17: Excellent update to workflow trigger and inputs!

The change to workflow_dispatch with custom inputs for environment and branch selection is a great improvement. This allows for:

  1. More controlled deployments through manual triggering.
  2. Flexibility in choosing the deployment environment and branch.
  3. Better alignment with different stages of your development lifecycle.

These changes contribute to the overall optimization of the backend deployment process, as mentioned in the PR objectives.


39-41: Excellent updates to action versions and new steps!

  1. Updating the checkout action to v4 and Google Auth action to v2 is great for leveraging the latest features and security improvements.
  2. The new step to delete the /opt/hostedtoolcache directory is a smart workaround for potential storage issues on GitHub-hosted runners.
  3. Adding a separate step for logging into Google Container Registry (GCR) improves clarity and maintainability.

These changes align well with the PR objective of optimizing the backend deployment process and addressing deprecated errors.

The previous review comment about updating the checkout action has been addressed in this PR.

Also applies to: 42-42, 44-44, 48-48, 52-54
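
The cleanup step mentioned above can be sketched as a small shell function. This is an illustration under assumptions, not the step from this PR: the function name `free_disk_space` is hypothetical, and on a hosted runner the removal may need `sudo` depending on directory ownership.

```shell
# free_disk_space DIR
# Removes DIR if it exists, then reports the space left on the root filesystem.
free_disk_space() {
  target=$1
  if [ -d "$target" ]; then
    rm -rf "$target"
    echo "Removed $target"
  else
    echo "Nothing to remove at $target"
  fi
  # Show the remaining space (in KiB) so the effect is visible in the job log.
  df -k /
}

# On a runner, the call would look like this (left commented out here):
#   free_disk_space /opt/hostedtoolcache
```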


55-67: Excellent optimization of Docker build process!

The new Docker build and push process is a significant improvement:

  1. Using docker/setup-buildx-action@v3 enables advanced features like multi-platform builds and improved caching.
  2. The docker/build-push-action@v6 with registry caching options is a key factor in achieving the 400% faster build time mentioned in the PR objectives.
  3. The cache-from and cache-to configurations ensure that subsequent builds can reuse layers, further optimizing the process.

These changes directly address the main goal of this PR to optimize the backend Docker image and significantly reduce build times.
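
For readers who want to reproduce the registry cache outside GitHub Actions, the equivalent `docker buildx build` invocation can be sketched as below. The function only prints the command and does not run it. The image name, the `buildcache` tag used as the cache reference, and `mode=max` are assumptions for illustration; the actual values in this PR's workflow are not shown in the conversation.

```shell
# buildx_cached_build_cmd IMAGE CONTEXT DOCKERFILE
# Prints (does not execute) a docker buildx command that pushes IMAGE:latest
# and reads/writes a registry-backed build cache at IMAGE:buildcache.
buildx_cached_build_cmd() {
  image=$1
  context=$2
  dockerfile=$3
  printf 'docker buildx build --push --tag %s:latest --file %s --cache-from type=registry,ref=%s:buildcache --cache-to type=registry,ref=%s:buildcache,mode=max %s\n' \
    "$image" "$dockerfile" "$image" "$image" "$context"
}

# Hypothetical image name and paths, for illustration only.
buildx_cached_build_cmd gcr.io/my-project/backend . backend/Dockerfile
```

On a second run, layers whose inputs have not changed are pulled from the cache reference instead of being rebuilt, which is where the build-time saving comes from.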


70-70: Approved: Cloud Run deployment action update

Updating the Google Cloud Run deployment action from v0 to v2 is an excellent change. This ensures you're using the latest features and security improvements provided by the action. It also addresses the PR objective of resolving deprecated errors.


Line range hint 1-79: Summary: Excellent optimization and modernization of the backend deployment workflow

This PR successfully achieves its objectives:

  1. Significantly reduces build time by implementing Docker Buildx and registry caching.
  2. Addresses deprecated errors by updating action versions.
  3. Improves workflow flexibility with manual triggers and environment selection.
  4. Enhances security and reliability with environment validation and up-to-date actions.

These changes collectively result in a more efficient, secure, and maintainable deployment process for the backend service. Great work on this optimization!

@beastoin beastoin merged commit 17d10b2 into main Oct 19, 2024
2 checks passed
@beastoin beastoin deleted the optimize-backend-docker-image branch October 19, 2024 07:54
beastoin (Collaborator, Author) commented:

lgtm 🥳
