Optimize backend docker image #1103

beastoin · 2024-10-18T11:52:34Z

Issue: #1064

Key changes:

- Use the registry cache of Docker Buildx in GitHub Actions to reduce build time
  -> Reduces build time from 1x minutes to ~3 minutes: 400% faster build
- Fix the deprecation error of google-github-actions/auth by upgrading it from v0 to v2
- Fix the deprecation error of google-github-actions/deploy-cloudrun by upgrading it from v0 to v2
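
For readers checking the "400% faster" figure, here is a small sketch of one way to compute it. It assumes "faster" means the time saved relative to the new build time. The 15-minute "before" value is hypothetical, chosen only because it is consistent with "1x minutes"; the PR does not state the exact number.

```shell
# percent_faster BEFORE AFTER
# Prints how much faster the new build is, as a whole-number percentage,
# computed as (BEFORE - AFTER) * 100 / AFTER with integer division.
percent_faster() {
  echo $(( ($1 - $2) * 100 / $2 ))
}

# Hypothetical timings in minutes: 15 is an assumed value, not taken from the PR.
percent_faster 15 3   # prints 400
```

Under this definition, a 400% faster build means the old build took five times as long as the new one.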

before

after

Summary by CodeRabbit

New Features
- Enhanced deployment workflow for the backend to Google Cloud Run, allowing manual execution with input parameters.
- Optimized Docker image build process with a multi-stage approach, resulting in a smaller final image.
Bug Fixes
- Addressed "no space left on device" issue during deployment by cleaning up temporary directories.
Refactor
- Updated action versions for Google authentication and Cloud Run deployment to improve reliability.
- Streamlined Dockerfile for better dependency management and reduced image size.


coderabbitai · 2024-10-18T11:52:42Z

Walkthrough

The pull request introduces updates to the workflow configuration for deploying the backend to Google Cloud Run, enabling manual execution with input parameters. It includes updates to several action versions and adds a step to delete the /opt/hostedtoolcache directory. The Dockerfile has been modified to implement a multi-stage build process, changing the base images and introducing a virtual environment for dependency management. Key changes include optimizations in the image build process and the installation of essential packages.
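
The cleanup step mentioned in the walkthrough could be sketched roughly as follows. This is not the step from the PR; the function name `free_disk_space` is made up for illustration, and on a GitHub-hosted runner the removal may need `sudo`, which is an assumption.

```shell
# free_disk_space DIR
# Removes DIR if it exists and reports the space available on / before and after.
free_disk_space() {
  target=$1
  echo "Available before: $(df -Pk / | awk 'NR==2 {print $4}') KiB"
  if [ -d "$target" ]; then
    rm -rf -- "$target"
    echo "Removed $target"
  else
    echo "Nothing to remove at $target"
  fi
  echo "Available after: $(df -Pk / | awk 'NR==2 {print $4}') KiB"
}
```

In a workflow step this would be called as `free_disk_space /opt/hostedtoolcache` before the Docker build, so the reclaimed space is available to the image layers.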

Changes

| File | Change Summary |
| --- | --- |
| `.github/workflows/gcp_backend.yml` | - Trigger updated to `workflow_dispatch`. - Action versions updated for Google authentication and deployment. - New step added to delete `/opt/hostedtoolcache`. - Docker image build process refactored to use `docker/build-push-action@v6`. |
| `backend/Dockerfile` | - Base image changed to use multi-stage build (`python:3.11 AS builder` and `python:3.11-slim`). - Virtual environment created and utilized. - Consolidated package installations into a single command. - Command to start the application remains unchanged. |

Poem

In the cloud where bunnies play,
Our backend hops a brand new way.
With Docker's dance and GitHub's cheer,
A streamlined build is finally here!
So let us leap and celebrate,
For every change, we elevate! 🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bfdc8c4 and 764c73a.

📒 Files selected for processing (2)

.github/workflows/gcp_backend.yml (1 hunks)
backend/Dockerfile (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

.github/workflows/gcp_backend.yml
backend/Dockerfile

beastoin · 2024-10-18T11:54:17Z

@thainguyensunya fyi ~

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)

.github/workflows/gcp_backend.yml (3)
Line range hint 1-41: Approve workflow trigger changes with a minor suggestion.

The transition to a manually triggered workflow with input parameters for environment and branch selection is a good improvement. It provides more control over deployments and allows for greater flexibility.

The input validation step is a nice addition to ensure only valid environments are used. Consider adding a similar validation for the branch input to ensure it exists in the repository.

Here's a suggested addition to validate the branch:
- name: Validate Branch Input
  run: |
    git fetch origin ${{ github.event.inputs.branch }}
    if ! git rev-parse --verify origin/${{ github.event.inputs.branch }} &>/dev/null; then
      echo "Invalid branch: ${{ github.event.inputs.branch }}. Branch does not exist in the repository."
      exit 1
    fi
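
An alternative sketch of the same check, written as a reusable function. It assumes the branch name is passed through an environment variable rather than interpolated into the script, and uses `git ls-remote` so no fetch is needed. The name `branch_exists` and the `BRANCH` variable are hypothetical.

```shell
# branch_exists REMOTE BRANCH
# Succeeds only if refs/heads/BRANCH exists on REMOTE (a remote name, URL, or path).
branch_exists() {
  git ls-remote --exit-code --heads "$1" "refs/heads/$2" >/dev/null 2>&1
}

# In a workflow step the input could be passed via env (assumption), e.g.
#   env:
#     BRANCH: ${{ github.event.inputs.branch }}
# and then checked like this:
#   branch_exists origin "$BRANCH" || { echo "Invalid branch: $BRANCH"; exit 1; }
```

Passing the input through `env` keeps the branch name out of the script text itself, which avoids surprises if the name contains shell metacharacters.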
🧰 Tools

🪛 actionlint

44-44: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 yamllint

[error] 51-51: trailing spaces

(trailing-spaces)

[warning] 79-79: too many blank lines

(1 > 0) (empty-lines)

55-67: Approve Docker build and push improvements with a suggestion.

The refactoring of the Docker build and push process using docker/build-push-action@v6 is an excellent improvement. The explicit definition of build context and Dockerfile path enhances clarity, while the addition of caching options can significantly improve build performance.

To further improve the workflow, consider adding a step to prune old images from the registry to manage storage and costs.

Here's a suggested addition after the build and push step:
- name: Prune old images
  run: |
    gcloud container images list-tags gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }} \
      --format='get(digest)' --filter='NOT tags:latest' | \
    xargs -I {} gcloud container images delete gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}@{} --quiet
This step will remove all images that are not tagged as 'latest', helping to manage storage in your container registry.
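
Deleting every image that is not tagged 'latest' also removes anything you might want to roll back to. A gentler variant, sketched below, keeps the newest few digests. It is a dry run only: the helper name `digests_to_delete` is hypothetical, the digests are made up, and the input is assumed to be ordered newest first.

```shell
# digests_to_delete KEEP
# Reads image digests on stdin, one per line, assumed to be ordered newest first,
# and prints every digest after the first KEEP entries.
digests_to_delete() {
  tail -n "+$(( $1 + 1 ))"
}

# Dry run with made-up digests; nothing is deleted here.
printf '%s\n' sha256:aaa sha256:bbb sha256:ccc sha256:ddd | digests_to_delete 2
```

The printed digests could then be fed to the same `xargs ... gcloud container images delete` pipeline shown in the suggestion above.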

51-51: Minor formatting improvements.

There are a couple of minor formatting issues in the file:

Remove the trailing space on line 51.

Remove the extra blank line at the end of the file (line 79).

These changes will improve the overall cleanliness of the code without affecting its functionality.

Also applies to: 79-79

🧰 Tools

🪛 yamllint

[error] 51-51: trailing spaces

(trailing-spaces)

backend/Dockerfile (1)
14-14: Optimize image size by cleaning up apt caches

After installing packages with apt-get, it's good practice to clean up the local repository of retrieved package files to reduce the image size. This can be done by removing the contents of /var/lib/apt/lists.

Apply this diff to clean up apt caches:
- RUN apt-get update && apt-get -y install ffmpeg curl unzip
+ RUN apt-get update && apt-get -y install ffmpeg curl unzip && rm -rf /var/lib/apt/lists/*

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b6edad1 and b4f56b9.

📒 Files selected for processing (2)

.github/workflows/gcp_backend.yml (1 hunks)
backend/Dockerfile (1 hunks)

🧰 Additional context used

🪛 actionlint

.github/workflows/gcp_backend.yml

44-44: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 yamllint

.github/workflows/gcp_backend.yml

[error] 51-51: trailing spaces

(trailing-spaces)

[warning] 79-79: too many blank lines

(1 > 0) (empty-lines)

🔇 Additional comments (8)

.github/workflows/gcp_backend.yml (2)

46-51: Approve Google Auth action update.

Updating the Google Auth action from v0 to v2 is a good practice. This ensures you're using the latest features and security improvements provided by the action.

🧰 Tools

🪛 yamllint

[error] 51-51: trailing spaces

(trailing-spaces)

52-54: Approve addition of GCR login step.

The addition of the GCR login step is necessary and correctly implemented. This ensures that the workflow can push and pull Docker images from Google Container Registry.

backend/Dockerfile (6)

3-4: Correct setup of virtual environment and PATH

The creation of the virtual environment and updating the PATH environment variable are correctly implemented, ensuring that the application uses the isolated environment for dependencies.

7-7: Efficient installation of dependencies with pip

Using --no-cache-dir and --upgrade flags optimizes the dependency installation process by preventing cache storage and ensuring the latest packages are installed.

9-9: Switching to slim image for reduced final image size

Using python:3.11-slim in the final stage is a good choice to minimize the size of the Docker image, which can lead to faster deployments and reduced resource usage.

11-12: Setting working directory and PATH in the final stage

Setting the WORKDIR to /app and updating the PATH ensures that the application runs in the correct directory with access to the virtual environment's binaries.

16-17: Copying virtual environment and application code

Copying the virtual environment and the application code into the final image is correctly handled, ensuring that all dependencies and application files are included for runtime.

20-20: Application entrypoint command remains appropriate

The CMD instruction properly starts the application using uvicorn, specifying the host and port, which aligns with the previous configuration.
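
Putting the reviewed pieces together, the Dockerfile might look roughly like the sketch below, written out by a small shell helper so it can be inspected without building anything. This is a reconstruction from the review comments, not the file from the PR: the venv path `/opt/venv`, the location of `requirements.txt`, the `main:app` module, and the `0.0.0.0` host are all assumptions.

```shell
# write_dockerfile_sketch PATH
# Writes an illustrative multi-stage Dockerfile to PATH.
# /opt/venv, the requirements.txt location, and main:app are assumptions.
write_dockerfile_sketch() {
  cat > "$1" <<'EOF'
# Build stage: full image, so wheels that need compiling can be built
FROM python:3.11 AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY backend/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt

# Runtime stage: slim image that only receives the finished virtual environment
FROM python:3.11-slim
WORKDIR /app
ENV PATH="/opt/venv/bin:$PATH"
RUN apt-get update && apt-get -y install ffmpeg curl unzip && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/venv /opt/venv
COPY backend/ .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
EOF
}
```

Calling `write_dockerfile_sketch /tmp/Dockerfile.sketch` produces the file. The point of the layout is that build tooling stays in the builder stage and only the virtual environment crosses into the final image.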

coderabbitai · 2024-10-18T11:57:31Z

.github/workflows/gcp_backend.yml

 - name: Deploy to Cloud Run
   id: deploy
-  uses: google-github-actions/deploy-cloudrun@v0
+  uses: google-github-actions/deploy-cloudrun@v2
   with:
     service: ${{ env.SERVICE }}
     region: ${{ env.REGION }}
     image: gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}

 # If required, use the Cloud Run url output in later steps
 - name: Show Output
-  run: echo ${{ steps.deploy.outputs.url }}
+  run: echo ${{ steps.deploy.outputs.url }}
+


⚠️ Potential issue

Approve Cloud Run deployment action update and suggest checkout action update.

Updating the Cloud Run deployment action from v0 to v2 is a good practice. This ensures you're using the latest features and security improvements provided by the action.

Additionally, the static analysis tool has flagged that the actions/checkout@v2 action is outdated. To address this:

Update the checkout action to the latest version:

- uses: actions/checkout@v2
+ uses: actions/checkout@v4

This will ensure you're using the most up-to-date version of the checkout action, which may include important bug fixes and performance improvements.

Committable suggestion was skipped due to low confidence.

🧰 Tools

🪛 yamllint

[warning] 79-79: too many blank lines

(1 > 0) (empty-lines)

coderabbitai · 2024-10-18T11:57:31Z

backend/Dockerfile

@@ -1,16 +1,21 @@
-FROM tiangolo/uvicorn-gunicorn:python3.11
+FROM python:3.11 AS builder


🛠️ Refactor suggestion

Consider using 'python:3.11-slim' in the builder stage for consistency and reduced image size

Since you're using python:3.11-slim in the final stage, you might consider using it in the builder stage as well to maintain consistency and potentially reduce the overall image size. Ensure that all necessary build tools and libraries are available in the slim image for successful dependency installation.

Apply this diff to implement the suggestion:

-FROM python:3.11 AS builder
+FROM python:3.11-slim AS builder

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-FROM python:3.11 AS builder
+FROM python:3.11-slim AS builder

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)

backend/Dockerfile (4)
6-9: Good practices for dependency management, with a minor suggestion

The approach for handling dependencies is well-structured:

Installing git for potential VCS-based dependencies.

Copying only the requirements file for better layer caching.

Using the --upgrade flag to ensure up-to-date packages.

Consider combining the git installation with other necessary build tools in a single RUN command to reduce image layers:
-RUN apt-get update && apt-get -y install git
+RUN apt-get update && apt-get -y install git build-essential && rm -rf /var/lib/apt/lists/*
This change would also clean up the apt cache, further reducing the image size.

16-16: Efficient package installation with a suggestion for documentation

The installation of additional packages is done efficiently in a single RUN command, which is a good practice for minimizing image layers. Cleaning up the apt cache also helps reduce the final image size.

Consider adding comments to explain why each of these packages (ffmpeg, curl, unzip) is necessary for the application. This will help future maintainers understand the dependencies and potentially optimize the image further if any of these become unnecessary.

Example:
# Install required packages:
# - ffmpeg: for audio/video processing
# - curl: for downloading resources
# - unzip: for extracting downloaded archives
RUN apt-get update && apt-get -y install ffmpeg curl unzip && rm -rf /var/lib/apt/lists/*

18-22: Efficient final setup with a minor optimization suggestion

The final steps of the Dockerfile are well-structured:

Copying the virtual environment from the builder stage is an efficient way to transfer dependencies.

Copying the backend code leverages Docker's layer caching.

Exposing port 8080 and using uvicorn to run the application remain unchanged and appropriate.

Consider optimizing the COPY command for the backend code:
-COPY backend/ .
+COPY backend/ ./
Both forms copy the contents of the backend directory directly into the current working directory (/app); the trailing slash only makes it explicit that the destination is a directory, so this is a readability tweak rather than a behavior change.

1-22: Excellent optimization of the backend Docker image

This updated Dockerfile significantly improves the build process and final image:

Implements a multi-stage build, separating the build environment from the runtime environment.

Uses slim images to reduce the final image size.

Leverages virtual environments for better dependency isolation.

Optimizes layer caching through strategic ordering of commands.

Combines RUN commands to reduce the number of layers.

These changes will likely result in a smaller, more efficient Docker image and faster build times, aligning perfectly with the PR objective of optimizing the backend Docker image.

To further improve the CI/CD pipeline:

Consider implementing Docker layer caching in your CI/CD process to speed up builds even more.

Regularly update the base images and dependencies to ensure you're using the latest security patches.

Implement a vulnerability scanning step in your pipeline to catch any security issues in the dependencies or base images.

.github/workflows/gcp_backend.yml (1)
Line range hint 31-37: Great addition of environment validation!

The new step to validate the environment input is an excellent safety measure. It prevents accidental deployments to incorrect environments, which aligns well with best practices for CI/CD pipelines.

A minor suggestion to improve readability:

Consider using a case statement for better readability and easier expansion in the future:
- if [[ "${{ github.event.inputs.environment }}" != "development" && "${{ github.event.inputs.environment }}" != "prod" ]]; then
-   echo "Invalid environment: ${{ github.event.inputs.environment }}. Must be 'development' or 'prod'."
-   exit 1
- fi
+ case "${{ github.event.inputs.environment }}" in
+   development|prod)
+     ;;
+   *)
+     echo "Invalid environment: ${{ github.event.inputs.environment }}. Must be 'development' or 'prod'."
+     exit 1
+     ;;
+ esac
Also applies to: 38-38
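
As a standalone illustration, the case-based check can be wrapped in a function that takes the environment name as an argument. The function name `validate_environment` is hypothetical; the allowed values are the two named in the suggestion above.

```shell
# validate_environment NAME
# Returns 0 for an allowed environment; otherwise prints an error to stderr and returns 1.
validate_environment() {
  case "$1" in
    development|prod)
      return 0
      ;;
    *)
      echo "Invalid environment: $1. Must be 'development' or 'prod'." >&2
      return 1
      ;;
  esac
}

validate_environment development && echo "environment ok"
```

Adding another environment later means extending the `development|prod` pattern, which is the main readability gain over the chained `!=` comparisons.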

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b4f56b9 and bfdc8c4.

📒 Files selected for processing (2)

.github/workflows/gcp_backend.yml (1 hunks)
backend/Dockerfile (1 hunks)

🧰 Additional context used

🔇 Additional comments (7)

backend/Dockerfile (2)

1-4: Excellent implementation of multi-stage build with slim images!

Great job on implementing the multi-stage build process and using slim images. This change addresses the previous suggestion and will help reduce the final image size. The creation of a virtual environment is a best practice for dependency isolation.

11-15: Well-structured final stage setup

The final stage is set up efficiently:

Using a slim Python image reduces the final image size.

Setting the working directory to /app is a good practice.

Updating the PATH ensures the virtual environment is correctly used.

These changes contribute to a more optimized and maintainable Docker image.

.github/workflows/gcp_backend.yml (5)

Line range hint 3-17: Excellent update to workflow trigger and inputs!

The change to workflow_dispatch with custom inputs for environment and branch selection is a great improvement. This allows for:

More controlled deployments through manual triggering.

Flexibility in choosing the deployment environment and branch.

Better alignment with different stages of your development lifecycle.

These changes contribute to the overall optimization of the backend deployment process, as mentioned in the PR objectives.

39-41: Excellent updates to action versions and new steps!

Updating the checkout action to v4 and Google Auth action to v2 is great for leveraging the latest features and security improvements.

The new step to delete the /opt/hostedtoolcache directory is a smart workaround for potential storage issues on GitHub-hosted runners.

Adding a separate step for logging into Google Container Registry (GCR) improves clarity and maintainability.

These changes align well with the PR objective of optimizing the backend deployment process and addressing deprecated errors.

The previous review comment about updating the checkout action has been addressed in this PR.

Also applies to: 42-42, 44-44, 48-48, 52-54

55-67: Excellent optimization of Docker build process!

The new Docker build and push process is a significant improvement:

Using docker/setup-buildx-action@v3 enables advanced features like multi-platform builds and improved caching.

The docker/build-push-action@v6 with registry caching options is a key factor in achieving the 400% faster build time mentioned in the PR objectives.

The cache-from and cache-to configurations ensure that subsequent builds can reuse layers, further optimizing the process.

These changes directly address the main goal of this PR to optimize the backend Docker image and significantly reduce build times.
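
For anyone reproducing the registry cache outside GitHub Actions, the equivalent `docker buildx build` invocation might look like the sketch below. The helper only prints the command and does not run Docker. The `:buildcache` tag, the function name, and the example image name are assumptions, not values from the PR.

```shell
# buildx_cached_build_cmd IMAGE CONTEXT DOCKERFILE
# Prints (does not run) a docker buildx command that reads from and writes to a registry cache.
buildx_cached_build_cmd() {
  image=$1 context=$2 dockerfile=$3
  cache_ref="${image}:buildcache"   # assumed tag for the cache image
  printf '%s ' docker buildx build \
    --file "$dockerfile" \
    --tag "${image}:latest" \
    --cache-from "type=registry,ref=${cache_ref}" \
    --cache-to "type=registry,ref=${cache_ref},mode=max" \
    --push "$context"
  echo
}

# Hypothetical image name, for illustration only.
buildx_cached_build_cmd gcr.io/example-project/backend . backend/Dockerfile
```

`mode=max` exports the layers of every stage, including the builder stage, which matters for a multi-stage build where the expensive dependency install happens before the final image.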

70-70: Approved: Cloud Run deployment action update

Updating the Google Cloud Run deployment action from v0 to v2 is an excellent change. This ensures you're using the latest features and security improvements provided by the action. It also addresses the PR objective of resolving deprecated errors.

Line range hint 1-79: Summary: Excellent optimization and modernization of the backend deployment workflow

This PR successfully achieves its objectives:

Significantly reduces build time by implementing Docker Buildx and registry caching.

Addresses deprecated errors by updating action versions.

Improves workflow flexibility with manual triggers and environment selection.

Enhances security and reliability with environment validation and up-to-date actions.

These changes collectively result in a more efficient, secure, and maintainable deployment process for the backend service. Great work on this optimization!

beastoin · 2024-10-19T07:54:48Z

lgtm 🥳

thainguyensunya added 9 commits October 17, 2024 22:46

Use multi-stage for backend image

fcadbf2

Use multi-stage for backend image

c833019

Add required packages for run stage

57e75cf

Utilize cache for docker build in Github action

2d06f20

Workaround no space left on device of GitHub-hosted runner

23e7e33

Upgrade google-github-actions/auth to v2

13fefe3

Merge branch 'main' into optimize-backend-docker-image

0dc67b1

Fix warning of docker build

288dc7b

Fix deprecated error of google-github-actions/deploy-cloudrun by upgrade from v0 to v2

b4f56b9

coderabbitai bot reviewed Oct 18, 2024

thainguyensunya had a problem deploying to development October 18, 2024 15:55 — with GitHub Actions Failure

coderabbitai bot reviewed Oct 18, 2024

Implement fix for comments in coderabbitai

764c73a

thainguyensunya force-pushed the optimize-backend-docker-image branch from bfdc8c4 to 764c73a Compare October 18, 2024 16:03

thainguyensunya temporarily deployed to development October 18, 2024 16:04 — with GitHub Actions Inactive

thainguyensunya deployed to development October 18, 2024 16:21 — with GitHub Actions Active

beastoin merged commit 17d10b2 into main Oct 19, 2024
2 checks passed

beastoin deleted the optimize-backend-docker-image branch October 19, 2024 07:54
