Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starts Dockerfile + GitHub action for building and pushing #32

Merged
merged 26 commits into from
Sep 16, 2024

Conversation

gvegayon
Copy link
Member

@gvegayon gvegayon commented Sep 12, 2024

  • Currently, uses image from docker.io/gvegayon: Should replace with Azure registry (or ghcr.io better).
  • Starts two Dockerfiles; one based on rocker/geospatial:4.4.1.
  • Adds the GitHub Action workflow that:
    • Caches the Dockerfile-dependencies build + push based on the DESCRIPTION and Dockerfile-dependencies file. The resulting image is gvegayon/cfa-epinow2-pipeline-dependencies:latest
    • Using gvegayon/cfa-epinow2-pipeline-dependencies:latest as a baseline, it builds another image called gvegayon/cfa-epinow2-pipeline:latest that installs the package. The docker file is ./Dockerfile.

Copy link
Collaborator

@zsusswein zsusswein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! This is already a huge step in the right direction.

My questions:

  • Can we install the package into the container on build? I had been doing something like this to allow layer caching with code changes. Are you able to use pieces from that?
  • If yes to the above, can we run the tests with testthat::test_local() rather than R CMD check? Or is that redundant because R CMD check will have passed for the install to work?
  • Can we think a little more about when the image build should run? I think on merge to main is an obvious candidate in addition to the manual trigger.
  • Also is the geospatial image still required? Can we use something slimmer?
  • Can you give a little more context on what the Makefile is for? Local debugging? Is it used in the Action somehow?

Makefile Show resolved Hide resolved
.github/workflows/build-docker.yaml Show resolved Hide resolved
@gvegayon
Copy link
Member Author

gvegayon commented Sep 12, 2024

@zsusswein, my replies:

Can we install the package into the container on build? I had been doing something like this to allow layer caching with code changes. Are you able to use pieces from that?

Sure, I have modified it to do so. Since I am installing the dependencies via install2.r in a separate layer, that's already cached, so you don't need to do the hack of installing the package with an error.

If yes to the above, can we run the tests with testthat::test_local() rather than R CMD check? Or is that redundant because R CMD check will have passed for the install to work?

I am including the built version of the package in the image now, so you can run R CMD check after the fact. We can have R CMD check to fail, I think.

Can we think a little more about when the image build should run? I think on merge to main is an obvious candidate in addition to the manual trigger.

I see this image as just the requirements for the package. Pre-building it will save lots of time and that should only be updated if you want to get different versions of the dependencies whatsoever. I don't see the need to re-run the process whenever you update the repo.

Also is the geospatial image still required? Can we use something slimmer?

You can use something slimmer, but that would cost you time. Right now, firing up the image takes about 1 minute; and building roughly 20. Trying to use something like rocker/r-ver may take you significantly longer. A run on CDCent took over 2 hours and it didn't finalized.

Can you give a little more context on what the Makefile is for? Local debugging? Is it used in the Action somehow?

I use Makefiles because they make my life easier. Mostly for development.

@zsusswein
Copy link
Collaborator

zsusswein commented Sep 12, 2024

Since I am installing the dependencies via install2.r in a separate layer, that's already cached, so you don't need to do the hack of installing the package with an error.

Does this install2.r require us to list all of our packages again for it to work? I'd prefer a solution that doesn't require us to keep duplicate package manifests, if possible.

I am including the built version of the package in the image now, so you can run R CMD check after the fact. We can have R CMD check to fail, I think.

Great, thank you!

I see this image as just the requirements for the package. Pre-building it will save lots of time and that should only be updated if you want to get different versions of the dependencies whatsoever. I don't see the need to re-run the process whenever you update the repo.

and

You can use something slimmer, but that would cost you time. Right now, firing up the image takes about 1 minute; and building roughly 20. Trying to use something like rocker/r-ver may take you significantly longer. A run on CDCent took over 2 hours and it didn't finalized.

I'm not really following this? My understanding is that the very slow builds on the CDCent version is because that repository has a much larger codebase. The resulting image has many more dependencies and builds very slowly. In contrast, in this repo R CMD check with no caching takes ~10 mins. Isn't that about the maximum build time with no cache that we should expect?

I use Makefiles because they make my life easier. Mostly for development.

Sounds good -- was just curious!

@gvegayon
Copy link
Member Author

OK @zsusswein; the GHA workflow now has the build+push step cached as a function of the DESCRIPTION and Docker-dependencies files. Changes in either of those will trigger a build + push. The Docker action has an included caching feature, but it is a bit obscure how it works, the caching using GHA is easier, IMO.

@zsusswein zsusswein self-requested a review September 16, 2024 21:07
Copy link
Collaborator

@zsusswein zsusswein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! A couple minor things:

  • I think you dropped the run_tests job in 47bf9bc. Was that on purpose? I left comments to add it back in, but feel free to reject and remove references to it if you think it's otherwise covered.
  • I think the Makefile might need to be updated?

These are all minor questions -- trying to make sure I understand enough to maintain this once you roll onto bigger & better things! I think core functionality here is done.

.github/workflows/build-docker.yaml Outdated Show resolved Hide resolved
.github/workflows/build-docker.yaml Outdated Show resolved Hide resolved
.github/workflows/build-docker.yaml Show resolved Hide resolved
.github/workflows/build-docker.yaml Show resolved Hide resolved
Makefile Show resolved Hide resolved
@gvegayon
Copy link
Member Author

I think you dropped the run_tests job in 47bf9bc. Was that on purpose? I left comments to add it back in, but feel free to reject and remove references to it if you think it's otherwise covered.

I moved the R CMD check to be executed inside of the build process. Thus, the build process will fail if R CMD check fails as well.

I think the Makefile might need to be updated?

I think I did now!

@gvegayon gvegayon requested a review from zsusswein September 16, 2024 23:01
@zsusswein zsusswein merged commit b59ddec into CDCgov:main Sep 16, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants