Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Yarn Modern v4 #6446

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Upgrade to Yarn Modern v4 #6446

wants to merge 13 commits into from

Conversation

goplayoutside3
Copy link
Contributor

@goplayoutside3 goplayoutside3 commented Nov 6, 2024

Linked Issue and/or Talk Post

Yarn Migration docs: https://yarnpkg.com/migration/guide
Toward: #5970
Closes: #1470
Closes: #6391

Describe your changes

  • Upgrade yarn version to v4
  • Migrate --frozen-lockfile command to --immutable per the migration guide.
  • Set enableScripts: false in the yarn config. This replaces the legacy --ignore-scripts flag.
  • Update outdated docs in the root Readme.
  • Update root .gitignore and remove some unrelated rules from it.

How to Review

There's a lot of changes required to use Yarn Modern, so please see my comments below on individual file changes.

I deployed this branch to the fe-project-branch and verified a successful deploy and I could see a project at that url. I might overwrite fe-project-branch with a different PR, but the successful deploy GH Action is linked here.

To review:

  • Download this branch and yarn panic and yarn bootstrap.
  • Try building a library such as lib-user.
  • Try running Storybook for a library.
  • Try building and running the Next.js apps.

Checklist

PR Creator - Please cater the checklist to fit the review needed for your code changes.
PR Reviewer - Use the checklist during your review. Each point should be checkmarked or discussed before PR approval.

General

  • Tests are passing locally and on Github
  • Documentation is up to date and changelog has been updated if appropriate
  • You can yarn panic && yarn bootstrap or docker-compose up --build and FEM works as expected
  • FEM works in all major desktop browsers: Firefox, Chrome, Edge, Safari (Use Browserstack account as needed)
  • FEM works in a mobile browser

General UX

Example Staging Project: i-fancy-cats

  • All pages of a FEM project load: Home Page, Classify Page, and About Pages
  • Can submit a classification
  • Can sign-in and sign-out

Maintenance

  • If not from dependabot, the PR creator has described the update (major, minor, or patch version, changelog)

@coveralls
Copy link

coveralls commented Nov 6, 2024

Coverage Status

coverage: 77.788% (-0.01%) from 77.802%
when pulling 1960282 on yarn-modern
into 7a45d58 on master.

- run: yarn install --production=false --frozen-lockfile --ignore-scripts
- run: yarn install --immutable
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--frozen-lockfile is replaced by --immutable. This means the install will abort with an error exit code if the lockfile was to be modified by installing third party libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--ignore-scripts is replaced by enableScripts: false in .yarnrc.yml. This disables post-installation scripts from third party libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--production=false was originally used to ensure devDeps were not installed during ci-builds. However, it's not a flag that applies to Yarn Modern, and Yarn will not install any package listed in devDependencies if the NODE_ENV environment variable is set to production, which is the case in many of our build scripts anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that --immutable always installs all dependencies. An install can't be limited to production dependencies without changing the lockfile: yarnpkg/berry#2253 (comment)

--production=false should have ensured that devDeps were installed, though. Otherwise the subsequent build would fail, without the build tools.

# Based on https://github.com/github/gitignore/blob/master/Node.gitignore
# Yarn https://yarnpkg.com/getting-started/qa#which-files-should-be-gitignored
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This root .gitignore file was generated years ago and contained a lot of "extras" that don't apply to FEM. I've cleaned it up and included the recommended Yarn files.

@@ -1,9 +1,27 @@
# Based on https://github.com/github/gitignore/blob/master/Node.gitignore

# https://yarnpkg.com/getting-started/qa#which-files-should-be-gitignored
Copy link
Contributor Author

@goplayoutside3 goplayoutside3 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as comment.

@@ -0,0 +1,3 @@
nodeLinker: node-modules
yarnPath: .yarn/releases/yarn-4.5.1.cjs
enableScripts: false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs for this are here: https://yarnpkg.com/configuration/yarnrc#enableScripts. We don't want to run post-install scripts from third party libraries, and this config replaces --ignore-scripts from Yarn Classic.

Comment on lines -3 to -12
[![Build Status](https://travis-ci.com/zooniverse/front-end-monorepo.svg?branch=master)](https://travis-ci.com/zooniverse/front-end-monorepo)
[![Coverage Status](https://coveralls.io/repos/github/zooniverse/front-end-monorepo/badge.svg?branch=master)](https://coveralls.io/github/zooniverse/front-end-monorepo?branch=master)
[![pullreminders](https://pullreminders.com/badge.svg)](https://pullreminders.com?ref=badge)

[![lerna](https://img.shields.io/badge/maintained%20with-lerna-cc00ff.svg)](https://lernajs.io/)
[![Licensed under Apache 2.0](https://img.shields.io/github/license/zooniverse/front-end-monorepo.svg)](https://github.com/zooniverse/front-end-monorepo/blob/master/LICENSE.md)
![Contributors](https://img.shields.io/github/contributors/zooniverse/front-end-monorepo.svg)

️Take a look at [our roadmap](https://trello.com/b/yg0r4dG5/front-end-rebuild-roadmap)! 🛣️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deleted links no longer exist.

@@ -39,7 +34,7 @@ Node, git, and yarn can be installed through [homebrew](https://brew.sh/) on Mac

This monorepo is managed with [Yarn Workspaces](https://yarnpkg.com/lang/en/docs/workspaces/).

Yarn Workspaces allow us to maintain package modularity for javascript projects that have interdependency. Organizationally, they allows us to track issues, pull requests, and progress for all related packages in one place.
Yarn Workspaces allow us to maintain package modularity for javascript projects that have interdependency. Organizationally, they allows us to track issues, pull requests, and progress for all related packages in one place. See documentation for [Corepack](https://yarnpkg.com/corepack), too.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to use Corepack with Yarn Modern.

Comment on lines +18 to +20
printf 'Installing dependencies...\n'
yarn install --immutable
printf '\n'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same concept as #6393. Install once, and then build the rest of the packages. This brings the bootstrap file up to date with the CI build scripts.

"node": ">=16.18"
"node": ">=20.5"
Copy link
Contributor Author

@goplayoutside3 goplayoutside3 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to match package.json in the individual workspaces. I don't think this one matters much because the ci builds reference the files in .github, and each package has its own package.json.

Comment on lines 51 to +52
"devDependencies": {
"@babel/cli": "^7.25.9",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to install @babel/cli in a couple of our libraries because they use the babel command in the build scripts.

@goplayoutside3 goplayoutside3 marked this pull request as ready for review November 6, 2024 22:59
@goplayoutside3 goplayoutside3 requested a review from a team November 6, 2024 23:00
@shaunanoordin shaunanoordin self-assigned this Nov 7, 2024
@shaunanoordin shaunanoordin self-requested a review November 7, 2024 13:52
@shaunanoordin
Copy link
Member

shaunanoordin commented Nov 7, 2024

Review (attempt) currently in progress.

PSA: you may need to run "yarn install" first.

When I switch to this branch and try running yarn panic OR yarn bootstrap, I received this error:

> yarn panic
Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)
> yarn bootstrap
Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)
(Full version)
> yarn --version
4.5.1
> git checkout yarn-modern
Switched to branch 'yarn-modern'
Your branch is up to date with 'origin/yarn-modern'.
> yarn panic
Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)

$ yarn run [--inspect] [--inspect-brk] [-T,--top-level] [-B,--binaries-only] [--require #0] <scriptName> ...

❗ It appears that I first need to run yarn install once before I can run yarn bootstrap

Additional notes:

  • Once I've run yarn install once, I can run yarn bootstrap as many times as I like.
  • if I run yarn panic, I ALSO need to re-run yarn install before I can then run yarn bootstrap
  • this is the first time I've run 'yarn install' for FEM since... I have no idea when. It's definitely new to this yarn-modern branch, as I was able to run yarn panic and/or yarn boostrap back-to-back, willy-nilly, on the master branch.
  • ⚠️ if yarn install becomes a required bit before we run yarn bootstrap, then we'll need to update the README to include that as part of the Getting Started process.
    • I'll run a separate git clone to confirm this. Watch this space.
    • Update: CONFIRMED. We need to update the README.
  • ❓ I haven't tested anything with Docker yet.
Additional debugging notes
  • I'm running yarn 4.5.1
  • I think I have the Yarn Corepack installed, as yarn exec env returns a bunch of things:
PREFIX=/usr/local
NVM_INC=/Users/shaun/.nvm/versions/node/v20.9.0/include/node
TERM_PROGRAM=Apple_Terminal
NVM_CD_FLAGS=-q
SHELL=/bin/zsh
...

@shaunanoordin
Copy link
Member

shaunanoordin commented Nov 7, 2024

PR Review (ongoing)

Manual testing update

  • ✅ Local dev server: I'm able run app-project locally
    • yarn install ; yarn bootstrap ; cd packages/app-project ; yarn dev
    • Open random test project, https://local.zooniverse.org:3000/projects/darkeshard/test-project-2021
    • Choose a random workflow to classify on, and submit.
  • ⏸️ Docker: docker-compose build has been running for the past 20 minutes and it's not halfway done. 💀 I may have to give up on this.

Copy link
Member

@shaunanoordin shaunanoordin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review

Packages: (root) / (everything)

This PR updates the monorepo to use Yarn v4.

  • This is purely a dev-side/build-side update. No features are added/removed/changed, and the development & build processes themselves should look basically the same.

Code changes look fine, as far as I can tell, and basic testing indicates that most major packages build & run fine locally. There's a new caveat about running yarn install, but see the 'Status' section at the end for full details.

Testing (CLI)

Basic pass/fail criteria for this review is "can I run some of FEM's major packages on the local dev server?"

Base setup:

Testing (app-project):

✅ Tests look good. app-project appears to work on local.

Testing (Docker)

Basic pass/fail criteria for this review is the same as above.

❓ Tests look odd - the server runs, but on HTTP (https is blocked), and all I'm getting are 404 pages. Which means the local dev server is running and the app-project code is there, but there's probably something wonky in a config, or my test methods need revision. (I rarely, if ever, use Docker for local dev. I'm only testing this to be thorough.)

🤷 I'm marking this as a "OK, this is fine, I guess" since I'm not too worried about Docker, but we may need to revisit this if another developer uses docker as their main development method.

Testing (CLI, Tabula Rasa Edition)

Basic pass/fail criteria for this review is "can I run some of FEM's major packages on the local dev server, if I install the Front End Monorepo from scratch ?"

Tabula Rasa Policy:

  • delete existing /front-end-monorepo folder, or move it elsewhere.
  • Install a fresh copy of FEM: git clone [email protected]:zooniverse/front-end-monorepo.git

Testing (app-project):

  • Proceed with the same steps as Testing (CLI) above.

✅ Tests still look good. This tabula rasa test confirms the necessity of the 🆕 yarn install step during initialisation.

Status

Changes LGTM, but I'd like to highlight a new "required step" on the dev side that seems to have appeared as a result of the Yarn 4 update.

❗ 🆕 yarn install is now required before running yarn scripts

  • Information for Zooniverse devs: when switching to this branch for the first time (and after this PR is merged), you may need to run yarn install, at least once, before running yarn bootstrap.
    • Once installed, you can run yarn bootstrap with no issue, as many times as you like.
    • You also need to run yarn install once after running yarn panic.
    • You know we need to run yarn install when: you attempt to run yarn panic or yarn bootstrap, but instead see the error message: "Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)"
  • Actions to be taken: The README needs to be updated with this information. (@goplayoutside3 : this can be part of this PR, or I can make the doc update a separate PR - whatever seems cleaner.)
Documentation Update

For https://github.com/zooniverse/front-end-monorepo/blob/860f09fc57890bdc6a4f0fb7706b30c8ab01d186/README.md#debugging-the-production-release :

### Debugging the production release

Make sure you have pulled the latest production version.

"""sh <== change these to tildes
git pull --tags -f
""" <== change these to tildes

Check out the latest production release.

"""sh <== change these to tildes
git checkout production-release
""" <== change these to tildes

// NEW
You'll need to run yarn install _once_ for the next yarn script to work. Otherwise, you may encounter the error message, _"Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)"_

// NEW
"""sh <== change these to tildes
yarn install
""" <== change these to tildes

Run the bootstrap script to build all the libraries and apps. You can use `bootstrap:es6` here for a faster build if you don't want to run the tests.

"""sh <== change these to tildes
yarn bootstrap
""" <== change these to tildes

For https://github.com/zooniverse/front-end-monorepo/blob/860f09fc57890bdc6a4f0fb7706b30c8ab01d186/README.md#with-node-and-yarn :

### With Node and yarn

Alternatively, you can install Node 20 and yarn and build the monorepo packages.

"""sh <== change these to tildes
git clone [email protected]:zooniverse/front-end-monorepo.git
cd front-end-monorepo
yarn install  <== NEW
yarn bootstrap
""" <== change these to tildes

The `bootstrap` script will install the dependencies and build any local packages used as dependencies.

// NEW
`yarn install` is required to be run _once_ for any yarn script to work. If you see an error message like _"Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)",_ it means you haven't run `yarn install` yet.

@github-actions github-actions bot added the approved This PR is approved for merging label Nov 7, 2024
@eatyourgreens
Copy link
Contributor

eatyourgreens commented Nov 7, 2024

❓ Tests look odd - the server runs, but on HTTP (https is blocked), and all I'm getting are 404 pages. Which means the local dev server is running and the app-project code is there, but there's probably something wonky in a config, or my test methods need revision. (I rarely, if ever, use Docker for local dev. I'm only testing this to be thorough.)

The Docker build is a production build, so staging projects will 404, if that helps.

args:
- NODE_ENV=production
- PANOPTES_ENV=production
- NEXT_TELEMETRY_DISABLED=1
- APP_ENV=development

@goplayoutside3
Copy link
Contributor Author

goplayoutside3 commented Nov 14, 2024

Thanks! That's helpful to know.

I want to continue a discussion here that was started in #6465 (that PR is about to be merged)

@eatyourgreens you mentioned the size of the FEM image.

node_modules includes development dependencies, despite the --production flag on the yarn install. That might be a yarn bug.

In this PR, --production is not a valid flag for Yarn Modern's yarn install. Do you happen to know where to investigate eliminating devDep in the production image? For Yarn Modern, the migration guide notes that starting in Yarn v2:

One particular note is that you cannot install production dependencies only at the moment. We plan to add back this feature at a later time, but given that enabling Zero-Installs would cause your repository to contain all your packages anyway (prod & dev), this feature isn't deemed as important as it used to be.

😞

Might need to look into yarn workspaces focus.

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Nov 14, 2024

I think Yarn 4 will automatically ignore dev dependencies, as long as NODE_ENV is production. So, in theory, the deployed image should only have production deps in node_modules.

The build is a multi-stage build, with two stages:

  1. builder: A full bootstrap of the monorepo, including dev dependencies, in a Node 20 image. This builds all the libraries and apps.
  2. runner: Copy the built packages directory over to a fresh Node 20 image. yarn install only the production dependencies needed to run the Next.js apps with yarn start (no storybook, tests, webpack etc.) This is the image that gets deployed. In theory, it should be much smaller than a full development install.

@eatyourgreens
Copy link
Contributor

I could also be totally wrong about Yarn 4 production installs and NODE_ENV.

@goplayoutside3
Copy link
Contributor Author

as long as NODE_ENV is production. So, in theory, the deployed image should only have production deps in node_modules.

I think you're right about this because I remember reading about it when first creating this PR, but now I can't find it documented 🤔 Will continue to investigate...

@eatyourgreens
Copy link
Contributor

I'm also happy to answer any other questions that come up about the Docker builds, (if I know the answers.)

@goplayoutside3
Copy link
Contributor Author

goplayoutside3 commented Nov 14, 2024

Ugh using Docker Desktop to build an image of this branch is even larger 🫠 This screenshot is from front-end-monorepo_prod:latest when I use docker compose build locally.
docker command

And strangely enough, with docker images, the dev image is smaller than the prod image

front-end-monorepo_prod   latest    d3254c6463cc   18 minutes ago   2.81GB
front-end-monorepo_dev    latest    80bcdda63fd7   19 minutes ago   1.94GB

@eatyourgreens
Copy link
Contributor

This looks more like what I'd expect from node_modules in production. It's still big, but not 1.6GB. I'll open a PR for this.

du -sh * inside a production build of the monorepo.

jimodonnell@Jims-MacBook-Pro front-end-monorepo % docker compose run --rm prod-shell
[+] Creating 1/0
 ✔ Network front-end-monorepo_default  Created                                                                                                      0.0s 
/usr/src # du -sh *
790.3M	node_modules
4.0K	package.json
641.7M	packages
824.0K	yarn.lock
/usr/src # 

@eatyourgreens
Copy link
Contributor

I'm not sure why Yarn 4 is creating 2GB builds. I don't think those would break deploys, but they'd definitely slow down uploads/downloads to GitHub Container Repository. I don't know if GHCR has any usage limits either. I vaguely remember that, about a year ago, the monorepo ran into errors in GitHub Actions because the production images got really large.

@eatyourgreens
Copy link
Contributor

@goplayoutside3 I've opened #6469 to try and make it easier to debug the production build. One thing I noticed from the old docker compose setup: node_modules is mounted from your local machine. I removed that so that node_modules inside each image contains only what's installed by the build. The only case I could think of for mounting a local volume was to do local development in Docker, but you can't do development work with yarn start anyway.

volumes:
- node_modules:/usr/src/node_modules

@eatyourgreens
Copy link
Contributor

Here's a recent tutorial for multi-stage Docker builds, including examples for Next.js.

https://labs.iximiuz.com/tutorials/docker-multi-stage-builds

@goplayoutside3
Copy link
Contributor Author

#6469 is now included in this PR, and I can successfully deploy this branch.

By using Docker Desktop and this branch locally, I can see that Storybook is the biggest contributor to both front-end-monorepo_node_modules-prod and front-end-monorepo_node_modules-dev, so it seems like Yarn is not differentiating between a dev and prod env.

The prod build is still way larger than the dev build, and I don't want to merge this Yarn upgrade until I understand why that is. I can list the Dockerfile commands that contribute the largest file transfers, but I've reached my Docker x Yarn skills limit. I'll need to research and come back to this PR.

front-end-monorepo % docker images
REPOSITORY                TAG       IMAGE ID       CREATED         SIZE
front-end-monorepo_prod   latest    f44c23ed2aae   5 minutes ago   2.83GB
front-end-monorepo_dev    latest    857ba8959a0c   8 minutes ago   1.96GB
  • Dev build

    • COPY ./packages /usr/src/packages adds only 15MB
    • ... YARN_CACHE_FOLDER=/home/node/.yarn yarn install --immutable adds 1.11GB
    • ... YARN_CACHE_FOLDER=/home/node/.yarn yarn workspace @zooniverse/fe-project build adds 344.15MB
    • ... YARN_CACHE_FOLDER=/home/node/.yarn yarn workspace @zooniverse/fe-root build adds 312.16MG
  • Prod build

    • COPY /usr/src/packages ./packages adds 671MB
    • ... YARN_CACHE_FOLDER=/home/node/.yarn yarn install --immutable adds 2GB
    • The yarn workspace @zooniverse/[app] build commands are not in the prod build 🤔

@eatyourgreens
Copy link
Contributor

Prod build

* `COPY /usr/src/packages ./packages` adds 671MB

This is 344MB for the project app, 312MB for the root app and ~15MB for all the other libraries.

* The `yarn workspace @zooniverse/[app] build` commands are not in the prod build 🤔

The runner image copies the built apps over from the builder image (basically the .next/ directories for each app.)

I found a useful post, which goes through how to deploy standalone Next.js builds with Docker. It would need to be modified to work with a monorepo, but should give smaller images.

@eatyourgreens
Copy link
Contributor

With Yarn 4, you can also look for this in the build output (which tells you that the production build is installing dev dependencies 😞 .)

YN0000: ┌ Fetch step                                                                                                                         
 => => # ➤ YN0013: │ 2413 packages were added to the project (+ 866.81 MiB).                                                                             
 => => # ➤ YN0000: └ Completed in 29s 875ms 

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Nov 26, 2024

After you've deployed this branch, does the image run?

docker compose up gives me the same yarn install error that Shaun mentioned, on my laptop.

jimodonnell@Jims-MacBook-Pro front-end-monorepo % docker-compose up
[+] Running 5/5
 ✔ Network front-end-monorepo_default         Created                                                                                               0.1s 
 ✔ Container front-end-monorepo-fe-project-1  Created                                                                                               0.5s 
 ✔ Container front-end-monorepo-prod-shell-1  Created                                                                                               0.5s 
 ✔ Container front-end-monorepo-dev-shell-1   Created                                                                                               0.5s 
 ✔ Container front-end-monorepo-fe-root-1     Created                                                                                               0.5s 
Attaching to dev-shell-1, fe-project-1, fe-root-1, prod-shell-1
prod-shell-1 exited with code 0
dev-shell-1 exited with code 0
fe-root-1     | Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)
fe-root-1     | 
fe-root-1     | $ yarn run [--inspect] [--inspect-brk] [-T,--top-level] [-B,--binaries-only] [--require #0] <scriptName> ...
fe-project-1  | Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)
fe-project-1  | 
fe-project-1  | $ yarn run [--inspect] [--inspect-brk] [-T,--top-level] [-B,--binaries-only] [--require #0] <scriptName> ...
fe-root-1 exited with code 1
fe-project-1 exited with code 1
jimodonnell@Jims-MacBook-Pro front-end-monorepo % 

I can also reproduce that error by manualling running yarn start inside the production container.

jimodonnell@Jims-MacBook-Pro front-end-monorepo % docker-compose run --rm prod-shell
/usr/src # ls
node_modules  package.json  packages      yarn.lock
/usr/src # yarn start
Usage Error: Couldn't find the node_modules state file - running an install might help (findPackageLocation)

$ yarn run [--inspect] [--inspect-brk] [-T,--top-level] [-B,--binaries-only] [--require #0] <scriptName> ...
/usr/src # 

EDIT: seems to be my local setup.

@goplayoutside3
Copy link
Contributor Author

I found a useful post, which goes through how to deploy standalone Next.js builds with Docker. It would need to be modified to work with a monorepo, but should give smaller images.

Thanks I'll take a look!

@goplayoutside3
Copy link
Contributor Author

goplayoutside3 commented Dec 4, 2024

Zach and I chatted about this upgrade today in the office. We discussed reasons why there's unexpected Docker image size differences between the master branch and this branch, as well as why the front-end-monorepo_prod image is almost 1GB larger than front-end-monorepo_dev.

One of the possible causes is that yarn install is being executed with the wrong version of Yarn, and the new flags for yarn install aren't recognized. However, the possibility of this scenario is low after inspecting the Yarn version in /home/node/.yarn in /usr/src.

@eatyourgreens it would be helpful to know if you have some legacy knowledge about why the Dockerfile RUN commands include the following flags: --mount=type=cache,id=fem-builder-yarn,uid=1000,gid=1000,target=/home/node/.yarn YARN_CACHE_FOLDER=/home/node/.yarn. Why are they required for yarn install?

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 4, 2024

Those are mounting a build cache. See #2309

This should be the docs for the --mount flag: https://docs.docker.com/engine/storage/bind-mounts/

Basically, those flags should be caching /home/node/.yarn across builds.

This Stack Overflow answer has some examples of Buildkit's --mount=type=cache.

https://stackoverflow.com/questions/69464184/using-docker-buildkit-mount-type-cache-for-caching-nuget-packages-for-net-5-d#69469767

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 4, 2024

I could also be totally wrong about Yarn 4 production installs and NODE_ENV.

I was, in fact, completely wrong. Yarn 4 has no concept of an immutable production install. The first reply here has a detailed (though annoying) explanation.

yarnpkg/berry#2253

This feature request talks about how production installs might work in Yarn 4.
yarnpkg/berry#6316

@eatyourgreens
Copy link
Contributor

Here's another explanation of cache mounts, taken from #2309.

https://vsupalov.com/docker-buildkit-features/#using-a-cache-mount

@eatyourgreens
Copy link
Contributor

One of the possible causes is that yarn install is being executed with the wrong version of Yarn, and the new flags for yarn install aren't recognized. However, the possibility of this scenario is low after inspecting the Yarn version in /home/node/.yarn in /usr/src.

You can check this by looking at the logs for a build on this branch.

https://github.com/zooniverse/front-end-monorepo/actions/runs/12019431323/job/33505979000

The yarn install for the production image is at line 518.

Build logs for an install with yarn 4.5.1.

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 4, 2024

~~TBH, /home/node/.yarn isn’t needed to run the apps, only to install dependencies. So that can be removed from the production image. It’s probably very big, since Yarn is installing ~2400 packages.~~ The yarn cache isn't included in the final image, either dev or production. I should have read the notes for #2309 before writing that.

#2351 is also useful for understanding how the Docker builds are using BuildKit, and caching.

@eatyourgreens
Copy link
Contributor

We're running free software engineering training for Oxford Physics in Jan/Feb 2025, including a day on Docker and containers. If anyone from Zooniverse is interested, there should be a full list on https://train.rse.ox.ac.uk.

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 5, 2024

I've rewritten the Dockerfile, in #6515, to use the base image suggestion from Node.js 20, Yarn 4, and Next.js on Docker (and fix #6513.)

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 5, 2024

@goplayoutside3 here's the documentation for YARN_CACHE_FOLDER in Yarn 1. I don't know if that's still used by Yarn 4.

Yarn 1 uses a global cache, but I think Yarn 4 handles caching in a completely different manner, with local caching being the default, not global.

The default location of the Yarn 1 global cache depends on the OS. This is a monorepo build with Yarn 1.22, on Alpine Linux:

jimodonnell@Jims-MBP-2 front-end-monorepo % docker compose run --rm dev-shell
[+] Creating 1/0
 ✔ Network front-end-monorepo_default  Created                                                                                                                   0.0s 
/usr/src $ yarn cache dir
/home/node/.cache/yarn/v6

On MacOS, the Yarn cache dir will probably be in your user directory soemwhere.

jimodonnell@Jims-MBP-2 front-end-monorepo % yarn cache dir
/Users/jimodonnell/Library/Caches/Yarn/v6

With Yarn 4, the default cache directory is ./.yarn/cache, so the cache mounts should cache that directory.
https://yarnpkg.com/configuration/yarnrc#cacheFolder

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 6, 2024

Here's a link to the BuildKit docs, via zooniverse/talk-api#271. BuildKit was experimental when the original PRs were written, but it's now the default Docker build system.

https://docs.docker.com/build/buildkit/

I'm not 100% sure if BuildKit cache mounts are supported by GitHub Actions. Zach probably knows more about this than me. If not, it might be worth reaching out to Cam for help.

You can see another example of cache mounts in the Talk API here:
https://github.com/zooniverse/talk-api/blob/a493117878ee43af80cdafb45dae5bf22d5bfa2e/Dockerfile#L7-L13

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Dec 6, 2024

Found this in the Docker docs:

BuildKit doesn't preserve cache mounts in the GitHub Actions cache by default.

https://docs.docker.com/build/ci/github-actions/cache/#cache-mounts

So I would guess those cache mounts aren't being used in CI ie. Yarn is making ~2400 HTTP requests to download packages for every GitHub Actions build?

My best guess for the increased image size with Yarn 4 would be downloaded packages in ./.yarn/cache being included in the final image. It might be worth checking how big the Yarn cache is after an install.

There's also a note in the Yarn 4 docs about committing the cache to Git, for reproducible builds.

Yarn provides a very simple but effective alternative: by setting enableGlobalCache to false, it will save the package cache into a folder local to your project (by default .yarn/cache) that can then be added to Git. Every given commit is thus guaranteed to be installable, even should the npm registry go under.

https://yarnpkg.com/features/caching

If the Yarn cache were checked out with the Git repo, then the build won't need cache mounts.

@eatyourgreens
Copy link
Contributor

eatyourgreens commented Jan 4, 2025

Just following up on this to say that the details of our Docker training, including how to read and write Docker files, are up now:

https://train.rse.ox.ac.uk/event/29

Oxford Zooniverse team members can register for free via https://www.rse.ox.ac.uk/training

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved This PR is approved for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

yarn bootstrap errors on MacOS while building the canvas dependency Yarn 2 upgrade
4 participants