This repository has been archived by the owner on Sep 11, 2024. It is now read-only.

The repository is way too big #524

Open
MatthewL246 opened this issue Sep 22, 2023 · 10 comments

@MatthewL246

MatthewL246 commented Sep 22, 2023

Git is very inefficient at storing binary files, and the 70 MB TeamCode apk being uploaded for every commit is causing problems. We should host it somewhere outside the repository, maybe in a release. The repository is over 1 GB, which exceeds GitHub's size recommendations (and if it ever gets over 5 GB, expect a complaint from GitHub support).

This slows down cloning a lot, and git operations in general. I disabled the apk commit function because the size issue just keeps getting worse. We could use something like git-filter-repo to nuke the apk from the entire history and permanently fix the size issue, but because that rewrites history, everyone would have to delete their clone of the repo and re-clone.

  • Changing the repo history means we need to coordinate everyone to save their work and re-clone the repo after all the branches are force-pushed
  • This will break all of the commit verified signatures, which is probably fine
  • This also will delete everything tied to specific commits, including comments and Actions runs
  • In my repo https://github.com/MatthewL246/FtcRobotController-smol, running the command git filter-repo --invert-paths --path "HelpPage/apk/bin/TeamCode-Debug.apk" deleted 134 commits by the GitHub Actions bot
  • All branches and tags must be force-pushed for this to work
  • This may not even immediately fix the repo size problem because GitHub does not automatically garbage-collect git data, according to this StackOverflow question
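Before rewriting anything, it may help to confirm which files actually dominate the history. The sketch below is not from this thread: `largest_blobs` is a hypothetical helper name, and it only chains standard git plumbing (`git rev-list --objects` and `git cat-file --batch-check`) to list the biggest blobs reachable from any ref.

```shell
#!/usr/bin/env bash
# List the largest blobs anywhere in history, biggest first.
# Output format: "<size in bytes> <path>".
# Usage: largest_blobs [count]   (run inside a git repository)
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | grep '^blob ' \
    | cut -d' ' -f3- \
    | sort -rn \
    | head -n "$count"
}
```

If one path accounts for most of the top entries, that path is the natural argument for `git filter-repo --invert-paths --path`.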
MatthewL246 added a commit that referenced this issue Sep 22, 2023
@MatthewL246 MatthewL246 self-assigned this Sep 22, 2023
@MatthewL246
Author

See https://github.com/MatthewL246/FtcRobotController-smol for the output of git-filter-repo removing the apk. The .git directory is only 48 MB without all the junk.

@tom-ricci

ugh i knew this would happen sorry about that

you can create a new repo and link it to this one via a submodule, and then in the github action push to the submodule repo. that'll save you a few years

i also highly recommend removing the apk from the history because qualcomm had an issue with ftc repo size in the past (that's why the repo we forked from only goes back to sdk version 7) and they'll probably hit the 5gb limit again

(only remove the apk though, don't remove the whole history. im kinda banking on this repo having 2000 commits because im telling faang recruiters about having led this project to get internships lol)

@MatthewL246
Author

I don't think there's a point in saving the apk history since we have the source code, so I was thinking I could make the workflow update a pre-release tag that always has the latest apk version (#526).

@MatthewL246
Author

MatthewL246 commented Sep 22, 2023

Update:

  • Repo before cleaning (fresh clone): 1.1GB
  • Repo after cleaning and force-pushing all branches: 625MB
  • Repo after cleaning and force-pushing all branches and tags: 78MB

This means our repo was 93% apk. Also, it doesn't make the GitHub API repo size smaller, but clones are so much faster.
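The 93% figure can be reproduced from the first and last sizes in the list above. This is a small arithmetic sketch; `percent_removed` is a hypothetical helper, and treating 1.1 GB as 1100 MB is an assumption.

```shell
# Percent of the original size that was removed, given before/after sizes in MB.
percent_removed() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

# 1.1 GB is taken as 1100 MB here (assumption); 78 MB is the final clone size.
percent_removed 1100 78   # prints 93
```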

Commands used (must be run on Linux, Codespaces works perfectly):

git clone https://github.com/MatthewL246/FtcRobotController-smol.git
cd FtcRobotController-smol
du -hs .git
curl -L https://github.com/newren/git-filter-repo/releases/download/v2.38.0/git-filter-repo-2.38.0.tar.xz -O
tar -xf git-filter-repo-2.38.0.tar.xz
rm git-filter-repo-2.38.0.tar.xz 
mv git-filter-repo-2.38.0/ ..
../git-filter-repo-2.38.0/git-filter-repo --invert-paths --path HelpPage/apk/bin/TeamCode-debug.apk --force
du -hs .git
git config push.autosetupremote true
git remote add origin https://github.com/MatthewL246/FtcRobotController-smol.git
git push --force --all
git push --force --tags
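A quick sanity check after the rewrite, and before force-pushing, is to confirm that no commit on any ref still touches the removed path. This is a sketch that was not part of the commands above, and `path_in_history` is a hypothetical helper name.

```shell
# Exit 0 if any commit reachable from any ref still touches the given path,
# exit 1 otherwise. Run inside the rewritten repository.
path_in_history() {
  [ -n "$(git log --all --oneline -- "$1" | head -n 1)" ]
}
```

For example, `path_in_history HelpPage/apk/bin/TeamCode-debug.apk && echo "still in history"` should print nothing once the filter has done its job.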

@tom-ricci

> I don't think there's a point in saving the apk history since we have the source code, so I was thinking I could make the workflow update a pre-release tag that always has the latest apk version (#526).

Workflow artifacts are likely better for this scenario. I'm not completely sure how their size is limited, but it should be possible to delete or mutate artifacts to avoid running into size limits. I would recommend against releases because they're useful for tracking library releases, and I know there's at least one library in this repository that has published releases.
Also, releases are useful for keeping track of doc versions. Each version of the repo warrants new docs, and then you can archive old docs for old versions like I did for 1.0 and 1.1.

@MatthewL246
Author

MatthewL246 commented Sep 23, 2023

The problem with artifacts is that you need to be logged in to GitHub to download them. I created a script in https://github.com/XaverianTeamRobotics/FtcRobotController/issues/526#issuecomment-1732150026 that could be embedded in the docs website, but it would require someone to generate a personal access token with actions:read permissions for this repo and post it publicly. I do not know exactly what risks this involves. I don't think actions:read allows anyone to do anything malicious to the repo, but I would be worried about someone using the token to spam the GitHub API and potentially get the account owner in trouble.
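To make the token requirement concrete, here is a rough sketch of an artifact download through the GitHub REST API. It is not the script linked above: `artifact_zip_url` and `download_artifact` are hypothetical helper names, the artifact ID has to be looked up separately, and the token is read from a `GITHUB_TOKEN` environment variable rather than embedded anywhere public.

```shell
# Build the REST URL for downloading one Actions artifact as a zip archive.
artifact_zip_url() {
  local owner="$1" repo="$2" artifact_id="$3"
  printf 'https://api.github.com/repos/%s/%s/actions/artifacts/%s/zip\n' \
    "$owner" "$repo" "$artifact_id"
}

# Download the artifact to a file (default: artifact.zip).
# Aborts with a message if GITHUB_TOKEN is not set.
download_artifact() {
  curl --fail --location \
    -H "Authorization: Bearer ${GITHUB_TOKEN:?set GITHUB_TOKEN first}" \
    -H "Accept: application/vnd.github+json" \
    -o "${4:-artifact.zip}" \
    "$(artifact_zip_url "$1" "$2" "$3")"
}
```

Because the request carries an `Authorization` header, any page that runs this for anonymous visitors would have to ship the token with it, which is the exposure discussed in this comment.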

Maybe this should be a job for @lasagnadmin? (I'm not 100% sure how that account works because I wasn't there when it was created, is it a bot or a regular user? And who can generate a personal access token for it? @michaell4438)

Options:

  • Make the download script just redirect to the latest Actions run page. The apk artifact will only be downloadable if the user is signed in to GitHub. I am not sure whether this is okay for our use case.
  • Keep the download script as it is and take the security risk of a public access token.
  • Create an Action that downloads the APK artifact and uploads it as a release asset. This could use a tag name of something like "ci-build" and be deleted and re-created on the latest commit on every run so that there would only need to be 1 CI build release.

By the way, Actions artifact size limits are not a problem because they are unlimited for public repos, but they are deleted after 90 days. See the docs.

@tom-ricci

  1. Lasagnadmin is the right account for this, and it's a normal user account. It's the team's connection to Cloudflare for hosting the public site and serverless. Anyone who's worked in the IT office knows the password to both it and our Cloudflare account. If you want, Michael can connect the team CF to your CF (or I can if he doesn't know how). It'll be especially useful for you I think since you do a lot of the team's DevOps.
  2. You don't want to expose a read token. This is very bad.
  3. When I made Lasagnadmin, I made sure to claim xbhs.pages.dev and xbhs.workers.dev so we can use CF pages, functions, and workers. You could make a worker like archive.xbhs.workers.dev, save a read token in Workers KV or R2, and use that to fetch the archive and send it to the user.
  4. Also, are you sure release binaries are mutable? Because release tags are not.

@MatthewL246
Author

Thanks, that's all really good to know. Hosting an apk downloader on CF could definitely be a better, safer option - I didn’t realize we were already using CF.

First, though, I'd like to try creating an action for the CI release. I know that release assets are not mutable, but I think a release can be deleted and recreated with the API, and tags can be force-pushed. I think someone else has probably done it already tbh.
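The delete-and-recreate idea could look roughly like the sketch below, using the GitHub CLI instead of raw API calls. This is an assumption-laden illustration, not what #528 actually does: `run` and `publish_ci_build` are hypothetical helpers, the "ci-build" tag name is borrowed from the options list earlier in the thread, and `gh` must already be authenticated (for example via `GH_TOKEN` in a workflow).

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Replace the single rolling pre-release so it points at the given commit.
# Arguments: <tag> <commit sha> <path to apk>
publish_ci_build() {
  local tag="$1" sha="$2" apk="$3"
  # Delete the old release and its tag; on the very first run there is
  # nothing to delete, so a failure here is ignored.
  run gh release delete "$tag" --yes --cleanup-tag || true
  # Recreate the release (and tag) on the new commit with the apk attached.
  run gh release create "$tag" "$apk" \
    --target "$sha" --prerelease \
    --title "CI build" --notes "Automated build of $sha"
}
```

Running `DRY_RUN=1 publish_ci_build ci-build "$GITHUB_SHA" TeamCode-debug.apk` prints the two `gh` commands without touching any release, which is a cheap way to check the arguments on a fork first.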

I'll test on my fork ofc to prevent master commit spam.

@MatthewL246
Author

Alright, I implemented the APK release in #528.

@MatthewL246
Author

MatthewL246 commented Sep 25, 2023

Update: it sounds like this is going to be shelved until the end of the season in May.

If I'm not available, these are the commands that need to be run in a Codespace in this repository:

git clone https://github.com/XaverianTeamRobotics/FtcRobotController.git
cd FtcRobotController
du -hs .git
curl -L https://github.com/newren/git-filter-repo/releases/download/v2.38.0/git-filter-repo-2.38.0.tar.xz -O
tar -xf git-filter-repo-2.38.0.tar.xz
rm git-filter-repo-2.38.0.tar.xz 
mv git-filter-repo-2.38.0/ ..
../git-filter-repo-2.38.0/git-filter-repo --invert-paths --path HelpPage --force
du -hs .git
git config push.autosetupremote true
git remote add origin https://github.com/XaverianTeamRobotics/FtcRobotController.git
git push --force --all
git push --force --tags

Also, the branch protection rules will need to be temporarily disabled to do this.
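Since the force-push is irreversible on the remote, whoever runs the commands might want a restorable snapshot first. This is an optional precaution that is not part of the steps above: `backup_all_refs` is a hypothetical helper, and it should be run in the clone before git-filter-repo is invoked.

```shell
# Write every ref and its full history into a single bundle file,
# then verify that the file is a valid, complete bundle.
# Usage: backup_all_refs <output file>   (run inside the un-rewritten clone)
backup_all_refs() {
  local out="$1"
  git bundle create "$out" --all && git bundle verify "$out"
}
```

A bundle created this way can be restored later with `git clone pre-rewrite.bundle restored-repo`, where both names are placeholders.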

@michaell4438 michaell4438 added the enhancement New feature or request label Nov 20, 2023
MatthewL246 added a commit that referenced this issue Dec 8, 2023
This directory should also be permanently deleted in #524 because it used to contain large binary files.