Does this action save Bazel's built output artifacts to a cache? #18

For several years I've seen advice for caching Bazel's built artifacts in GitHub Actions by using actions/cache and including ~/.cache/bazel among the captured directory paths. With the right cache key and a hierarchical set of "restore keys", this coaxes Bazel into reusing much of what it has already built or tested, avoiding the need to rerun actions in subsequent workflow runs when the action inputs haven't changed.

Using this setup-bazel action, I see that we can enable a repository cache and a disk cache, and that the action takes care of setting Bazel's output base directory, but it doesn't appear to save the built artifacts to a cache. My reading of the disk cache documentation suggests that it includes "build artifacts", but even when I enable setup-bazel's disk cache and see my GitHub Actions workflow run restore it successfully, Bazel still winds up running many actions whose outputs I expected to find already available in the cache.

Do I need to use actions/cache separately to cache more of these action outputs, or should the disk cache configured by setup-bazel already take care of that?
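For reference, the actions/cache pattern described in the first paragraph usually looks something like the following. This is only an illustrative sketch; the key scheme is an assumption, not a recommendation from this project:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/bazel
    # Key on the build files; fall back to the newest cache for this OS.
    key: bazel-${{ runner.os }}-${{ hashFiles('**/BUILD.bazel', '**/BUILD') }}
    restore-keys: |
      bazel-${{ runner.os }}-
```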
Comments
No, there should be no need to use actions/cache if you use setup-bazel. Once you enable the disk cache, Bazel's built artifacts are saved as part of it.

A common problem would be using the same disk cache for different jobs/workflows, so the last one that runs overrides the disk cache of the others. This can easily be solved via https://github.com/bazel-contrib/setup-bazel#separate-disk-caches-between-workflows.
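If I read that README section correctly, separating disk caches amounts to giving each workflow its own cache name, along these lines (a sketch; check the linked README for the exact syntax, and pin a real release in place of the placeholder):

```yaml
- uses: bazel-contrib/setup-bazel@<version>
  with:
    # Use the workflow name as the disk-cache name so parallel
    # workflows don't overwrite each other's caches.
    disk-cache: ${{ github.workflow }}
```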
Thank you for explaining that. Now I think I was misinterpreting the problem.

As I understand the current design, the cache name suffix for the disk cache is the hash of all of my BUILD and BUILD.bazel files. If my BUILD.bazel files settle down but I keep changing, say, my Go files in each set of new commits pushed to my pull request's branch, then setup-bazel finds an existing cache for this set of BUILD.bazel files, restores it, and doesn't bother saving a new cache reflecting the artifacts built from the current Go files. If that's true, then I'll keep starting out with the same restored cache even as my source files drift away from their state when we last changed our BUILD.bazel files.

I recognize that the alternate approach of saving a new cache for each distinct Git commit is also expensive: it takes a very long time to save each cache, and you grind through your cache quota quickly, evicting older and still-useful caches that serve different purposes.

Is there an approach between these positions that you have considered? Do you see the current approach as having the liability I've described here, or am I using it incorrectly and suffering unduly?
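To make the staleness concern concrete: if the disk-cache key is derived only from build files, along the lines of the hypothetical expression below (not necessarily setup-bazel's actual implementation), then any run with unchanged BUILD files gets an exact cache hit and never re-saves, no matter how far the sources have drifted:

```yaml
# Hypothetical sketch of a build-file-derived key: identical BUILD files
# always produce an exact hit, so the entry is restored but never refreshed.
key: bazel-disk-${{ hashFiles('**/BUILD', '**/BUILD.bazel') }}
```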
I am honestly not sure what the best approach would be here. We could potentially list out the tree of the disk cache and upload individual pieces (or folders). However, that would probably be too network-heavy, and it's also not clear how such a cache should be cleared to avoid overgrowing it.

Another approach would be to force-overwrite the cache even when it's been hit. It could prove useful when running on the main branch, as PRs against the main branch could fetch the cache but not save it:

```yaml
- uses: bazel-contrib/setup-bazel@<version>
  with:
    disk-cache: true
    disk-cache-always-save: ${{ github.ref == 'refs/heads/main' }}
```
An approach I used with a previous project overlapped with these ideas. On topic branches for PRs, I used actions/cache with a cache key that creates a new cache for every distinct Git commit at the head of the topic branch. For my "restore keys", I used a cascading sequence back down to the cache that might be saved against the repository's base branch ("main"): that is, if the head commit on the topic branch changed, look for the most recent cache from this same branch, and failing that, look for the most recent cache from the base branch.

I also had a separate GitHub Actions workflow that ran on pushes to the "main" branch, meaning whenever we'd merge a PR against it. That workflow used actions/cache to build all the same artifacts as the aforementioned PR workflow, with its cache key scoped to "main" and the current commit.

Between the PR-focused and the push-to-"main" workflow, these caches worked nicely to provide mostly fresh Bazel action output and built artifacts to subsequent workflow runs. They came with a few liabilities, though.
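The exact keys didn't survive in this thread, but a per-commit key with branch-then-base-branch fallbacks, as described above, might look like this (an illustrative reconstruction, not the commenter's actual configuration):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/bazel
    # One cache per commit at the head of the topic branch.
    key: bazel-${{ github.head_ref || github.ref_name }}-${{ github.sha }}
    # Fall back to the newest cache from this branch, then from "main".
    restore-keys: |
      bazel-${{ github.head_ref || github.ref_name }}-
      bazel-main-
```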
Thank you for explaining your setup. I believe these are all workarounds for a fundamental limitation of the current cache implementation: it's coarse-grained.

The best solution to this problem I can think of is to implement a small HTTP server, started by the action, that can be used as a Bazel remote cache (https://bazel.build/remote/caching#http-caching) and is internally backed by the GitHub Actions cache. This would allow storing fine-grained caches in GHA and also avoid over-downloading. A similar approach is taken by BuildKit (https://github.com/moby/buildkit/blob/master/cache/remotecache/gha/gha.go), which translates the Docker build cache into the GHA cache.

Unfortunately, I don't have time to work on this at the moment, but if anyone is up for implementing it, I would be happy to expand on how it might work. For now, we can work around the fundamental limitation by saving the cache only on a particular branch, such as the main branch.
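To sketch the idea (a hypothetical illustration, not setup-bazel's implementation): Bazel's HTTP caching protocol is just GET and PUT against /ac/&lt;hash&gt; and /cas/&lt;hash&gt; paths, so a small Node server could translate each blob into its own entry in the Actions cache via @actions/cache. The port, directory, and key scheme below are assumptions:

```typescript
// Hypothetical sketch only: a proxy that speaks Bazel's HTTP remote-cache
// protocol and stores each blob as an individual GitHub Actions cache entry.
import * as http from 'node:http';
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as cache from '@actions/cache';

const CACHE_DIR = '/tmp/bazel-remote-cache'; // scratch space for blobs
fs.mkdirSync(CACHE_DIR, { recursive: true });

const server = http.createServer(async (req, res) => {
  // Bazel issues GET/PUT requests of the form /ac/<sha256> and /cas/<sha256>.
  const key = `bazel${(req.url ?? '').replace(/\//g, '-')}`;
  const file = path.join(CACHE_DIR, key);

  if (req.method === 'PUT') {
    // Buffer the uploaded blob to disk, then save it as its own cache entry.
    req.pipe(fs.createWriteStream(file)).on('finish', async () => {
      try {
        await cache.saveCache([file], key);
      } catch {
        // The key may already be reserved; treat re-uploads as success.
      }
      res.writeHead(200).end();
    });
  } else if (req.method === 'GET') {
    // Restore the blob from the Actions cache; 404 on a miss so Bazel
    // falls back to executing the action locally.
    const hit = await cache.restoreCache([file], key);
    if (hit !== undefined && fs.existsSync(file)) {
      res.writeHead(200);
      fs.createReadStream(file).pipe(res);
    } else {
      res.writeHead(404).end();
    }
  } else {
    res.writeHead(405).end();
  }
});

// Point Bazel at it with: --remote_cache=http://localhost:8080
server.listen(8080);
```

As the next comment notes, storing one GHA cache entry per blob runs straight into the cache API's rate limits, which is the main obstacle to making this robust.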
I had some time to prototype the idea of running an HTTP server, as part of the action, that is compatible with Bazel remote caching. The server does not store anything itself; it simply translates Bazel remote caching REST API calls into GitHub Actions cache operations.

In simple examples it worked great and created hundreds of cache entries matching Bazel outputs. Upon re-running the build, I could see that the remote cache was being used for build/test. However, when I started testing more complex scenarios with thousands of cache entries, I unsurprisingly stumbled upon rate limiting of the GitHub Actions cache API, and what seemed to be unhandled rate limits directly from Microsoft Azure Storage. I could still see caches created and eventually available, but the builds were failing with "Missing digest for cache" errors.

Unfortunately, I don't have time to dig further into this and build something robust enough for general use. I also had issues with the remote cache on Windows that I'm not sure about; the errors may be in Bazel itself or in my implementation. The proof-of-concept can be seen in #21. If anyone wants to pick it up and continue working on it, I'll be happy to collaborate.
Would a PR to implement a `disk-cache-always-save` option like the one sketched above be welcome?
@bentekkie Yes, but let's ensure the API is correct. We need a way to disable uploading caches from PRs; isn't that what we want?

```yaml
- uses: bazel-contrib/setup-bazel@<version>
  with:
    disk-cache: true
    disk-cache-save: ${{ github.ref == 'refs/heads/main' }}
```
Shouldn't that be a decision left to users? With an option like this, a user can use a condition like the one in the example to save only on main, but they could also change that condition if they want to restrict based on other variables.
@bentekkie Yes, I'm just saying that it should be called `disk-cache-save`.
Ah, missed that. Sounds good; I will try to send a PR for this.
Not sure if related, but I have a similar question: I am using rules_python with pypi dependencies, and they don't appear to be cached between runs even with the disk cache enabled.

Thanks for the great package!
I think that rules_python keeps pypi dependencies in external repositories, which means they are not part of the disk cache. You can try enabling the repository cache and see if it helps in your case. If not, consider enabling external caches as well.
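If I read the setup-bazel README correctly, that combination would look roughly like this (a sketch; consult the README for the exact option names and values, and pin a real release in place of the placeholder):

```yaml
- uses: bazel-contrib/setup-bazel@<version>
  with:
    disk-cache: true
    # Caches Bazel's repository cache (downloaded archives).
    repository-cache: true
    # Caches external repositories, such as those created by rules_python.
    external-cache: true
```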
Hey folks, do you have any prototypes for this functionality? I've recently ported Azure Pipelines to GitHub Actions and am running into cache invalidation again, so I'm hoping a "yes, definitely save" lever can help.
What exact functionality are you referring to? This issue mentions several different problems and approaches to solving them.
No, unfortunately, there were no prototypes for this. If you are up for making a PR, I'd be happy to merge it.
No worries, thanks for confirming. Absolutely: if I'm able to nail my problem down and it requires an adjustment within your plugin, I'll ping this thread with a PR.