Improve support for Cargo.lock file rebuilds #110

Open
msuozzo opened this issue Sep 26, 2024 · 1 comment
Comments

Member

msuozzo commented Sep 26, 2024

Our current support for Rust crate rebuilds is a bit hand-wavy when it comes to lockfiles: we make the Cargo.lock content an input to our build (at substantial cost to build definition size) to avoid having to reproduce it.

However, while discussing this issue with @kpcyrd, we may have stumbled onto a way to make lockfile generation stable:


The cargo registry is effectively just a git repo with a newer index API that provides a caching layer.

  • Before the introduction of the index: The registry repo was present on disk and infrequently updated.
  • After the index: The registry state is accessed via the index API and is close to the repo head.

At lockfile construction time, the registry's git repo state is accessed (directly or via the index API) to resolve a crate's dependency graph and store it in Cargo.lock.

So what we should be able to do is look for a point in the git repo's history at which this resolution would have been possible.

For our purposes, we can somewhat arbitrarily pick the earliest commit in the registry git repo which contains all relevant dependency versions.
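As a rough illustration of that selection rule, here is a minimal sketch over a hypothetical in-memory model of the registry history. The `Commit` shape, the `name@version` keys, and the function name are all assumptions made for the example, not part of any existing tooling. It also assumes the history is linear and that published versions are never removed from it, so a simple oldest-first scan is enough.

```python
from typing import Iterable, Optional

# Hypothetical model: each commit is (commit_id, entries it adds), where an
# entry is a "name@version" string. Real commits would have to be parsed from
# the registry's git history instead.
Commit = tuple[str, frozenset[str]]


def earliest_satisfying_commit(
    history: Iterable[Commit], required: set[str]
) -> Optional[str]:
    """Return the id of the first commit (oldest first) at which every
    required "name@version" entry has been published, or None if the
    history never contains all of them."""
    missing = set(required)  # copy so the caller's set is left untouched
    for commit_id, added in history:
        missing -= added
        if not missing:
            return commit_id
    return None


# Toy history, oldest first.
history = [
    ("c1", frozenset({"serde@1.0.0"})),
    ("c2", frozenset({"libc@0.2.1"})),
    ("c3", frozenset({"serde@1.0.1"})),
]
print(earliest_satisfying_commit(history, {"serde@1.0.0", "libc@0.2.1"}))  # c2
```

Because the set of published versions only grows under these assumptions, the predicate is monotonic along the history, so a binary search over commits would also work if the per-commit check were made cheap.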


As for technical feasibility, the repo is currently 160MB bare and ~2GB checked out and has a linear history of ~15k commits. These aren't particularly big numbers aside from the checkout size but, luckily, we should be able to avoid materializing the full repo while still quickly searching the history.
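One way to read index data without a checkout is to ask git for a single file at a single commit from a bare clone. The sketch below is an illustration under assumptions, not a description of the prototype: the helper names are made up, and it relies on the documented registry index layout (lowercased crate name, sharded by name length) and on each index file being newline-delimited JSON with a `vers` field.

```python
import json
import subprocess


def index_path(crate: str) -> str:
    """Path of a crate's file inside the registry index repo."""
    name = crate.lower()
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    return f"{name[:2]}/{name[2:4]}/{name}"


def parse_versions(index_file: str) -> set[str]:
    """Collect the "vers" field from each newline-delimited JSON entry."""
    return {
        json.loads(line)["vers"]
        for line in index_file.splitlines()
        if line.strip()
    }


def versions_at(repo_dir: str, commit: str, crate: str) -> set[str]:
    """Versions of `crate` known at `commit`, read from a bare clone at
    `repo_dir` via `git show <commit>:<path>` (no working tree needed)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{index_path(crate)}"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Treated as "file absent at this commit"; other git errors land here too.
        return set()
    return parse_versions(result.stdout)
```

Spawning one `git` process per lookup is simple but slow; a long-lived `git cat-file --batch` process or a git library would likely be the better choice for scanning many commits.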

An unoptimized prototype can search ~20 commits per second which, at the current rate of ~1000 commits/day, would put us at ~40s to search a day of registry state. We can almost certainly improve on this but, even now, it should be viable to introduce this into our inference pipeline, especially since we expect the crate publish time to be quite close (well within 1d) to the registry state time in the common case.

Member Author

msuozzo commented Nov 20, 2024

Quick update: #140 improved the search rate such that we should be able to find the target registry commit in about a tenth of the time initially quoted (~7s).
