Our current support for Rust crate rebuilds is a bit hand-wavy when it comes to lockfiles: We make the Cargo.lock content an input to our build (at substantial cost to build definition size) so as to avoid trying to reproduce it.
However, while discussing this issue with @kpcyrd, we may have stumbled onto a way to make lockfile generation stable:
The cargo registry is effectively just a git repo with a newer index API that provides a caching layer.
Before the introduction of the index: The registry repo was present on disk and infrequently updated.
After the index: The registry state is accessed via the index API and is close to the repo head.
At lockfile construction time, the registry's git repo state is accessed (directly or via the index API) to resolve a crate's dependency graph and store it in Cargo.lock.
So what we should be able to do is look for a point in the git repo's history where this resolution should have been possible.
For our purposes, we can somewhat arbitrarily pick the earliest commit in the registry git repo which contains all relevant dependency versions.
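A rough sketch of that selection rule, under one loud assumption: once a version appears in the registry it is never removed, so "contains all relevant versions" stays true for every later commit. Under that assumption the earliest matching commit can be found by bisection, which is not necessarily how the prototype searches. The names `earliest_commit_with_all` and `contains_all`, and the toy history, are hypothetical.

```python
from typing import Callable, Optional, Sequence


def earliest_commit_with_all(
    commits: Sequence[str],
    contains_all: Callable[[str], bool],
) -> Optional[str]:
    """Return the oldest commit for which contains_all is true, or None.

    commits must be ordered oldest first. Assumes contains_all never
    flips back to False after it first becomes True.
    """
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if contains_all(commits[mid]):
            hi = mid  # mid works, so look for something older
        else:
            lo = mid + 1  # mid is too old
    return commits[lo] if lo < len(commits) else None


# Toy history: each commit publishes exactly one crate version.
published = [("a", "1.0.0"), ("b", "0.3.1"), ("a", "1.1.0"), ("c", "2.0.0")]
commits = [f"c{i}" for i in range(len(published))]
snapshot = {c: set(published[: i + 1]) for i, c in enumerate(commits)}

wanted = {("a", "1.0.0"), ("b", "0.3.1")}
print(earliest_commit_with_all(commits, lambda c: wanted <= snapshot[c]))  # c1
```

If the monotonicity assumption does not hold for the real registry history, a linear scan from a known lower bound (for example the crate's publish time) is the safe fallback.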
As for technical feasibility, the repo is currently 160MB bare and ~2GB checked out and has a linear history of ~15k commits. These aren't particularly big numbers aside from the checkout size but, luckily, we should be able to avoid materializing the full repo while still quickly searching the history.
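One way to avoid materializing the repo, sketched here as an assumption rather than a description of the prototype: read only the index files of the crates we care about, straight from the object store, with `git show <commit>:<path>`. The path layout and the one-JSON-object-per-line format with a `vers` field follow Cargo's documented index format as I understand it. All function names are hypothetical.

```python
import json
import subprocess


def index_path(crate: str) -> str:
    """Path of a crate's file inside the registry index repo."""
    name = crate.lower()
    if len(name) <= 2:
        return f"{len(name)}/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    return f"{name[:2]}/{name[2:4]}/{name}"


def versions_in(index_file: str) -> set[str]:
    """Versions listed in an index file (one JSON object per line)."""
    return {
        json.loads(line)["vers"]
        for line in index_file.splitlines()
        if line.strip()
    }


def read_index_file(repo: str, commit: str, crate: str) -> str:
    """Read one crate's index file at a commit without a checkout."""
    result = subprocess.run(
        ["git", "-C", repo, "show", f"{commit}:{index_path(crate)}"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit is treated as "crate not present at this commit".
    return result.stdout if result.returncode == 0 else ""


print(index_path("serde"))  # se/rd/serde
```

A per-commit check then becomes `version in versions_in(read_index_file(repo, commit, crate))` for each locked package, which touches a handful of blobs instead of the ~2GB working tree.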
An unoptimized prototype shows us able to search ~20 commits per second which, at the current rate of ~1000 commits/day, would put us at ~40s for a day of registry state. We can almost certainly improve this but, even now, it should be viable to introduce this into our inference pipeline, especially considering we expect the crate publish time to be quite close (well within 1d) to the registry state time in the common case.
Quick update: #140 improved the search rate such that we should be able to find the target registry commit in about a tenth the time initially quoted (~7s).