[December] The cargo
integration has begun!
#694
Byron
announced in
Progress Update
Replies: 1 comment 4 replies
-
What the update for gitoxide integration with cargo? |
Beta Was this translation helpful? Give feedback.
4 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
This month is all about getting started with integrating
gitoxide
intocargo
, and as you can imagine there is a lot of small (and bigger) things that are needed ingitoxide
to make this happen without anybody noticing any difference except that everything is faster and better.Here come the details.
The Cargo Integration has begun
On the first of December the official integration PR was created with the goal of doing all fetches using
gitoxide
. For that to work, plenty of improvements were required.Prodash with IDs
gitoxide
usesprodash
for hierarchical progress reporting. Eachprogress
can have child-progress to correctly visualize complex tasks with a lot going on concurrently or in lock-step, as a tree of progress.cargo
needs to show progress using its own progress bars and providing only select output, like how many compressed bytes are received and how many objects are included.To enable this,
prodash
had to learn the concept of an ID which is a stable way to identify progress information within the progress tree, andgitoxide
now provides IDs for all progress it creates.cargo
also had to changes little as it was built around getting callbacks for progress, and now it's observing progress from a separate thread. This has the advantage that progress reporting can be as fast as one can increment an atomic counter, while progress reporting can happen at its own pace.Various fixes
With
cargo
having it's own and quite exhaustive test suite it showed plenty of shortcomings in the waygitoxide
was implemented at the time. All it took were about 195 git-specific tests, with a few of them running into errors that needed fixing ongitoxide
's side.Reuse connections of HTTP transport
A test against a stubbed HTTP server showed that
gitoxide
would not keep the connection alive despite this being the default for HTTP 1.1, causing the test to fail.The fix was quite simple and just an oversight in the implementation, so I am really happy there was a test catching it. And of course, a similar test was added to the
gitoxide
test suite to prevent regression.No UTF8 for ls-refs line parsing
When scouring through the
git-protocol
code it became evident that a token of tech-debt was still left: references would be forced into UTF-8 Strings before parsing them (as bytes) due to using the standardlines()
iterator of a buffered reader. This is a problem as technically, UTF-8 isn't a requirement for git references at all so the conversion can fail. Further, the bytes to string conversion is another copy and allocation that is unnecessary.With the introduction of a new trait it was possible to expose line-by-line reading that provides buffer-backed bytes instead to fix all prior problems in one go.
git-url
improved handling of windows pathsgit-url
as a crate which makes the various kinds of allowed URLs accessible was quite stable recently, butcargo
did manage to throw some new URLs at it that caused it to choke. Fixes were trivial, fortunately, but it remains to be seen what other URLs are able to go beyond any expectation.Improved discovery of non-typical repositories
Cargo does it's best to be resilient against breaking git repositories and should never choke on corrupt or unexpected data coming from disk. And of course, there is a test for that which caused the repository-local configuration file to go missing.
git2
was fine with that,git
was fine with that, butgitoxide
really thought aconfig
file has to be there.Now
gitoxide
will assume default values which seems to be equivalent to whatgit
does as well.auto-tags
When implementing
fetch
I intentionally skipped the 'auto-tag' feature to save some time in that moment, thinking that something this 'obscure' probably isn't required from day one. However, it turns out that 'auto-tags' aren't obscure at all and implementing them is needed for correct fetches and clones.auto-tags
as a capability need to be enabled by the client so the server can automatically send along all tag objects that point to any commit it would normally be sending. That way, the client will have all objects necessary to automatically create tags that point to any object it received. This is a somewhat hidden feature when doinggit clone
as git won't inform about the tags it createdThe
cargo
test, however, was performing a fetch with ref-specs that only refer to branches, but assumed that tag references would also be present automatically.Now
gitoxide
behaves like it should and properly keeps track of implicit ref-specs like these, respecting typicalgit
defaults out of the box.Another one of these eurekas: environment-to-config-mapping
It still amazes me that this far into the project, it's still possible to make changes that make it seem that a big piece of the puzzle finally fell into place. Previously, environment variables, despite already being gated behind a permission flag, could possibly be read lazily in various places of the
git-repository
case. For a library, that always troubled me as this effectively adds global state to the equation, littered all of the codebase. It never felt quite right.However, since
git-config
rising to incredible importance withingitoxide
there also was an advent ofgitoxide.
prefixed configuration variables for configuration ofgitoxide
specific values. Value being there are in the right place, the in that moment I connected the dots.Why not have a single step which transforms environment variables, based on permissions, into respective configuration values? That way there is one place to rule all the access to environment variables, and it's done exactly once when the
Repository
is opened.Now all code needing these values could just read it from the git configuration, which also makes these values accessible to API users who want to override them, now without having to set environment variables in their process at all.
gix
improvementsDue to the changes to integrate with
gitoxide
, a few improvements also landed ingix
to make them available to a broader audience and to additional testing on actual repositories.odb stats
Having said the above, that everything was about
cargo
, it's probably fair to start out with an exception. With the publication ofnoseyparker
, I couldn't resist to throwgitoxide
at it to see how much faster it can go. It turned out, quite a lot!However, there was one feature missing to completely remove
git2
from there, which was related to quickly enumerating all objects to know the bounds of the pending scan.For that,
repo.objects.header()
was added to only decode object headers, as fast as possible, without decoding the whole objects. In average, doing that as opposed to a full decode is about 8 times faster, which is also faster than whatgit2
can offer.With it, and full parallelism, an M1 Pro with 8.5 cores can calculate statistics on all objects within the linux kernel repo in 1.4s, or 6.7 million objects per second. This is too fast for any delta-caches by the way, and no matter what, my cache implementations that work fine when decoding objects wouldn't be able to speed up this computation at all (that is, the cost of the cache outweighed its benefits in all tested workloads) - interesting.
With it, there was also an option to change the iteration of all objects within an object database to be in pack order, which allows to use pack-delta caches much better when the entire object is decoded later.
All in all, I am happy
noseyparker
motivated me to finally implement the last missing feature in the object database.--strict
and--no-verbose
For completeness and control, the
--strict
flag was added to ensure that commands that are 'lenient' on the git configuration by default are forced to be strict. Being strict is the same way thatgit
treats configuration as it will fail to perform any action if it contains malformed or invalid values.gitoxide
implements a degree of leniency to allow commands likegix config
to always display configuration despite the presence of malformed values.With
gix fetch
andgix clone
being actually useful, it was important to have a way to automatically be verbose to show progress information. To provide more control over it, there is now a--no-verbose
flag to turn off progress reporting in case of sub-commands that turn it on by default.clone
with auto-tagsFinally,
gix clone
performs clones that have a similar result as ifgit clone
would have been used, due to the implementation ofauto-tags
. To counter that and allow no tags to be cloned to keep the repo more minimal, one can now specify--no-tags
as well.Community
Worktree configuration support
Sidney contributed the ability to layer worktree-specific configuration on top of the one coming from the shared repository, providing a path to supporting sparse checkouts correctly. Thanks to his work,
gitoxide
will be supporting sparse checkouts by default as it's aware of all features of theindex
file from the very start.A fix for relative SCP URLs
Thanks to a user who employs
gitoxide
to clone and fetch repositories a fix was provided to deal with SCP-like git urls with relative paths correctly.Various impactful fixes to
git-date
crateThanks to the repeated engagement of one industrious individual
git-date
received a lot of impactful bugfixes. First to mention is a fix for incorrect handling of timezone offsets which made roundtrips impossible. Baseline tests now compare to the results in git more thoroughly as well which should prevent regressions and help future development.Lastly, due to a method being entirely untested and due to being dependent on the system's local timezone it was never run in a place where the timezone offset would become negative. And exactly that wasn't possible, until now.
Thanks for all your work!
Rust Foundation sponsorship: cargo shallow clones update
The last month on the grant to integrate
gitoxide
into cargo for shallow clones has begun and my plan is to make the PR reviewable this year 🎉. Doing mere fetches withgitoxide
already provides substantial performance improvements particularly on Windows, so it's worth having even before shallow clone support lands.It will definitely take more time to finish the integration and set
cargo
up to phase outgit2
as well.Merry Christmas, and a happy new year,
Sebastian
PS: The latest timesheets can be found here.
Beta Was this translation helpful? Give feedback.
All reactions