-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO-NOT-MERGE-YET] Rebase to v2.48.0 #714
base: vfs-2.48.0
Are you sure you want to change the base?
Conversation
We will need to ensure that |
83d38ac
to
a38426e
Compare
Range-diff relative to initially-pushed branch
|
6ac52d2
to
8ad18d5
Compare
Finally all the test failures are fixed. Range-diff relative the previous range-diff'ed revision
|
Backported #707, #712 and #713. git range-diff --remerge-diff 2dc56ed..1d7817c 8ad18d5..df8e7d8(The
|
df8e7d8
to
136e98a
Compare
Just to clean up a little, I squashed a few patches (the force-push is tree-same): Range-diff since previous force-push
Quick call-out: What remains of this patch will be superseded by git-for-windows#5322 once I rebase onto -rc1. |
In ac8acb4 (sparse-index: complete partial expansion, 2022-05-23), 'expand_index()' was updated to expand the index to a given pathspec. However, the 'path_matches_pattern_list()' method used to facilitate this has the side effect of initializing or updating the index hash variables ('name_hash', 'dir_hash', and 'name_hash_initialized'). This operation is performed on 'istate', though, not 'full'; as a result, the initialized hashes are later overwritten when copied from 'full'. To ensure the correct hashes are in 'istate' after the index expansion, change the arg used in 'path_matches_pattern_list()' from 'istate' to 'full'. Note that this does not fully solve the problem. If 'istate' does not have an initialized 'name_hash' when its contents are copied to 'full', initialized hashes will be copied back into 'istate' but 'name_hash_initialized' will be 0. Therefore, we also need to copy 'full->name_hash_initialized' back to 'istate' after the index expansion is complete. Signed-off-by: Victoria Dye <[email protected]>
These seem to be custom tests to microsoft/git as they break without these changes, but these changes are not needed upstream. Signed-off-by: Derrick Stolee <[email protected]>
Add a test verifying that sparse-checkout (with and without sparse index enabled) treat untracked files & directories correctly when changing sparse patterns. Specifically, it ensures that 'git sparse-checkout set' * deletes empty directories outside the sparse cone * does _not_ delete untracked files outside the sparse cone Signed-off-by: Victoria Dye <[email protected]>
Add test case to demonstrate that `git index-pack -o <idx-path> pack-path` fails if <idx-path> does not end in ".idx" when `--rev-index` is enabled. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. The test here demonstrates the failure. (The test forces `--rev-index` to avoid interaction with `GIT_TEST_NO_WRITE_REV_INDEX` during CI runs.) Signed-off-by: Jeff Hostetler <[email protected]>
Teach index-pack to silently omit the reverse index if the index file does not have the standard ".idx" suffix. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. Since we do not know why the user requested a non-standard suffix for the index, we cannot guess what the proper corresponding suffix should be for the reverse index. So we disable it. The t5300 test has been updated to verify that we no longer error out and that the .rev file is not created. TODO We could warn the user that we skipped it (perhaps only if they TODO explicitly requested `--rev-index` on the command line). TODO TODO Ideally, we should add an `--rev-index-path=<path>` argument TODO or change `--rev-index` to take a pathname. TODO TODO I'll leave these questions for a future series. Signed-off-by: Jeff Hostetler <[email protected]>
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable. Signed-off-by: Jeff Hostetler <[email protected]>
Calculate the number of symrefs, loose vs packed, and the maximal/accumulated length of local vs remote branches. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
This backports the `ds/advice-sparse-index-expansion` patches into `microsoft/git` which _just_ missed the v2.46.0 window. Signed-off-by: Johannes Schindelin <[email protected]>
Cherry-pick rev-index fixes from v2.41.0.vfs.0.5 into v2.42.0.*
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable.
With this commit, we gather statistics about the sizes of commits, trees, and blobs in the repository, and then present them in the form of "hexbins", i.e. log(16) histograms that show how many objects fall into the 0..15 bytes range, the 16..255 range, the 256..4095 range, etc. For commits, we also show the total count grouped by the number of parents, and for trees we additionally show the total count grouped by number of entries in the form of "qbins", i.e. log(4) histograms. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Create `struct large_item` and `struct large_item_vec` to capture the n largest commits, trees, and blobs under various scaling dimensions, such as size in bytes, number of commit parents, or number of entries in a tree. Each of these have a command line option to set them independently. Signed-off-by: Jeff Hostetler <[email protected]>
Include the pathname of each blob or tree in the large_item_vec to help identify the file or directory associated with the OID and size information. This pathname is computed during the path walk, so it reflects the first observed pathname seen for that OID during the traversal over all of the refs. Since the file or directory could have moved (without being modified), there may be multiple "correct" pathnames for a particular OID. Since we do not control the ref traversal order, we should consider it to be a "suggested pathname" for the OID. Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Computing `git name-rev` on each commit, tree, and blob in each of the various large_item_vec can be very expensive if there are too many refs, especially if the user doesn't need the result. Lets make it optional. The `--no-name-rev` option can save 50 calls to `git name-rev` since we have 5 large_item_vec's and each defaults to 10 items. Signed-off-by: Jeff Hostetler <[email protected]>
This adds a new builtin, `git update-microsoft-git`, that executes the platform-specific upgrade steps to get the latest version of `microsoft-git`. On Windows, this means running `git update-git-for-windows` which was updated to use the `microsoft/git` releases page, when appropriate. See #321 for details. On macOS, this means running a sequence of `brew` commands. These are adapted from the `UpgradeVerb` in `microsoft/scalar`, with an important simplification: we don't need to differentiate between the `scalar` and `scalar-azrepos` cask.
Signed-off-by: Derrick Stolee <[email protected]>
…x-built-in-fsmonitor Fix the built-in FSMonitor, and run Scalar's Functional Tests as part of the automated builds
It is sometimes difficult to support users who are hitting issues with sparse index expansion because it is unclear why the index needs to expand from logs alone. It is too invasive to set up a debugging scenario on the user's machine, so let's improve the logging. Create a new ensure_full_index_with_reason() method that takes a formatting string and parameters. If the index is not fully expanded, then apply the formatting logic to create the logged string and log it before calling ensure_full_index(). This should assist with discovering why an index is expanded from trace2 logs alone. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
This is random stuff that probably all got upstream in the meantime.
These locations that previously called ensure_full_index() are now updated to call the ..._with_reason() varation using fixed strings that should be enough to identify the reason for the expansion. This will help users use tracing to determine why the index is expanding in their scenarios. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
These cases that call ensure_full_index() are likely to be due to a data shape issue on a user's machine, so take the extra time to format a message that can be placed in their trace2 output and hopefully identify the problem that is leading to this slow behavior. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
For safety, areas of code that iterate over the cache entries in the index were guarded with ensure_full_index() and labeled with a comment. Replace these with a macro that calls ensure_full_index_with_reason() using the line number of the caller to help identify the situation that is causing the index expansion. Signed-off-by: Derrick Stolee <[email protected]>
The recent changes update the callers of ensure_full_index() to call variants that will log extra information. This should assist developers assisting users who are hitting the sparse index expansion message. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
The clear_skip_worktree_from_present_files_sparse() method attempts to clear the skip worktree bit from cache entries in the index depending on when they exist in the workdir. When this comes across a sparse directory that actually exists in the workdir, then this method fails and signals that the index needs expansion. The index expansion already logs a reason, but this reason is separate from the path that caused this failure. Add logging to demonstrate this situation for full clarity. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
I will intend to send this upstream after the 2.47.0 release cycle, but this should get to our microsoft/git users for maximum impact. Customers have been struggling with explaining why the sparse index expansion advice message is showing up. The advice to run 'git clean' has not always helped folks, and sometimes it is very unclear why we are running into trouble. These changes introduce a way to log a reason for the expansion into the trace2 logs so it can be found by requesting that a user enable tracing. While testing this, I created the most standard case that happens, which is to have an existing directory match a sparse directory in the index. In this case, it showed that two log messages were required. See the last commit for this new log message. Together, these two places show this kind of message in the `GIT_TRACE2_PERF` output (trimmed for clarity): ``` region_enter | index | label:clear_skip_worktree_from_present_files_sparse data | sparse-index | ..skip-worktree sparsedir:<my-sparse-path>/ data | index | ..sparse_path_count:362 data | index | ..sparse_lstat_count:732 region_leave | index | label:clear_skip_worktree_from_present_files_sparse data | sparse-index | expansion-reason:failed to clear skip-worktree while sparse ``` I added some tests to demonstrate that these logs are recorded, but it also seems difficult to hit some of these cases.
These two tests in t5616-partial-clone.sh are actually already broken and there are comments supporting that. Those comments were focused on the GIT_TEST_FULL_NAME_HASH variable, but they also apply to this one. We will want to avoid issues here. Signed-off-by: Derrick Stolee <[email protected]>
In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before between prepare_revision_walk() and before visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. We didn't use this in the past because we would mark objects UNINTERESTING after doing the initial commit walk to the boundary. While we should be marking these objects as UNINTERESTING, we shouldn't _emit_ them all via the path-walk algorithm or else our delta calculations will get really slow. Based on these observations, the way we were handling the UNINTERESTING flag in walk_objects_by_path() was overly complicated and buggy. A lot of it can be removed and simplified to work with this new approach. It also means that we will see the UNINTERESTING boundaries of paths when doing a default path-walk call, changing some existing test cases. Signed-off-by: Derrick Stolee <[email protected]>
In some instances (particularly the `read_object` hook), the `cmd` attribute is set to an `strdup()`ed value. This value needs to be released in the end! Since other users assign a non-`strdup()`ed value, be careful to add _another_ attribute (called `to_free`) that can hold a reference to such a string that needs to be released once the sub process is done. Signed-off-by: Johannes Schindelin <[email protected]>
There does not appear to be anything particularly incompatible about the --shallow and --path-walk options of 'git pack-objects'. If shallow commits are to be handled differently, then it is by the revision walk that defines the commit set and which are interesting or uninteresting. However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more objects than we desired, due to some incorrect behavior of the path-walk API, especially around walking uninteresting objects. To also cover the symmetrical case of pushing from a shallow clone, add a new test to t5538-push-shallow.sh that confirms the correct behavior of pushing only the new object. This works to validate both the --path-walk and --no-path-walk case when toggling the GIT_TEST_PACK_PATH_WALK environment variable. This test would have failed in the --path-walk case if we created it before the previous change. Signed-off-by: Derrick Stolee <[email protected]>
This fixes a leak that is not detected by Git's test suite (but by microsoft/git's). Signed-off-by: Johannes Schindelin <[email protected]>
It can be notoriously difficult to detect if delta bases are being computed properly during 'git push'. Construct an example where it will make a kilobyte worth of difference when a delta base is not found. We can then use the progress indicators to distinguish between bytes and KiB depending on whether the delta base is found and used. Signed-off-by: Derrick Stolee <[email protected]>
An internal customer reported a segfault when running `git sparse-checkout set` with the `index.sparse` config enabled. I was unable to reproduce it locally, but with their help we debugged into the failing process and discovered the following stacktrace: ``` #0 0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125 #1 0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247 #2 0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122 #3 0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638 #4 0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255 #5 0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570) at sparse-index.c:307 #6 0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46 #7 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80 #8 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80 #9 0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422 #10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456 #11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0, search_mode=EXPAND_SPARSE) at read-cache.c:556 #12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566 #13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756 #14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860 #15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063 #16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548 #17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808 #18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877 #19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017 #20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 ``` The very bottom of the stack being the `rehash()` method from `hashmap.c` as called within the `name-hash` API made me look at where these hashmaps were being used in the sparse index logic. These were being copied across indexes, which seems dangerous. Indeed, clearing these hashmaps and setting them as not initialized fixes the segfault. The second commit is a response to a test failure that happens in `t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to fail because the underlying `git checkout-index` process fails due to colliding files. Passing the `-f` flag appears to work, but it's unclear why this name-hash change causes that change in behavior.
This fixes a leak that is not detected by Git's own test suite (but by microsoft/git's, in the t9210-scalar.sh test). Signed-off-by: Johannes Schindelin <[email protected]>
This pull request aims to correct a pretty big issue when dealing with UNINTERESTING objects in the path-walk API. They somehow were only exposed when trying to perform a push from a shallow clone. This will require rewriting the upstream version so this is avoided from the start, but we can do a forward fix for now. The key issue is that the path-walk API was not walking UNINTERESTING trees at the right time, and the way it was being done was more complicated than it needed to be. This changes some of the way the path-walk API works in the presence of UNINTERSTING commits, but these are good changes to make. I had briefly attempted to remove the use of the `edge_aggressive` option in `struct path_walk_info` in favor of using the `--objects-edge-aggressive` option in the revision struct. When I started down that road, though, I somehow got myself into a bind of things not working correctly. I backed out to this version that is working with our test cases. I tested this using the thin and big pack tests in `p5313` which had the same performance as before this change. The new change is that in a shallow clone we can get the same `git push` improvements. I was hung up on testing this for a long time as I wasn't getting the same results in my shallow clone as in my regular clones. It turns out that I had forgotten to use `--no-reuse-delta` in my test command, so it was picking the deltas that were given by the initial clone instead of picking new ones per the algorithm. 🤦🏻
Git v2.48.0 has become even more stringent about leaks. Signed-off-by: Johannes Schindelin <[email protected]>
The check for dubious ownership has one particular quirk on Windows: if running as an administrator, files owned by the Administrators _group_ are considered owned by the user. The rationale for that is: When running in elevated mode, Git creates files that aren't owned by the individual user but by the Administrators group. There is yet another quirk, though: The check I introduced to determine whether the current user is an administrator uses the `CheckTokenMembership()` function with the current process token. And that check only succeeds when running in elevated mode! Let's be a bit more lenient here and look harder whether the current user is an administrator. We do this by looking for a so-called "linked token". That token exists when administrators run in non-elevated mode, and can be used to create a new process in elevated mode. And feeding _that_ token to the `CheckTokenMembership()` function succeeds! Signed-off-by: Johannes Schindelin <[email protected]>
This adds a new sub-sub-command for `test-tool`, simply passing through the command-line arguments to the `is_path_owned_by_current_user()` function. Signed-off-by: Johannes Schindelin <[email protected]>
The --path-walk option in `git pack-objects` is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within `git push` specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via `git repack -Ad` helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set `gc.auto=0` before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in `git push`, this change enforces the spawned `git pack-objects` process to use `--no-reuse-delta`. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). This commit also adds a test case that demonstrates that `git -c pack.usePathWalk=true push` now avoids reusing deltas. To do this, the test case constructs a pack with a horrendously inefficient delta object, then verifies that the pack on the receiving side of the `push` fails to have such an inefficient delta. The test case would probably be a lot more readable if hex numbers were used instead of octal numbers, but alas, `printf "\x<hex>"` is not portable, only `printf "\<octal>"` is. For example, dash's built-in `printf` function simply prints `\x` verbatim while bash's built-in happily converts this construct to the corresponding byte. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
The check for dubious ownership has one particular quirk on Windows: if running as an administrator, files owned by the Administrators _group_ are considered owned by the user. The rationale for that is: When running in elevated mode, Git creates files that aren't owned by the individual user but by the Administrators group. There is yet another quirk, though: The check I introduced to determine whether the current user is an administrator uses the `CheckTokenMembership()` function with the current process token. And that check only succeeds when running in elevated mode! Let's be a bit more lenient here and look harder whether the current user is an administrator. We do this by looking for a so-called "linked token". That token exists when administrators run in non-elevated mode, and can be used to create a new process in elevated mode. And feeding _that_ token to the `CheckTokenMembership()` function succeeds!
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included: * Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned. * Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need. Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
52b5991
to
27273ae
Compare
And just like was the case with Git for Windows, there's some fall-out fro Range-diff relative to previously-range-diff'ed PR branch
|
Range-diff relative to vfs-2.47.1
1: f8a9b7d = 1: e1ee1fb sparse-index.c: fix use of index hashes in expand_index
2: 3f29558 = 2: e4bf5ba t5300: confirm failure of git index-pack when non-idx suffix requested
3: 39c301e = 3: a487ef4 t: remove advice from some tests
4: c764d29 = 4: 2273c73 t1092: add test for untracked files and directories
7: 880848c = 5: 610b51f index-pack: disable rev-index if index file has non .idx suffix
5: 45fa069 = 6: 41e508a survey: calculate more stats on refs
6: caecb6c = 7: 7c49aec survey: show some commits/trees/blobs histograms
9: afe1b3d = 8: de05533 survey: add vector of largest objects for various scaling dimensions
10: 6024611 = 9: b6e0cb9 survey: add pathname of blob or tree to large_item_vec
11: e0f3278 = 10: 097dfe9 survey: add commit-oid to large_item detail
12: 92dc76f = 11: 4c0c25f survey: add commit name-rev lookup to each large_item
13: b4ef786 = 12: aba19af survey: add --no-name-rev option
14: ff5800d = 13: 09a23bd survey: started TODO list at bottom of source file
15: 53525fc = 14: 0f15b3f survey: expanded TODO list at the bottom of the source file
16: 3c34aa0 = 15: d0b2832 survey: expanded TODO with more notes
8: 26be27d = 16: e9faa7c trace2: prefetch value of GIT_TRACE2_DST_DEBUG at startup
17: c146336 = 17: 2415423 reset --stdin: trim carriage return from the paths
18: c9463b5 ! 18: af78073 Identify microsoft/git via a distinct version suffix
19: 759993f < -: ----------- gvfs: ensure that the version is based on a GVFS tag
143: 8d3fe0f = 19: d576c33 git_config_set_multivar_in_file_gently(): add a lock timeout
177: 27c9054 = 20: e4e5842 sequencer: avoid progress when stderr is redirected
-: ----------- > 21: d2f16f8 gvfs: ensure that the version is based on a GVFS tag
20: c9fee9a = 22: 3d9017d gvfs: add a GVFS-specific header file
21: f026a10 = 23: 9cf2623 gvfs: add the core.gvfs config setting
22: 5363b06 ! 24: 63e7518 gvfs: add the feature to skip writing the index' SHA-1
23: 0c59b7c = 25: bfa97bc gvfs: add the feature that blobs may be missing
24: f9c6eae = 26: 01bf3ce gvfs: prevent files to be deleted outside the sparse checkout
25: 43507e2 = 27: 944465b gvfs: optionally skip reachability checks/upload pack during fetch
26: 04e60c0 = 28: bc6ac08 gvfs: ensure all filters and EOL conversions are blocked
27: 69a4394 ! 29: 0295330 gvfs: allow "virtualizing" objects
28: 4e30600 = 30: 68b2b00 Hydrate missing loose objects in check_and_freshen()
29: 28d0788 ! 31: 5d5d088 sha1_file: when writing objects, skip the read_object_hook
30: 027100e ! 32: 71abd25 gvfs: add global command pre and post hook procs
31: 8ebfcd2 ! 33: a573289 t0400: verify that the hook is called correctly from a subdirectory
32: 7f3c491 = 34: 5cffe06 Pass PID of git process to hooks.
33: f108451 ! 35: 85ac67e pre-command: always respect core.hooksPath
34: d5c0cbd = 36: 3459b28 sparse-checkout: update files with a modify/delete conflict
35: d3afb07 = 37: 5f47501 sparse-checkout: avoid writing entries with the skip-worktree bit
36: 9440e53 = 38: f38593a Do not remove files outside the sparse-checkout
37: bcc1fea = 39: 9c4224b send-pack: do not check for sha1 file when GVFS_MISSING_OK set
38: 21e7ac3 = 40: 8a8b58e cache-tree: remove use of strbuf_addf in update_one
39: 0956fc2 ! 41: cb94c4f gvfs: block unsupported commands when running in a GVFS repo
40: c49f24c = 42: 819b108 worktree: allow in Scalar repositories
41: efc8460 = 43: 803d138 gvfs: allow overriding core.gvfs
42: 4d87f3a = 44: 2406a1f BRANCHES.md: Add explanation of branches and using forks
43: 0c2b427 = 45: dd7bbd6 Add virtual file system settings and hook proc
44: 91c84ab = 46: 9e67545 virtualfilesystem: don't run the virtual file system hook if the index has been redirected
45: b2690cc = 47: 68080d1 virtualfilesystem: check if directory is included
46: eed726a = 48: bd4306e backwards-compatibility: support the post-indexchanged hook
47: bf25c76 = 49: b13dd3d gvfs: verify that the built-in FSMonitor is disabled
48: 295c0df = 50: 00a4dfe wt-status: add trace2 data for sparse-checkout percentage
49: 7dd7cca = 51: becdc5d wt-status: add VFS hydration percentage to normal
git status
output50: 15e33c8 = 52: 525ceb8 status: add status serialization mechanism
51: 32985bc = 53: 9aecd4b Teach ahead-behind and serialized status to play nicely together
52: 69a2625 = 54: 7138d62 status: serialize to path
53: 7dc9425 = 55: 85f7657 status: reject deserialize in V2 and conflicts
54: e74c4fa = 56: 90fc66a serialize-status: serialize global and repo-local exclude file metadata
55: cb0713f = 57: 2236e3b status: deserialization wait
56: 7b525d9 = 58: 212a2a6 merge-recursive: avoid confusing logic in was_dirty()
57: 02f7059 = 59: 92b76fd merge-recursive: add some defensive coding to was_dirty()
58: 12aa32d = 60: 450a24d merge-recursive: teach was_dirty() about the virtualfilesystem
59: eb337ae = 61: 5b057fa status: deserialize with -uno does not print correct hint
60: ce7c4ba = 62: c8ef54c fsmonitor: check CE_FSMONITOR_VALID in ce_uptodate
61: ef2b8be = 63: 2d48746 fsmonitor: add script for debugging and update script for tests
62: a9107c8 = 64: 40a3e9e status: disable deserialize when verbose output requested.
63: 8f1a58d = 65: 483d67f t7524: add test for verbose status deserialzation
64: e509c50 = 66: 6c6482f deserialize-status: silently fallback if we cannot read cache file
65: ebe6f19 = 67: fc78e39 gvfs:trace2:data: add trace2 tracing around read_object_process
66: 701b87c = 68: 04cdde9 gvfs:trace2:data: status deserialization information
67: c773538 = 69: 1ff8cb4 gvfs:trace2:data: status serialization
68: ce0dd9a = 70: ee783ff gvfs:trace2:data: add vfs stats
69: 2df249a = 71: 8ef2956 trace2: refactor setting process starting time
70: 3c09592 = 72: b0bb34f trace2:gvfs:experiment: clear_ce_flags_1
71: 3b76c51 = 73: 7c9dd22 trace2:gvfs:experiment: report_tracking
72: cdfaedd = 74: 889b62c trace2:gvfs:experiment: read_cache: annotate thread usage in read-cache
73: 55e10e0 = 75: 91e5b40 trace2:gvfs:experiment: read-cache: time read/write of cache-tree extension
74: 4c4bae9 = 76: 452d900 trace2:gvfs:experiment: add region to apply_virtualfilesystem()
75: a1f3296 = 77: 66b16b3 trace2:gvfs:experiment: add region around unpack_trees()
76: edf9d8a = 78: edd5e72 trace2:gvfs:experiment: add region to cache_tree_fully_valid()
77: fc35180 ! 79: 419f494 trace2:gvfs:experiment: add unpack_entry() counter to unpack_trees() and report_tracking()
78: 5b43294 = 80: 1f0c954 trace2:gvfs:experiment: increase default event depth for unpack-tree data
79: 550270b = 81: 1b3c2fa trace2:gvfs:experiment: add data for check_updates() in unpack_trees()
80: 5f2cad3 = 82: 1376bae Trace2:gvfs:experiment: capture more 'tracking' details
81: 045abab = 83: 9385e90 credential: set trace2_child_class for credential manager children
82: 6b0b19e = 84: 93d47e9 sub-process: do not borrow cmd pointer from caller
83: afba63d = 85: d0cb2df sub-process: add subprocess_start_argv()
84: b43ee7e = 86: 698c329 sha1-file: add function to update existing loose object cache
85: fd35572 = 87: 4d8ca07 packfile: add install_packed_git_and_mru()
86: d6eed3f = 88: 148e1be index-pack: avoid immediate object fetch while parsing packfile
87: b591116 = 89: 6794fcd gvfs-helper: create tool to fetch objects using the GVFS Protocol
88: 9e35331 = 90: e63abee sha1-file: create shared-cache directory if it doesn't exist
89: 5c03db5 = 91: 8bf10d0 gvfs-helper: better handling of network errors
90: 1a87dad = 92: 2b65125 gvfs-helper-client: properly update loose cache with fetched OID
91: 6a6ac0a = 93: e50c25a gvfs-helper: V2 robust retry and throttling
92: 27bb4f3 = 94: 400df45 gvfs-helper: expose gvfs/objects GET and POST semantics
93: 820a44f = 95: fa4f2fa gvfs-helper: dramatically reduce progress noise
94: 48b3f90 = 96: e3e0fe5 gvfs-helper-client.h: define struct object_id
95: 26027b9 = 97: 8c40f07 gvfs-helper: handle pack-file after single POST request
96: 7bd258b ! 98: 02f963d test-gvfs-prococol, t5799: tests for gvfs-helper
97: f142997 = 99: 0abb84b gvfs-helper: move result-list construction into install functions
98: 6ec53d2 = 100: 46a4267 t5799: add support for POST to return either a loose object or packfile
99: 89cd885 = 101: 2995649 t5799: cleanup wc-l and grep-c lines
100: 9033775 = 102: eab8fe9 gvfs-helper: verify loose objects after write
101: f67891f = 103: 8f772ef t7599: create corrupt blob test
102: 066b25f = 104: bf736ee gvfs-helper: add prefetch support
103: 4aee040 = 105: 26429b4 gvfs-helper: add prefetch .keep file for last packfile
104: be6e216 = 106: c76a40a gvfs-helper: do one read in my_copy_fd_len_tail()
105: 58576a0 = 107: 2719528 gvfs-helper: move content-type warning for prefetch packs
106: 9c79566 = 108: 5203695 fetch: use gvfs-helper prefetch under config
107: 3ad5ca2 = 109: c20bcae gvfs-helper: better support for concurrent packfile fetches
108: db1f44a = 110: aefd3fc remote-curl: do not call fetch-pack when using gvfs-helper
109: f0e01d8 = 111: 514990f fetch: reprepare packs before checking connectivity
110: e1797f8 = 112: 0e15de4 gvfs-helper: retry when creating temp files
111: 47ae2c0 = 113: 2102162 sparse: avoid warnings about known cURL issues in gvfs-helper.c
112: b7d9e59 = 114: b9aa44a gvfs-helper: add --max-retries to prefetch verb
113: cd615fc = 115: bc8ba2e t5799: add tests to detect corrupt pack/idx files in prefetch
114: eae3e03 = 116: 4d8e447 gvfs-helper: ignore .idx files in prefetch multi-part responses
115: d1d2404 = 117: 60ad684 t5799: explicitly test gvfs-helper --fallback and --no-fallback
116: 560038e = 118: d8d9dd5 gvfs-helper: don't fallback with new config
117: 48e648e = 119: 2c299fb test-gvfs-protocol: add cache_http_503 to mayhem
118: aed4b59 ! 120: d59167b maintenance: care about gvfs.sharedCache config
120: dce178a = 121: c8627b3 unpack-trees:virtualfilesystem: Improve efficiency of clear_ce_flags
121: 6df4ecf = 122: f498aa4 homebrew: add GitHub workflow to release Cask
122: fa41711 ! 123: cdd4365 Adding winget workflows
123: 471073e = 124: 402335f Disable the
monitor-components
workflow in msft-git124: b78d7f5 = 125: dfb2c86 .github: enable windows builds on microsoft fork
125: afd8701 ! 126: dfdb911 release: create initial Windows installer build workflow
119: 08a2ba3 = 127: 9e40834 t5799: add unit tests for new
gvfs.fallback
config setting126: f7c6f59 ! 128: 575be06 help: special-case HOST_CPU
universal
127: 3b744e9 = 129: d920e0f release: add Mac OSX installer build
128: e5d09aa ! 130: 76b340d release: build unsigned Ubuntu .deb package
129: 59e8eaf = 131: b041e49 release: add signing step for .deb package
130: 799f8c1 = 132: a21d2b3 release: create draft GitHub release with packages & installers
131: 7a8a309 = 133: 554506e build-git-installers: publish gpg public key
132: ca60df2 ! 134: 62038e9 release: continue pestering until user upgrades
133: af7f735 ! 135: 1a2f6dd Makefile: allow specifying GIT_BUILT_FROM_COMMIT
134: 7bb260f = 136: 376d7a5 dist: archive HEAD instead of HEAD^{tree}
135: b514a26 = 137: 421d974 release: include GIT_BUILT_FROM_COMMIT in MacOS build
136: 521148d = 138: 4bd7c00 release: add installer validation
137: 3f13abc = 139: cc84fe4 update-microsoft-git: create barebones builtin
138: 3f702db = 140: 227c405 update-microsoft-git: Windows implementation
139: 03dd074 = 141: 6b7077c update-microsoft-git: use brew on macOS
140: 1b5f1e0 = 142: cc60983 .github: update ISSUE_TEMPLATE.md for microsoft/git
141: 7f5f680 = 143: 8094a1c .github: update PULL_REQUEST_TEMPLATE.md
144: aaea46f ! 144: 184da00 scalar: set the config write-lock timeout to 150ms
145: 02b908b = 145: 3b8d6f7 scalar: add docs from microsoft/scalar
142: 970419c = 146: 97c8df8 Adjust README.md for microsoft/git
146: 5c9aac4 = 147: 3505038 scalar (Windows): use forward slashes as directory separators
147: 41359e0 = 148: 7cd0f55 scalar: add retry logic to run_git()
148: 17d44fa = 149: 70d000c scalar: support the
config
command for backwards compatibility149: 51463f1 = 150: 462d0ca scalar: implement a minimal JSON parser
150: e7d5631 = 151: e632525 scalar clone: support GVFS-enabled remote repositories
151: 879d88c = 152: 2a054c4 test-gvfs-protocol: also serve smart protocol
152: c3c75c6 = 153: d6058cb gvfs-helper: add the
endpoint
command153: 4b57081 = 154: f69ef17 dir_inside_of(): handle directory separators correctly
154: ff012a6 = 155: c8e72b9 scalar: disable authentication in unattended mode
155: d6ef324 = 156: 9e1206a scalar: do initialize
gvfs.sharedCache
156: 4df3f9a = 157: e522659 scalar diagnose: include shared cache info
157: 9729c3b = 158: d8c36e2 scalar: only try GVFS protocol on https:// URLs
158: 7158e01 = 159: 95300a1 scalar: verify that we can use a GVFS-enabled repository
159: 2000e58 = 160: 0a1bf8b scalar: add the
cache-server
command160: 28e175f = 161: b74b4ad scalar: add a test toggle to skip accessing the vsts/info endpoint
161: 8458698 = 162: 1be68af scalar: adjust documentation to the microsoft/git fork
162: 519e4f1 = 163: 59890cd scalar: enable untracked cache unconditionally
163: bbfb4ca = 164: 7774873 scalar: parse
clone --no-fetch-commits-and-trees
for backwards compatibility164: 12a32cd = 165: f2e828b scalar: make GVFS Protocol a forced choice
165: 7f41222 = 166: 03536af scalar diagnose: accommodate Scalar's Functional Tests
166: 81ca098 = 167: f749718 ci: run Scalar's Functional Tests
167: ee37a11 = 168: bf1fb74 scalar: upgrade to newest FSMonitor config setting
168: 2228c5d = 169: 1fa6f31 abspath: make strip_last_path_component() global
169: 9137831 = 170: bbbf936 scalar: configure maintenance during 'reconfigure'
170: 00db4c4 = 171: 98f245d scalar: .scalarCache should live above enlistment
171: 173e358 = 172: 55376f6 add/rm: allow adding sparse entries when virtual
172: aa48677 ! 173: a858ee7 sparse-checkout: add config to disable deleting dirs
173: a579f1e ! 174: 578f0f3 diff: ignore sparse paths in diffstat
174: 8e80d9a = 175: 7637da5 repo-settings: enable sparse index by default
175: 874de28 = 176: 85faeff diff(sparse-index): verify with partially-sparse
176: e85d52d = 177: 8bd1fdd stash: expand testing for
git stash -u
178: 20df0a9 = 178: 5277a88 sparse: add vfs-specific precautions
179: 834dec7 = 179: 0f1dbda reset: fix mixed reset when using virtual filesystem
180: d856bd5 = 180: 73a82df sparse-index: add ensure_full_index_with_reason()
181: 6be537d = 181: f82a5bd treewide: add reasons for expanding index
182: 5872e4d = 182: 246c5bd treewide: custom reasons for expanding index
183: fd4b610 = 183: 1eef26d sparse-index: add macro for unaudited expansions
184: 46e051c = 184: cfb455b Docs: update sparse index plan with logging
185: 4736f84 = 185: 515ec6c sparse-index: log failure to clear skip-worktree
186: b4b7779 = 186: 5186d9a stash: use -f in checkout-index child process
187: dcd9c62 = 187: 42479d4 sparse-index: do not copy hashtables during expansion
188: 1c3598c = 188: c8e900b t5616: mark tests as bogus with --path-walk
189: fa969ca = 189: 3e4a57c path-walk: add new 'edge_aggressive' option
190: 7012687 = 190: be555f9 pack-objects: allow --shallow and --path-walk
191: 46bb3c9 = 191: 971b882 t5538: add test to confirm deltas in shallow pushes
192: a98c3d7 < -: ----------- fixup! Adding winget workflows
193: 4760f28 = 192: 39fdcb0 gitk: check main window visibility before waiting for it to show
194: b66d8e9 < -: ----------- fixup! release: create initial Windows installer build workflow
195: a68ad02 < -: ----------- fixup! release: build unsigned Ubuntu .deb package