Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cherry-pick/trace2-missing-def-param-fix #633

Merged
merged 17 commits into from
Mar 21, 2024

Conversation

jeffhostetler
Copy link

Cherry-pick jh/trace2-missing-def-param-fix from core Git.
6111252 (gitster/jh/trace2-missing-def-param-fix)

(This follows #632 and this PR currently also shows the commits from 632. That should change once 632 has been merged.)

index_dir_exists() returns a boolean to indicate if there is a
case-insensitive match in the directory name-hash, but does not
provide the caller with the exact spelling of that match.

Create index_dir_find() to do the case-insensitive search *and*
optionally return the spelling of the matched directory prefix in a
provided strbuf.

To avoid code duplication, convert index_dir_exists() to be a trivial
wrapper around the new index_dir_find().

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The FSMonitor client code trusts the spelling of the pathnames in the
FSEvents received from the FSMonitor daemon.  On case-insensitive file
systems, these OBSERVED pathnames may be spelled differently than the
EXPECTED pathnames listed in the .git/index.  This causes a miss when
using `index_name_pos()` which expects the given case to be correct.

When this happens, the FSMonitor client code does not update the state
of the CE_FSMONITOR_VALID bit when refreshing the index (and before
starting to scan the worktree).

This results in modified files NOT being reported by `git status` when
there is a discrepancy in the case-spelling of a tracked file's
pathname.

This commit contains a (rather contrived) test case to demonstrate
this.  A later commit in this series will update the FSMonitor client
code to recognize these discrepancies and update the CE_ bit accordingly.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Move the code to handle directory FSEvents (containing pathnames with
a trailing slash) into a helper function.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Improve documentation of the refresh callback helper function
used for directory FSEvents.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Move the code that handles unqualified FSEvents (without a trailing
slash) into a helper function.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Create a wrapper function for untracked_cache_invalidate_path()
that silently trims a trailing slash, if present, before calling
the wrapped function.

The untracked cache expects to be called with a pathname that
does not contain a trailing slash.  This can make it inconvenient
for callers that have a directory path.  Lets hide this complexity.

This will be used by a later commit in the FSMonitor code which
may receive directory pathnames from an FSEvent.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Update fsmonitor_refresh_callback() to use the new
untracked_cache_invalidate_trimmed_path() to invalidate
the cache using the observed pathname without needing to
modify the caller's buffer.

Previously, we modified the caller's buffer when the observed pathname
contained a trailing slash (and did not restore it).  This wasn't a
problem for the single use-case caller, but felt dirty nontheless.  In
a later commit we will want to invalidate case-corrected versions of
the pathname (using possibly borrowed pathnames from the name-hash or
dir-name-hash) and we may not want to keep the tradition of altering
the passed-in pathname.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Move the call to invalidate the untracked-cache for the FSEvent
pathname into the two helper functions.

In a later commit in this series, we will call these helpers
from other contexts and it safer to include the UC invalidation
in the helpers than to remember to also add it to each helper
call-site.

This has the side-effect of invalidating the UC *before* we
invalidate the ce_flags in the cache-entry.  These activities
are independent and do not affect each other.  Also, by doing
the UC work first, we can avoid worrying about "early returns"
or the need for the usual "goto the end" in each of the
handler functions.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Teach the refresh callback helper function for directory FSEvents to
return the number of cache-entries that were invalidated in response
to a directory event.

This will be used in a later commit to help determine if the observed
pathname in the FSEvent was a (possibly) case-incorrect directory
prefix (on a case-insensitive filesystem) of one or more actual
cache-entries.

If there exists at least one case-insensitive prefix match, then we
can assume that the directory is a (case-incorrect) prefix of at least
one tracked item rather than a completely unknown/untracked file or
directory.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Refactor the code that handles refresh events for pathnames that do
not contain a trailing slash.  Instead of using a custom loop to try
to scan the index and detect if the FSEvent named a file or might be a
directory prefix, use the recently created helper function to do that.

Also update the comments to describe what and why we are doing this.

On platforms that DO NOT annotate FS events with a trailing
slash, if we fail to find an exact match for the pathname
in the index, we do not know if the pathname represents a
directory or simply an untracked file.  Pretend that the pathname
is a directory and try again before assuming it is an untracked
file.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Teach the refresh callback helper function for unqualified FSEvents
(pathnames without a trailing slash) to return the number of
cache-entries that were invalided in response to the event.

This will be used in a later commit to help determine if the observed
pathname was (possibly) case-incorrect when (on a case-insensitive
file system).

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Consolidate the directory/non-directory calls to the refresh handler
code.  Log the resulting count of invalidated cache-entries.

The nr_in_cone value will be used in a later commit to decide if
we also need to try to do case-insensitive lookups.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Refactor code in the fsmonitor_refresh_callback() call chain dealing
with invalidating the CE_FSMONITOR_VALID bit and add a trace message.

During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
data from the FSMonitor daemon (so that a later phase will lstat() and
verify the true state of the file).

Create a new function to clear the bit and add some unique tracing for
it to help debug edge cases.

This is similar to the existing `mark_fsmonitor_invalid()` function,
but it also does untracked-cache invalidation and we've already
handled that in the refresh-callback handlers, so but we don't need
to repeat that.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Teach fsmonitor_refresh_callback() to handle case-insensitive
lookups if case-sensitive lookups fail on case-insensitive systems.
This can cause 'git status' to report stale status for files if there
are case issues/errors in the worktree.

The FSMonitor daemon sends FSEvents using the observed spelling
of each pathname.  On case-insensitive file systems this may be
different than the expected case spelling.

The existing code uses index_name_pos() to find the cache-entry for
the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
that the worktree scan/index refresh will revisit and revalidate the
path.

On a case-insensitive file system, the exact match lookup may fail
to find the associated cache-entry. This causes status to think that
the cached CE flags are correct and skip over the file.

Update event handling to optionally use the name-hash and dir-name-hash
if necessary.

Also update t7527 to convert the "test_expect_failure" to "_success"
now that we have fixed the bug.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Some Git commands fail to emit 'def_param' events for interesting
config and environment variable settings.

Add unit tests to demonstrate this.

Most commands are considered "builtin" and are based upon git.c.
These typically do emit 'def_param' events.  Exceptions are some of
the "query" commands, the "run-dashed" mechanism, and alias handling.

Commands built from remote-curl.c (instead of git.c), such as
"git-remote-https", do not emit 'def_param' events.

Likewise, "git-http-fetch" is built http-fetch.c and does not emit
them.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
During nested alias expansion it is possible for
"trace2_cmd_list_config()" and "trace2_cmd_list_env_vars()"
to be called more than once.  This causes a full set of
'def_param' events to be emitted each time.  Let's avoid
that.

Add code to those two functions to only emit them once.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Some commands do not cause a set of 'def_param' events to be emitted.
This includes "git-remote-https", "git-http-fetch", and various
"query" commands, like "git --man-path".

Since all of these commands do emit a 'cmd_name' event, add code to
the "trace2_cmd_name()" function to generate the set of 'def_param'
events.

Remove explicit calls to "trace2_cmd_list_config()" and
"trace2_cmd_list_env_vars()" in git.c since they are no longer needed.

Reviewed-by: Josh Steadmon <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
@jeffhostetler jeffhostetler self-assigned this Mar 21, 2024
Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a well-done backport to me:

$ git range-diff 6111252cbf21abb175411da5c5a2cde65bb8f3e9~3..6111252cbf21abb175411da5c5a2cde65bb8f3e9 \
  microsoft/cherry-pick/trace2-missing-def-param-fix~3..microsoft/cherry-pick/trace2-missing-def-param-fix
  • 1: 0c1c3c8 = 1: 4e9f006 t0211: demonstrate missing 'def_param' events for certain commands

  • 2: 520cf66 = 2: cc026b2 trace2: avoid emitting 'def_param' set more than once

  • 3: 6111252 ! 3: 910c853 trace2: emit 'def_param' set with 'cmd_name' event

    @@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
     -	trace2_cmd_list_env_vars();
      
      	validate_cache_entries(the_repository->index);
    - 	status = p->fn(argc, argv, prefix);
    + 	exit_code = status = p->fn(argc, argv, prefix);
     
      ## t/t0211-trace2-perf.sh ##
     @@ t/t0211-trace2-perf.sh: test_expect_success 'expect def_params for normal builtin command' '

@jeffhostetler jeffhostetler merged commit 72e2b51 into vfs-2.44.0 Mar 21, 2024
83 of 85 checks passed
@jeffhostetler jeffhostetler deleted the cherry-pick/trace2-missing-def-param-fix branch March 21, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants