Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: better support for dangling revisions #11753

Open
wants to merge 1 commit into
base: 2.18-maintenance
Choose a base branch
from

Conversation

johnrichardrinehart
Copy link

Motivation

A lot of this is laid out in #11752. I kindly ask reviewers to look at that issue.

Context

Pleae see #11752

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Oct 26, 2024
@johnrichardrinehart johnrichardrinehart force-pushed the jrinehart/fix-support-for-dangling-revisions branch 2 times, most recently from 9b2c66d to ed4d553 Compare October 26, 2024 19:37
@johnrichardrinehart johnrichardrinehart force-pushed the jrinehart/fix-support-for-dangling-revisions branch 2 times, most recently from 28b0382 to 115ebfe Compare October 27, 2024 17:17
@@ -529,7 +529,13 @@ struct GitInputScheme : InputScheme
if (input.getRev()) {
try {
runProgram("git", true, { "-C", repoDir, "--git-dir", gitDir, "cat-file", "-e", input.getRev()->gitRev() });
doFetch = false;
auto res = runProgram("git", true, { "-C", repoDir, "--git-dir", gitDir, "for-each-ref", fmt("--contains=%s", input.getRev()->gitRev()) });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how clever git is about this combination of flags. A naive implementation of for-each, --contains would be slow when many refs need to be checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this on a repo with 45k refs on a branch with 377k commits and it took ~300ms. So, I thought I was good. But, your comment made me double-check against the linux kernel. That took 10m....... Something doesn't scale. I'll revert. It's not a huge deal if we don't do this, since we either have the rev (in which case we fetch the rev explicitly, like below) or we don't (in which case doFetch = true anyway).

@@ -630,7 +636,7 @@ struct GitInputScheme : InputScheme
// everything to ensure we get the rev.
Activity act(*logger, lvlTalkative, actUnknown, fmt("making temporary clone of '%s'", repoDir));
runProgram("git", true, { "-C", tmpDir, "fetch", "--quiet", "--force",
"--update-head-ok", "--", repoDir, "refs/*:refs/*" }, {}, true);
"--update-head-ok", "--", repoDir, "refs/*:refs/*", input.getRev()->gitRev() }, {}, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this cause the fetch to fail when the remote's uploadpack.allowReachableSHA1InWant is disabled?

Copy link
Author

@johnrichardrinehart johnrichardrinehart Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. There actually seem to be a few related options for configuring a git server and I'm not sure which combinations of them support fetching by rev. From here:

If the remote has enabled the options uploadpack.allowTipSHA1InWant, uploadpack.allowReachableSHA1InWant, or uploadpack.allowAnySHA1InWant, they may alternatively be 40-hex sha1s present on the remote.

But, I don't think remote configuration matters, here. In this case (regardless of the user-provided URL corresponding to either of a network-accessed repository or filesystem-accessed repository) the fetch, here, is against the fetcher cache whose config files have never specified (and don't even support, I think) any of these remote configuration values. Fetching by rev is supported according to my experimentation and the reproduction embedded in the commit message.

@johnrichardrinehart johnrichardrinehart force-pushed the jrinehart/fix-support-for-dangling-revisions branch 2 times, most recently from 1dc363c to dd279ac Compare November 5, 2024 07:06
src/libfetchers/git.cc Outdated Show resolved Hide resolved
The submodules branch generates a checkout which depends on only the
references, not all revisions. If there are dangling revisions in the
bare fetcher clone (a correctness issue itself) then the subsequent
temporary directory clone will disastrously fail with messages like
```
fatal: reference is not a tree: 2da0785fa32ce4c628d501c8743c88be00835b50
```
You should be able to reproduce this behavior with
```bash

this_nixpkgs='github:nixos/nixpkgs?rev=36ac8d7e411eeb11ac0998d5a39e329c1226e541';
this_git=$(nix build --print-out-paths "$this_nixpkgs#git")/bin/git;

export GIT_AUTHOR_NAME=foo \
	[email protected] \
	GIT_AUTHOR_DATE=100000000 \
	GIT_COMMITER_NAME=foo \
	[email protected] \
	GIT_COMMITTER_DATE=100000000;

tmpgit=$(mktemp -d);
echo "Repo'll be at $tmpgit";

"$this_git" init "$tmpgit";

"$this_git" -C "$tmpgit" commit --allow-empty -m "foo"; # 8c2146823865b76da067b3bb458611a0a19ede3b
"$this_git" -C "$tmpgit" commit --allow-empty -m "foo"; # 2da0785fa32ce4c628d501c8743c88be00835b50
"$this_git" -C "$tmpgit" reset --hard HEAD~1; # 8c2146823865b76da067b3bb458611a0a19ede3b

for version in 18 19 20 21 22 23 24; do
    echo "Checking nix version 2.$version";
    this_nix=$(nix build "$this_nixpkgs#nixVersions.nix_2_$version^out" --print-out-paths)/bin/nix;

    url="file:///$tmpgit";

    # populate fetcher-v1.sqlite and the gitv3/ dir
    "$this_nix-build" --expr $'builtins.fetchGit {
        url = \"'"$url"$'\";
        rev = \"8c2146823865b76da067b3bb458611a0a19ede3b\";
        submodules = true;
    }';

    # put the "corrupted" git dir in place of the fetched one
    cache_dir="${XDG_CONFIG_HOME:-$HOME}"/.cache/nix/gitv3;
    tmp=$(mktemp);
    printf "%s" "$url" >"$tmp";
    fetched_dir="$cache_dir"/$(printf "%s" "$url" | $this_nix --extra-experimental-features nix-command hash file --type sha256 --base32 "$tmp");
    rm -rf "$fetched_dir";
    git clone --bare "$tmpgit" "$fetched_dir";

    # See if we can realise the derivation pointing to the dangling commit
    _NIX_FORCE_HTTP=1 "$this_nix-build" --impure --expr $'
    { pkgs ? (import <nixpkgs> { }) }: pkgs.stdenv.mkDerivation {
      pname = "foo";
      version = "1.0.0";
      dontUnpack = true;

      '"
      src = builtins.fetchGit {
        url = \"file:///$tmpgit\";
        rev = \"2da0785fa32ce4c628d501c8743c88be00835b50\";
        submodules = true;
      };

      buildPhase = ''
        touch \$out;
        exit 0;
      '';
    }
    "
done
```
This solution attempts to avoid the issue in the submodules path by
simply checking out the specified revision (in addition to all
references to preserve the original behavior - although, I think _only_
the revision is necessary).
@johnrichardrinehart johnrichardrinehart force-pushed the jrinehart/fix-support-for-dangling-revisions branch from dd279ac to 591a826 Compare November 11, 2024 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetching Networking with the outside (non-Nix) world, input locking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants