Lazy fetchTree outPath path values #10252

Closed
roberth wants to merge 10 commits from the lazy-fetchTree branch

Conversation

roberth
Member

@roberth roberth commented Mar 15, 2024

Motivation

Improve performance, and make the fetchTree interface more capable while keeping it clean.

Description

This makes fetchTree return lazy InputAccessor-based SourcePaths instead of "cowardly" fetching them to the store and returning absolute "system" paths.
It stays close to existing path semantics, including support for readFile "${toString p}/..", which some expressions rely on.
It does not go as far as lazy-trees, but judging from the amount of change I could reuse, and how little of my own I had to add, lazy-trees will be a natural extension of this PR.

Done:

  • Packages and NixOS evaluate as usual
    • no hash changes, based on my limited testing
  • Flake is not added to the store unless, for example,
    • it is interpolated: "${flake.outPath}"
    • the module system is used (needs clever lazy source strings, or a change to the module system)
  • Note that the above is already an improvement over the status quo - always fetching to the store
  • iirc 0.1s reduction on nixpkgs#hello, and 1s reduction on nixosTests.simple

Conclusion so far:
Viable

TODO:

  • Check the TODO and FIXMEs
  • Update the test suite
    • probably some intended behavior change, such as removal of narHash in some observable places
    • possibly finds a bug
  • Check performance again
  • Cherry-pick the toString path behavior to lazy-trees

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority fetching Networking with the outside (non-Nix) world, input locking labels Mar 15, 2024
Member Author

These changes are close to lazy-trees.

void EvalState::registerAccessor(const ref<InputAccessor> accessor)
{
    inputAccessors.push_back(accessor);
}
Member Author

This is like lazy-trees, except that here we only register the accessors to keep them alive: Value holds a non-smart pointer to the accessor and has no finalizer, for GC and performance reasons, so nothing else would prevent the accessor from being destroyed.

@@ -1973,6 +1978,17 @@ void EvalState::concatLists(Value & v, size_t nrLists, Value * * lists, const Po
}
}

// FIXME limit recursion
Member Author

Known issue, solve later with #10240

Comment on lines +2030 to +2055
Value & vTmp0 = *vTmpP++;
i->eval(state, env, vTmp0);
Value & vTmp = *resolveOutPath(state, &vTmp0, i_pos);

/* If the first element is a path, then the result will also
   be a path, we don't copy anything (yet - that's done later,
   since paths are copied when they are used in a derivation),
   and none of the strings are allowed to have contexts. */
if (first) {
    firstType = vTmp.type();
    if (firstType == nPath) {
        accessor = vTmp.path().accessor;
    }
Member Author

Similar but not equal to lazy-trees.

  • I've kept the diff smaller by keeping vTmp as a reference
  • Not adding the first name to the path, because it is added again later. Probably a bug in lazy-trees.

Comment on lines +2334 to +2348
: v.path().accessor->toStringReturnsStorePath()
? store->printStorePath(copyPathToStore(context, SourcePath(v.path().accessor, CanonPath::root))) + v.path().path.absOrEmpty()
Member Author

This is new; it is needed so that readFile "${toString ./.}/.." keeps working just as it did before.
I would not recommend writing that, but similar usages of paths exist in the wild.

Comment on lines +2451 to +2467
auto i = v.attrs->find(sOutPath);
if (i != v.attrs->end()) {
    return coerceToPath(pos, *i->value, context, errorCtx);
}
Member Author

This rule is not new. Previously, it would have worked by falling through to the coerceToString + rootPath code down below.

v.mkPath(
    &*path.accessor,
    // TODO: GC_STRDUP
    strdup(path.path.abs().c_str()));
Member Author

Like lazy-trees, but I had to add strdup to avoid corruption.

@@ -201,18 +201,31 @@ static void fetchTree(

state.checkURI(input.toURLString());

auto [storePath, input2] = input.fetchToStore(state.store);
if (params.returnPath) {
Member Author

Like #10225, but clang was lagging behind GCC's C++20.

* In both cases, the returned string functionally identifies the path,
* and can still be read.
*/
virtual bool toStringReturnsStorePath() const;
Member Author

The only truly new semantics.

Either way, both kinds of paths are virtual in the sense that they haven't been copied yet.
If you have copied it, it'd be a string. (Not if and only if, although Nix does discourage that.)

In lazy-trees, this could be changed to a virtual store path string without a problem.

Member Author

TODO: forward port this rule to prove it.

Member

@Ericson2314 Ericson2314 Apr 15, 2024

See #10511 (comment). I think this should rather contain information with which to construct the path, if I understand correctly what is going on here.

Member Author

@Ericson2314 these are solutions to different problems.

  • Here I am adding a property to distinguish the behavior between "system" path values and virtual path values in the language.
  • Linked comment seems to be about optimizing away a copy operation, which iiuc is internal to a fetcher and not perceptible by users.

Both solutions (to different problems!) are trying to solve concerns in the upper layers though.

I'll check whether this one can be moved into the evaluator. We could probably just special-case the system (i.e. POSIX accessor) paths there.

Member Author

Btw, here I've picked the least breaking semantics:

  • The return value still uniquely identifies the source
  • Still deterministic
  • Still readable when converted to a system path with /. + x or /${x}

Performance is at least as good as the status quo, but not as good as lazy-trees: for toString x to be instant, you would have to sacrifice one of the properties above, or come up with a clever scheme, such as opaque placeholders in strings.
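As a rough illustration of the third property, here is a minimal sketch of converting the string form back into a path value. It assumes impure evaluation (for example via nix-instantiate --eval), and `src` is a hypothetical stand-in for a fetched source; here it is simply the directory containing the file.

```nix
# roundtrip.nix: a sketch only, not taken from this PR.
let
  # Hypothetical stand-in for a fetched source tree.
  src = ./.;

  # String form of the path; it identifies the source.
  s = toString src;

  # Appending the string to the root path yields a path value again.
  p = /. + s;
in {
  inherit s p;

  # The round-tripped path can still be used for reading.
  readable = builtins.pathExists p;
}
```

The sketch only uses the `/. + x` form; the `/${x}` form mentioned above is expected to behave the same way, but is not exercised here.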

@ConnorBaker
Contributor

@roberth is there anything I (or someone in general) could do to try to help move this forward? Thank you for working on this :)

@tomberek tomberek added idea approved The given proposal has been discussed and approved by the Nix team. An implementation is welcome. and removed idea approved The given proposal has been discussed and approved by the Nix team. An implementation is welcome. labels Apr 8, 2024
@roberth roberth force-pushed the lazy-fetchTree branch 2 times, most recently from a29ade5 to 8cb0fd4 Compare April 16, 2024 13:54
edolstra and others added 8 commits April 16, 2024 15:55
This picks a number of changes from the lazy-trees branch.
As it is hand picked, and does not include some other necessary
changes, it does not build. Subsequent commits will fix that.

I have added a couple of comments of my own as well.

Co-authored-by: Eelco Dolstra <[email protected]>

Make path values lazy

This fixes the double copy problem and improves performance
for expressions that don't force the whole source to be added to the
store.

Rules for fast expressions:

- Use path literals where possible
   - import ./foo.nix
- Use + operator with slash in string
   - src = fetchTree foo + "/src";
- Use source filtering, lib.fileset

- AVOID toString
- If possible, AVOID interpolations ("${./.}")
- If possible, move slashes into the interpolation to add less to the store
   - "${./src}/foo" -> "${./src/foo}"

toString may be improved later as part of lazy-trees, so these
recommendations are a snapshot. Path values are quite nice though.
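
A sketch of how these rules could look in an expression. The paths ./src and ./src/foo are hypothetical and would have to exist next to the file; they are not part of this PR.

```nix
# rules.nix: a sketch only, using hypothetical paths ./src and ./src/foo.
{
  # Preferred: path + string stays a path value, so nothing is copied yet.
  lazy = ./src + "/foo";

  # Avoid: interpolating ./src copies the whole directory to the store,
  # then appends "/foo" to the resulting store path string.
  wholeDir = "${./src}/foo";

  # If a string is needed, interpolate the narrowest path instead,
  # so that only ./src/foo is added to the store.
  narrow = "${./src/foo}";
}
```

Attributes are evaluated lazily, so something like `nix-instantiate --eval rules.nix -A lazy` should force only that one attribute and leave the store untouched.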

SourceAccessor: insert colon after prefix

This allows clever editors/IDEs to discern the path more easily
for Ctrl+Click "navigate to" functionality, e.g. when building
.?ref=HEAD

Fix evalState::rootFS paths' to_string()

This showPath is getting a little too ad hoc, but it works for now.
@roberth
Member Author

roberth commented Apr 16, 2024

> @roberth is there anything I (or someone in general) could do to try to help move this forward? Thank you for working on this :)

Hi @ConnorBaker,

Sorry for the late response; I had to take another look at this first, and it took a while to get around to it.
I've rebased the branch, but I seem to have broken the search path somehow.

I see two ways forward:

  • re-do the PR with reduced scope
    • don't remove the narHash, if possible; should keep more flakes code the same
    • carefully look whether other changes were absolutely necessary
  • keep going: just fix the remaining test failures, and drop the commit that changes the flake outPath semantics, deferring that change for now

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-04-29-nix-team-meeting-minutes-142/45020/1

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-07-15-nix-team-meeting-minutes-161/49228/1

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-07-nix-team-meeting-minutes-167/50287/1

@roberth roberth self-assigned this Aug 7, 2024
@tomberek
Contributor

tomberek commented Aug 8, 2024

Reverting the last commit (4332b9a) gives us the following comparison with vanilla Nix:

Before

$ nix eval .#data --no-eval-cache | nixfmt
warning: Git tree '/home/tom/nix/t' is dirty
{
  fetchTree = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
  fetchTreePath = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/ci";
  fetchTreePathStr = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/ci";
  originalStr = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
  outPathRaw = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
  outPathStr = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
  pathRaw = /nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source;
  pathRawAdd = /nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/ci;
  pathRawAddStr = "/nix/store/z0qs96vamg1r2ch0rml9pmsn8f002hvw-ci";
  pathStr = "/nix/store/rslrjkrdgd2ggxmlyckc53nv0pxjq5qj-3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
}
$ nix eval .#data --no-eval-cache -vvv |& grep copying.*-source
...
copying '/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/pkgs/development/libraries/glibc/nix-nss-open-files.patch' to the store...
copying '/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/pkgs/development/libraries/glibc/0001-Revert-Remove-all-usage-of-BASH-or-BASH-in-installed.patch' to the store...
copying '/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/pkgs/development/libraries/glibc/reenable_DT_HASH.patch' to the store...
copying '/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source/ci' to the store...
copying '/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source' to the store...

After (only the differences)

This has fewer copies to the store, going from 6s to 0.6s!

$ nix eval .#data --no-eval-cache | tr -cd '[[:print:]]' |nixfmt      # because of "»" characters
{
  fetchTree = "github:NixOS/nixpkgs/086a5ea5b3acc4c512f9ec154bfefba55efba4f3?narHash=sha256-LyZtQZiq2v2We5ODev6s9s2iUHNu/ZC8rHIYRh1BIzg%3D:";
  fetchTreePath = "github:NixOS/nixpkgs/086a5ea5b3acc4c512f9ec154bfefba55efba4f3?narHash=sha256-LyZtQZiq2v2We5ODev6s9s2iUHNu/ZC8rHIYRh1BIzg%3D:ci";
  fetchTreePathStr = "/nix/store/z0qs96vamg1r2ch0rml9pmsn8f002hvw-ci";
  originalStr = ...
  outPathRaw = "github:NixOS/nixpkgs/086a5ea5b3acc4c512f9ec154bfefba55efba4f3?narHash=sha256-LyZtQZiq2v2We5ODev6s9s2iUHNu/ZC8rHIYRh1BIzg%3D:";
  outPathStr = ...
  pathRaw = "github:NixOS/nixpkgs/086a5ea5b3acc4c512f9ec154bfefba55efba4f3?narHash=sha256-LyZtQZiq2v2We5ODev6s9s2iUHNu/ZC8rHIYRh1BIzg%3D:";
  pathRawAdd = "github:NixOS/nixpkgs/086a5ea5b3acc4c512f9ec154bfefba55efba4f3?narHash=sha256-LyZtQZiq2v2We5ODev6s9s2iUHNu/ZC8rHIYRh1BIzg%3D:ci";
  pathRawAddStr = ...
  pathStr = "/nix/store/3a07hs5zz57xf76gqq53i9jkmi3mhyp6-source";
}

Source

{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs";
  outputs = _: {
    data = rec {
      fetchTree = (builtins.fetchTree (builtins.fromJSON (builtins.readFile ./flake.lock)).nodes.nixpkgs.locked).outPath;
      fetchTreePath = fetchTree + "/ci";

      # original = _.nixpkgs;
      originalStr = "${_.nixpkgs}";
      outPathRaw = _.nixpkgs.outPath;
      outPathStr = "${_.nixpkgs.outPath}";
      pathRaw = _.nixpkgs.legacyPackages.x86_64-linux.path;
      pathStr = "${_.nixpkgs.legacyPackages.x86_64-linux.path}";
      pathRawAdd = _.nixpkgs.legacyPackages.x86_64-linux.path + "/ci";
      pathRawAddStr = "${_.nixpkgs.legacyPackages.x86_64-linux.path + "/ci"}";
    };
  };
}

@tomberek
Contributor

tomberek commented Aug 8, 2024

Noticed an eval failure when using lib.fileset that does not happen with regular Nix. This is because the value retains a string context that it would otherwise lose.

A more critical place where this happens is in lib.fileset (@infinisil), in a dynamic attribute name: https://github.com/NixOS/nixpkgs/blob/master/lib/fileset/internal.nix#L249

  • looks like a path is implicitly converted to a string here, but retains a context

A minimal repro:

[tom@tframe:~/nix/t2]$ ~/.nix-profile-new/bin/nix eval .#value
{ gv00g760bh9xa2kp42z1c2wcv91p7yhy-source = 1; }

[tom@tframe:~/nix/t2]$ ../outputs/out/bin/nix eval .#value
error:
       … while evaluating the name of a dynamic attribute
         at flake.nix:4:7:
            3|     value = {
            4|       "${builtins.baseNameOf ./.}" = 1;
             |       ^
            5|     };

       error: the string 'gv00g760bh9xa2kp42z1c2wcv91p7yhy-source' is not allowed to refer to a store path (such as 'gv00g760bh9xa2kp42z1c2wcv91p7yhy-source')

[tom@tframe:~/nix/t2]$ cat flake.nix
{
  outputs = {...}: {
    value = {
      "${builtins.baseNameOf ./.}" = 1;
    };
  };
}

Perhaps coerceToString does the wrong thing here with respect to copyToStore, or perhaps we need special handling for the lazy path type? (https://github.com/NixOS/nix/blob/cfe66dbec325d5dcb601b642bd9c149ae1353147/src/libexpr-c/nix_api_external.cc#L108C48-L108C59)
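
One way to check whether the name carries a string context is sketched below. This is a diagnostic aid under stated assumptions, not a fix: the file name context-check.nix is made up for illustration, and only the existing builtins hasContext and unsafeDiscardStringContext are used.

```nix
# context-check.nix: a diagnostic sketch, not a fix.
let
  name = builtins.baseNameOf ./.;
in {
  # true if the string carries a string context,
  # i.e. a reference to a store path.
  hasContext = builtins.hasContext name;

  # Dynamic attribute names must be context-free. Discarding the context
  # explicitly sidesteps the error, at the cost of losing the reference.
  value = { ${builtins.unsafeDiscardStringContext name} = 1; };
}
```

Evaluating `hasContext` with both binaries should show whether the two builds disagree about the context of the baseNameOf result.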

@roberth
Member Author

roberth commented Aug 8, 2024

while evaluating the name of a dynamic attribute
         at flake.nix:4:7:
            3|     value = {
            4|       "${builtins.baseNameOf ./.}" = 1;
             |       ^
            5|     };

Yeah, it looks like baseNameOf is performing a conversion to string, which is unnecessary. I've pushed a fix.
dirOf seems to be ok, but readFile has logic like state.store->isInStore(path.path.abs()), which does not account for a possibly non-system accessor. realisePath returns path values unchanged, so that case needs to be handled now.
It looks like the primops need to be audited for how they handle the .path field, wherever they use it.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-21-nix-team-meeting-minutes-171/50950/1

@nyabinary

The performance label should probably be added to this, right?

tomberek pushed a commit to tomberek/nix that referenced this pull request Aug 23, 2024
tomberek pushed a commit to tomberek/nix that referenced this pull request Aug 24, 2024
tomberek pushed a commit to tomberek/nix that referenced this pull request Aug 24, 2024
tomberek pushed a commit to tomberek/nix that referenced this pull request Aug 24, 2024
@roberth roberth mentioned this pull request Aug 25, 2024
6 tasks
@roberth
Member Author

roberth commented Aug 25, 2024

We're turning this into a collaborative effort at #11367 (backed by an upstream branch for easy pushing)

@roberth roberth closed this Aug 25, 2024
tomberek added a commit to tomberek/nix that referenced this pull request Nov 3, 2024
Co-authored-by: Eelco Dolstra <[email protected]>
Co-authored-by: Robert Hensing <[email protected]>

fixup: remove FlakeCache

Make path values lazy

This fixes the double copy problem and improves performance
for expressions that don't force the whole source to be added to the
store.

Rules for fast expressions:

- Use path literals where possible
   - import ./foo.nix
- Use + operator with slash in string
   - src = fetchTree foo + "/src";
- Use source filtering, lib.fileset

- AVOID toString
- If possible, AVOID interpolations ("${./.}")
- If possible, move slashes into the interpolation to add less to the store
   - "${./src}/foo" -> "${./src/foo}"

toString may be improved later as part of lazy-trees, so these
recommendations are a snapshot. Path values are quite nice though.

SourceAccessor: insert colon after prefix

This allows clever editors/IDEs to discern the path more easily
for Ctrl+Click navigate to functionality, e.g. when building
.?ref=HEAD

Fix evalState::rootFS paths' to_string()

This showPath is getting a little too ad hoc, but it works for now.

Fix nix flake init eval for path value

WIP fix baseNameOf (needs test maybe)

NixOS#10252 (comment)

fix findFile assertion failure

A string is only allowed to create one path component; containing
no slashes.

tests: nix:derivation-internal.nix renders with a scheme now

fixup: Re-enable import .drv code
Labels
fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command performance with-tests Issues related to testing. PRs with tests have some priority
Projects
Status: Done
Archived in project
Development

Successfully merging this pull request may close these issues.

Make builtins.fetchTree return a path as its outPath element
8 participants