textgen-nvidia build fails on NixOS-WSL #59

Open
alexvorobiev opened this issue Oct 29, 2023 · 2 comments

Comments

@alexvorobiev

I updated the flake and now I am getting this error for textgen-nvidia:

error: builder for '/nix/store/900bmg4iknf0yb7r1b3f5xdfarqc9yzy-triton-llvm-14.0.6-f28c006a5895.drv' failed with exit code 1;
       last 10 log lines:
       > In file included from /build/source/llvm/include/llvm/Support/YAMLTraits.h:23,
       >                  from /build/source/llvm/include/llvm/CodeGen/MIRYamlMapping.h:22,
       >                  from /build/source/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h:21,
       >                  from /build/source/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h:18:
       > /build/source/llvm/include/llvm/Support/SourceMgr.h: In member function ‘bool llvm::SMFixIt::operator<(const llvm::SMFixIt&) const’:
       > /build/source/llvm/include/llvm/Support/SourceMgr.h:241: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
       >   241 |     if (Range.Start.getPointer() != Other.Range.Start.getPointer())
       >       |
       > /build/source/llvm/include/llvm/Support/SourceMgr.h:241: note: adding ‘-flarge-source-files’ will allow for more column-tracking support, at the expense of compilation time and memory
       > ninja: build stopped: subcommand failed.
       For full logs, run 'nix log /nix/store/900bmg4iknf0yb7r1b3f5xdfarqc9yzy-triton-llvm-14.0.6-f28c006a5895.drv'.
error: 1 dependencies of derivation '/nix/store/0xf7hi05hpx45khnbwvrhh1rxc5vc9j2-python3.11-triton-2.0.0.drv' failed to build
error (ignored): error: cannot unlink '/tmp/nix-build-nccl-2.18.5-1.drv-3': Directory not empty
error (ignored): error: cannot unlink '/tmp/nix-build-magma-2.7.2.drv-1': Directory not empty
error: 1 dependencies of derivation '/nix/store/vr6knfixvhazw998iqz207dr99ffhbv7-python3-3.11.5-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/9wbs0ybrpkc81b0x26wrsdhb7c86iqa2-textgen.drv' failed to build
@MatthewCroughan
Member

@alexvorobiev please check the output of dmesg to see what the Linux kernel reports. Are you sure you're not simply running out of memory during the build? That would get a process killed and could produce this spurious error.

Can you also post more of the log? What you posted is truncated, so it doesn't show the full story.
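
To make the dmesg check concrete, here is a minimal Python sketch that filters kernel log text for lines that usually indicate the OOM killer fired. The function name `find_oom_lines` and the pattern list are illustrative assumptions, not part of this project or of Nix, and the patterns are only the common phrasings the Linux kernel uses, so a miss does not prove memory was fine.

```python
import re
from typing import List

# Phrases the Linux kernel commonly logs when the OOM killer acts.
# This list is an assumption for illustration and may not be exhaustive.
OOM_PATTERN = re.compile(
    r"out of memory|oom-kill|invoked oom-killer", re.IGNORECASE
)


def find_oom_lines(kernel_log: str) -> List[str]:
    """Return the lines of a kernel log that look like OOM-killer messages."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]
```

One way to use it is to save the output of dmesg to a file, read the file into a string, and pass that string to `find_oom_lines`. If a compiler process such as cc1plus shows up in the matches, the build most likely ran out of memory.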

@alexvorobiev
Author

Yes, the issue was memory. I increased the amount of memory given to WSL from 16G to 30G, and I also had to reduce the number of cores Nix uses for the build with --cores. It was using all 24 cores available on my CPU; I tried 8 more or less at random and the build completed. I haven't tried loading any models yet. I was surprised that it wants to build the whole world from scratch (specifically torch and triton-llvm). Shouldn't those packages be in the binary cache?
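
Rather than guessing a core count, one could derive it from the memory available to WSL. The sketch below is a rough heuristic, not anything Nix does itself: `pick_cores` is a hypothetical helper, and the 3.5 GiB per compile job and 2 GiB reserve are illustrative assumptions rather than measurements of the triton-llvm build.

```python
import math


def pick_cores(
    total_mem_gib: float,
    cpu_count: int,
    mem_per_job_gib: float = 3.5,  # assumed peak memory per compiler process
    reserve_gib: float = 2.0,      # assumed headroom for the rest of the system
) -> int:
    """Return a core count that keeps the estimated memory use within budget."""
    usable = total_mem_gib - reserve_gib
    by_memory = math.floor(usable / mem_per_job_gib)
    # Never exceed the real CPU count, and never go below one core.
    return max(1, min(cpu_count, by_memory))
```

With these assumed defaults, 30 GiB and 24 CPUs give 8, which happens to match the value that worked above. The result would be passed to Nix as the value of --cores.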
