Windows: Importing CUDA.jl under Nsight Systems exits the process #2546

Open
huiyuxie opened this issue Nov 7, 2024 · 8 comments
Labels
upstream Somebody else's problem.

Comments

@huiyuxie
Contributor

huiyuxie commented Nov 7, 2024

Describe the bug

When `using CUDA` is run in a Julia session launched under Nsight Systems, the process exits, but a profiling report is still generated.

To reproduce

The Minimal Working Example (MWE) for this bug:

$ nsys launch julia
julia> using CUDA
Manifest.toml

Paste your Manifest.toml here, or accurately describe which version of CUDA.jl and its dependencies (GPUArrays.jl, GPUCompiler.jl, LLVM.jl) you are using.

Expected behavior

The Julia session should keep running and wait for the next input.

Version info

Details on Julia:

# please post the output of:
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 13th Gen Intel(R) Core(TM) i9-13900H
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)

Details on CUDA:

# please post the output of:
julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.7
NVIDIA driver 565.90.0

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+565.90

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 4060 Laptop GPU (sm_89, 7.773 GiB / 7.996 GiB available)

Additional context

Some warnings from the profiling report:

  | Injection | 20284 | 00:01.034 | Installed CUDA driver version (12.7) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 12.6
  | Analysis | 1408 | 00:04.805 | NVTX profiling might have not been started correctly.
  | Analysis | 1408 | 00:04.805 | No NVTX events collected. Does the process use NVTX?
  | Analysis |   | 00:04.805 | CUDA profiling might have not been started correctly.
  | Analysis |   | 00:04.805 | No CUDA events collected. Does the process use CUDA?
  | Analysis | 3376 | 00:04.805 | CUDA profiling might have not been started correctly.
  | Analysis | 3376 | 00:04.805 | No CUDA events collected. Does the process use CUDA?
  | Analysis | 20284 | 00:04.805 | Number of CUDA events collected: 1.
  | Analysis | 24180 | 00:04.805 | CUDA profiling might have not been started correctly.
  | Analysis | 24180 | 00:04.805 | No CUDA events collected. Does the process use CUDA?
@huiyuxie huiyuxie added the bug Something isn't working label Nov 7, 2024
@huiyuxie
Contributor Author

huiyuxie commented Nov 7, 2024

Could this be related to permission issues?

@maleadt
Member

maleadt commented Nov 7, 2024

Hard to tell. Which version of Nsight are you using? Are you using juliaup, and if so, can you try launching the Julia binary directly?
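
For readers following along: under juliaup, the `julia` command found on `PATH` is a launcher rather than the Julia binary itself. A minimal sketch for locating the real executable from inside a running session is shown below. It only uses `Sys.BINDIR` and `Base.julia_exename()`; the variable name `julia_binary` is illustrative.

```julia
# Build the full path of the Julia executable that is actually running.
# Sys.BINDIR is the directory containing that executable, and
# Base.julia_exename() is its file name ("julia.exe" on Windows).
julia_binary = joinpath(Sys.BINDIR, Base.julia_exename())
println(julia_binary)
```

The printed path can then be given to `nsys launch` in place of `julia`, which should rule out the launcher as a factor.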

@maleadt maleadt added the needs information Further information is requested label Nov 7, 2024
@huiyuxie
Contributor Author

huiyuxie commented Nov 8, 2024

The latest one, Nsight Systems 2024.6.1:

(base) PS C:\Users\huiyu> nsys --version
NVIDIA Nsight Systems version 2024.6.1.90-246134905481v0

> Are you using juliaup, and if so, can you try launching the Julia binary directly?

Yes, I tried that but had the same problem: the profiling report is generated right after the `using CUDA` command. I also tried profiling a simple CUDA kernel file directly from the Nsight Systems UI, but the profiling report doesn’t seem to capture any CUDA events.

Also, I ran everything as an administrator, so it’s likely not a permissions issue. Could this be related to NVTX.jl issue #37? I’m using Windows, and I always see “NVTX profiling might have not been started correctly” in each report.

@huiyuxie
Contributor Author

huiyuxie commented Nov 8, 2024

If so, I could offer some help with that issue.

@maleadt
Member

maleadt commented Nov 8, 2024

I can confirm the issue, but I'm not sure what we can do here, as I don't have much debugging experience on Windows.
In any case, this looks like a bug with Nsight Systems. Could you file an issue with NVIDIA? They are generally pretty quick to get back to you.

@maleadt maleadt added upstream Somebody else's problem. and removed bug Something isn't working needs information Further information is requested labels Nov 8, 2024
@maleadt maleadt changed the title from "Unexpected crash using CUDA in Nsight Systems" to "Windows: Importing CUDA.jl under Nsight Systems exits the process" Nov 8, 2024
@huiyuxie
Contributor Author

huiyuxie commented Nov 8, 2024

Sure

@huiyuxie
Contributor Author

huiyuxie commented Nov 9, 2024

@huiyuxie
Contributor Author

huiyuxie commented Nov 13, 2024

It is indeed related to the NVTX.jl issue. It is just so weird that that issue has been open for a year, yet nobody else has reported a problem like mine.

I will try to fix that issue in my spare time, and I will cc you @maleadt once a PR is opened.
