
onnxruntime-gpu not working with my gpu / setup #21215

Closed
jillson opened this issue Jun 30, 2024 · 9 comments
Labels: ep:CUDA (issues related to the CUDA execution provider), platform:windows (issues related to the Windows platform)

Comments

jillson commented Jun 30, 2024

Describe the issue

Configuration:
RTX 3050 (Laptop). nvidia-smi reports driver 555.85 and CUDA 12.5, which is slightly confusing since I uninstalled CUDA 12.5 and installed 12.1 based on the stated version compatibility.
cuDNN 8.9.7.29 is installed but doesn't appear to be used.
Using PyTorch (torch 2.1.2+cu121), onnx 1.16.1, and onnxruntime-gpu 1.18.1. (A quick runtime check of what actually gets imported is sketched below.)
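
A minimal way to confirm from Python which onnxruntime build is being imported and which execution providers it was compiled with (a sanity check, assuming the venv above is active):

import onnxruntime as ort

print(ort.__version__)                # expect 1.18.1 here
print(ort.get_device())               # "GPU" for the onnxruntime-gpu package
print(ort.get_available_providers())  # should list "CUDAExecutionProvider"

Note that get_available_providers() lists providers compiled into the package even when their DLLs later fail to load, so a listed CUDAExecutionProvider does not by itself prove the CUDA DLLs are loadable.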

When I try to run the pared-down code (see below), I get an error about not being able to load CUDA:
2024-06-30 15:21:47.1727155 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "(snip)\venv\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

To reproduce

Activate my virtual environment (which has the versions listed above), then run:

import os

import onnxruntime as ort
import psutil
import torch  # needed for the device id and compute stream below

providers = [("CUDAExecutionProvider",
              {"device_id": torch.cuda.current_device(),
               "user_compute_stream": str(torch.cuda.current_stream().cuda_stream)})]
sess_options = ort.SessionOptions()

# List every DLL mapped into this process, to see whose
# CUDA/cuDNN libraries actually get loaded.
p = psutil.Process(os.getpid())
for lib in p.memory_maps():
    print(lib.path)

model_path = "./venv/Lib/site-packages/onnx/backend/test/data/node/test_simple_rnn_batchwise/model.onnx"
try:
    sess = ort.InferenceSession(model_path, sess_options=sess_options, providers=providers)
except Exception as e:
    print(e)

I get back the error above and also a list of loaded DLLs; these include torch's cuDNN (and zlib) but not the NVIDIA CUDA/cuDNN files I installed. At least one Stack Overflow post indicated that, with the PyTorch build I'm using, I shouldn't need those since cuDNN is bundled into the wheel. I've tried unsetting the CUDNN/CUDA environment variables and removing them from $PATH, with the same behavior.

Urgency

Very low

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.1 (or trying to; may be somehow still using CUDA 12.5)

github-actions bot added the ep:CUDA and platform:windows labels on Jun 30, 2024
jillson (Author) commented Jun 30, 2024

Minor update: https://stackoverflow.com/a/53504578 suggests my nvidia-smi behavior is expected: I do in fact have CUDA 12.1 installed (nvcc --version returns 12.1), but I have the latest (or at least a newer) driver, which supports up to 12.5. Given I'm using pytorch 2.1.2+cu121, I'm going to assume I'm effectively running CUDA 12.1 for purposes here.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
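
For cross-checking, the CUDA version a PyTorch wheel was built against (and whether it can see the GPU at all) can be read at runtime; a minimal sketch:

import torch

print(torch.version.cuda)         # CUDA toolkit torch was built with, e.g. "12.1"
print(torch.cuda.is_available())  # True if the driver and device are usable

nvidia-smi reports the maximum CUDA version the driver supports (12.5 here), while nvcc reports the installed toolkit (12.1); the two can coexist.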

mszhanyi (Contributor) commented Jul 1, 2024

Could you use Dependency Walker to load onnxruntime_providers_cuda.dll and check which dependent DLL is missing?
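
An alternative to Dependency Walker is attempting the load directly from Python with ctypes, which surfaces the same Windows error; a sketch, with the DLL path taken from the error message above (substitute the real venv location):

import ctypes

# Path as reported in the error message; adjust to the actual venv.
dll_path = r"venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"
try:
    ctypes.WinDLL(dll_path)
    print("loaded OK")
except OSError as e:
    # WinError 126 means this DLL or one of its dependencies
    # (e.g. cudnn64_*.dll, cublas64_*.dll) could not be found.
    print(e)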

tianleiwu (Contributor) commented:

1.18.1 for CUDA 12 requires cuDNN 9.* instead of 8.*. See the release notes: https://github.com/microsoft/onnxruntime/releases/tag/v1.18.1
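
Given the memory map above shows torch's bundled cuDNN being loaded, one way to see which cuDNN version that actually is (the integer encoding is NVIDIA's):

import torch

v = torch.backends.cudnn.version()
print(v)  # e.g. 8907 for cuDNN 8.9.7; cuDNN 9.x builds report values >= 90000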

snnn (Member) commented Jul 1, 2024

And Python does not use the PATH environment variable when searching for DLLs (since Python 3.8 on Windows, dependent DLL directories must be registered explicitly).
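
A minimal sketch of registering a DLL directory explicitly, assuming a default CUDA 12.1 install location (adjust to the local setup):

import os

# Hypothetical install path; adjust to wherever CUDA/cuDNN actually live.
os.add_dll_directory(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin")

import onnxruntime as ort  # import only after registering the directories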

jillson (Author) commented Jul 1, 2024

> 1.18.1 for CUDA 12 requires cuDNN 9.* instead of 8.*. See the release notes: https://github.com/microsoft/onnxruntime/releases/tag/v1.18.1

Hmm.... https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements apparently needs to be updated to distinguish 1.18.1 (which, as you note, requires cuDNN 9) from 1.18.0, which worked with cuDNN 8.9. Currently trying to download torch's nightly, which I'm hoping will get me cuDNN 9 (my attempt to overwrite torch's "vendored" 8.x DLLs with the 9.x DLLs I had downloaded went about as well as you'd expect)... if that doesn't work, I'll likely roll back to the 1.18.0 binary and see if that gets things aligned.

Thanks for the reminder that Python doesn't use PATH for DLL resolution.

jillson (Author) commented Jul 2, 2024

Hmm... now getting OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\jills\git\stable-diffusion-webui\venv\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies. This happens both with the latest nightly PyTorch (which does ship cuDNN 9 DLLs) and, in a different venv, after reverting to onnxruntime-gpu==1.18.0... at this point, the best thing to do is likely to reinstall the venv, but I'm going to wait until next week due to much slower internet this week.

jillson closed this as completed on Jul 2, 2024
NulliferBones commented:

I'm also unable to use CUDA for ReActor. I'm receiving the same error as the OP.

jillson (Author) commented Jul 10, 2024

Switching to a torch build with cuDNN 8.9 and onnxruntime-gpu==1.18.0, my simple example now works... but I'm still getting FAIL : LoadLibrary failed with error 126 when trying to load onnxruntime_providers_cuda.dll in stable diffusion, which is what I actually care about...
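
One way to confirm the session really kept the CUDA provider instead of silently falling back to CPU, continuing the repro above (model_path, sess_options, and providers as defined there):

sess = ort.InferenceSession(model_path, sess_options=sess_options, providers=providers)
print(sess.get_providers())  # ["CPUExecutionProvider"] alone means CUDA fell back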

jillson (Author) commented Jul 10, 2024

And looking more closely at the error, somehow my virtualenv has gotten mangled under Git Bash, leading it to find (stale) DLLs in the global Python install over the ones in the venv; switching to PowerShell/cmd seems to finally use CUDA for stable diffusion (or at least it doesn't throw errors and fall back to CPU)... but I'm still seeing it take way longer than I'd like to run, AND perfmon indicates 0% GPU utilization... sigh.
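
A stale-venv mixup like this can be confirmed from inside the interpreter; a small sketch, nothing specific to stable-diffusion-webui:

import sys
import onnxruntime

print(sys.executable)        # should point into the venv, not global Python
print(onnxruntime.__file__)  # should resolve under the venv's site-packages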
