onnxruntime-gpu not working with my gpu / setup #21215
Comments
Minor update: https://stackoverflow.com/a/53504578 would seem to indicate my nvidia-smi behavior is expected: I do in fact have CUDA 12.1 installed (nvcc --version returns 12.1), but I have a newer driver that supports up to 12.5. Given I'm using pytorch 2.1.2+cu121, I'm going to assume I'm effectively running CUDA 12.1 for purposes here.
Could you use Dependency Walker to load onnxruntime_providers_cuda.dll and take a look at which dependent DLL is missing?
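As a quick alternative to Dependency Walker, a hedged sketch: on Windows you can try loading the DLL directly with ctypes, and a [WinError 126] usually means one of the DLL's *dependencies* is missing rather than the DLL itself (the path in the usage comment is illustrative):

```python
import ctypes
import sys

def can_load_dll(path):
    """Try to load a DLL directly.

    Returns True on success, the OSError on failure (WinError 126
    typically means a dependent DLL is missing), and None on
    non-Windows platforms where ctypes.WinDLL does not exist.
    """
    if sys.platform != "win32":
        return None
    try:
        ctypes.WinDLL(path)
        return True
    except OSError as err:
        return err

# Example (path is illustrative, adjust to your venv):
# can_load_dll(r"venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll")
```

The returned OSError message is often more actionable than ONNX Runtime's wrapped "error 126" log line.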
1.18.1 for CUDA 12 requires cuDNN 9.x instead of 8.x. See the release notes: https://github.com/microsoft/onnxruntime/releases/tag/v1.18.1
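Before chasing DLL-resolution problems, a small sketch (assuming onnxruntime is importable) to confirm which onnxruntime version is installed and whether the CUDA execution provider is even visible:

```python
import importlib.util

def ort_cuda_info():
    """Return (version, has_cuda_ep) for the installed onnxruntime.

    Returns None if onnxruntime is not installed at all.  has_cuda_ep
    being False on an onnxruntime-gpu install suggests the wrong wheel
    (CPU-only) rather than a missing CUDA/cuDNN DLL.
    """
    if importlib.util.find_spec("onnxruntime") is None:
        return None
    import onnxruntime as ort
    return ort.__version__, "CUDAExecutionProvider" in ort.get_available_providers()
```

Note that `get_available_providers()` only reports what the wheel was built with; the CUDA provider can still fail to *load* at session-creation time, as in this issue.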
Also, Python does not use the PATH environment variable when searching for DLLs.
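Since Python 3.8 on Windows, native-extension DLL dependencies are resolved via explicitly registered directories, not PATH. A hedged sketch (directory names are examples, not your actual install paths) of registering the CUDA/cuDNN bin directories before importing onnxruntime:

```python
import os
import sys

def register_dll_dirs(candidate_dirs):
    """Register existing directories for DLL resolution (Windows, Py >= 3.8).

    Returns the handles from os.add_dll_directory; keep them alive for
    as long as the DLLs may be loaded.  No-op on other platforms or on
    directories that do not exist.
    """
    handles = []
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        for d in candidate_dirs:
            if os.path.isdir(d):
                handles.append(os.add_dll_directory(d))
    return handles

# Example (paths are illustrative, adjust to your install):
# _handles = register_dll_dirs([
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin",
# ])
# import onnxruntime  # import only after registering the directories
```

The registration must happen before the first `import onnxruntime`, since the provider DLLs are resolved at load time.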
Hmm... https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements apparently needs to be updated to distinguish 1.18.1 (which, as you note, requires cuDNN 9) from 1.18.0, which worked with cuDNN 8.9. Currently trying to download torch's nightly, which I'm hoping will get me cuDNN 9 (my attempt to overwrite torch's "vendored" 8.x DLLs with the 9.x DLLs I had downloaded went about as well as you'd expect). If that doesn't work, I'll likely try rolling back to the 1.18.0 binary and see if that gets things aligned. Thanks for the reminder about Python not using PATH for DLL lookup.
Hmm... now getting OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\jills\git\stable-diffusion-webui\venv\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies. This happens both with the latest nightly pytorch (which does have cuDNN 9 DLLs) and, in a different venv, after reverting to onnxruntime-gpu==1.18.0. At this point the best thing to do is probably to reinstall the venv, but I'm going to wait until next week due to much slower internet this week.
I'm also unable to use CUDA for reactor. I'm receiving the same error as the OP.
After switching to the torch build for cuDNN 8.9 and onnxruntime-gpu==1.18.0, my simple example now works... but I'm still getting FAIL : LoadLibrary failed with error 126 when trying to load onnxruntime_providers_cuda.dll in stable diffusion, which is what I actually care about...
Looking more closely at the error, my virtualenv had somehow gotten broken under git bash, leading it to find (stale) DLLs in the global Python install instead of the ones in the venv. After switching to powershell/cmd, it finally seems to use CUDA for stable diffusion (or at least it no longer throws errors and falls back to CPU)... but runs still take far longer than I'd like, AND perfmon indicates 0% GPU utilization... sigh.
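To separate "torch silently fell back to CPU" from "GPU is visible but underutilized," a small hedged sketch (assuming torch is importable) of what torch itself reports:

```python
import importlib.util

def torch_gpu_status():
    """Report what torch sees.

    Returns None if torch is not installed, a message if CUDA is not
    available to torch (i.e. work is running on CPU), or the name of
    the visible GPU otherwise.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    if not torch.cuda.is_available():
        return "CUDA not available to torch"
    return f"torch sees: {torch.cuda.get_device_name(0)}"
```

If this reports the RTX 3050 but perfmon still shows 0%, note that Task Manager/perfmon default to the "3D" engine graph; CUDA kernels show under the "Cuda"/"Compute" engine view instead.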
Describe the issue
Configuration:
RTX 3050 (Laptop). nvidia-smi indicates I'm using 555.85 as my driver and CUDA 12.5, which is slightly confusing since I uninstalled CUDA 12.5 and installed 12.1 based on the indicated version compatibility.
cuDNN 8.9.7.29 is installed but doesn't appear to be used.
Using pytorch (torch 2.1.2+cu121), onnx 1.16.1, and onnxruntime-gpu 1.18.1
When I try to run the pared-down code (see below), I get an error about not being able to load CUDA:
2024-06-30 15:21:47.1727155 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "(snip)\venv\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"
To reproduce
Activated my virtual environment (which has the versions listed above) and then ran:
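The original snippet isn't shown in the issue; a hypothetical minimal repro along these lines (the model path is a placeholder) is what typically triggers the provider load:

```python
import importlib.util

def try_cuda_session(model_path):
    """Create an ORT session preferring the CUDA EP.

    Returns the providers actually used -- ORT logs the error above and
    silently falls back to CPU when the CUDA provider DLL fails to load.
    Returns None if onnxruntime is not installed.
    """
    if importlib.util.find_spec("onnxruntime") is None:
        return None
    import onnxruntime as ort
    sess = ort.InferenceSession(
        model_path,  # placeholder; any valid .onnx model file
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    return sess.get_providers()
```

Checking `get_providers()` on the created session is the reliable way to tell whether CUDA was actually used or the session fell back to `CPUExecutionProvider`.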
I get back the error above and also a list of the DLLs etc. that were loaded; these include torch's cuDNN (and zlib) DLLs, and not the nvidia CUDA/cuDNN files I installed. At least one stackoverflow post indicated that, with the pytorch build I'm using, I shouldn't need those since pytorch helpfully bakes them in. I've tried unsetting the CUDNN/CUDA environment variables and removing them from $PATH, with the same behavior.
Urgency
Very low
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.1 (or trying to; may be somehow still using CUDA 12.5)