
Failed to create CUDAExecutionProvider. #17537

Closed
jFkd1 opened this issue Sep 13, 2023 · 13 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments


jFkd1 commented Sep 13, 2023

Describe the issue

For some reason, onnxruntime-gpu is having a hard time using CUDAExecutionProvider.

Using CUDA 11.7 with onnxruntime-gpu==1.15.1, I tried running the following in Python 3.10:

import torch
import onnxruntime
save_path = 'model.onnx'
ort_session = onnxruntime.InferenceSession(save_path, providers=['CUDAExecutionProvider'])

And I will always get the error message:

[W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

It looks like I am using the right system and versions. I will upload the ONNX model if necessary.
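As a quick first check of whether the CUDA EP's dependencies are even resolvable, a small stdlib-only probe can attempt to dlopen the libraries a CUDA 11.x build of onnxruntime-gpu typically links against. The soname list below is an assumption based on a CUDA 11.x install; check the requirements page for your exact version:

```python
import ctypes

# Sonames the CUDA EP typically needs on CUDA 11.x (assumed list; verify
# against the requirements page for your onnxruntime-gpu version).
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcudnn.so.8",
    "libcurand.so.10",
    "libcufft.so.10",
]

missing = []
for soname in CUDA_LIBS:
    try:
        ctypes.CDLL(soname)  # same lookup mechanism the dynamic loader uses
    except OSError:
        missing.append(soname)

print("missing libraries:", missing)
```

An empty list means the loader can resolve every library; any entry printed here is a likely cause of the `Failed to create CUDAExecutionProvider` warning.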

To reproduce

pip install onnxruntime-gpu==1.15.1

Using CUDA 11.7

Run this code:

import torch
import onnxruntime
save_path = 'model.onnx'
ort_session = onnxruntime.InferenceSession(save_path, providers=['CUDAExecutionProvider'])

Urgency

Very urgent

Platform

Linux

OS Version

Ubuntu 18.04.6 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.7

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Sep 13, 2023

jFkd1 commented Sep 13, 2023

tianleiwu (Contributor) commented:

This is not related to the model. It is most likely related to the package installation or the cuDNN installation. Are you able to reproduce it in a fresh Python environment, or after reinstalling as follows?

pip3 uninstall onnxruntime onnxruntime-gpu ort-nightly ort-nightly-gpu
pip3 install onnxruntime-gpu
pip3 install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu117

Then test with your Python script:

import torch
import onnxruntime
save_path = 'model.onnx'
ort_session = onnxruntime.InferenceSession(save_path, providers=['CUDAExecutionProvider'])

Since you import torch before onnxruntime, ORT should be able to use the cuDNN libraries already loaded by torch.
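To confirm which providers the installed package actually exposes, independent of any model, a guarded sketch like the following can help; the ImportError guard is only there so the snippet degrades gracefully on machines without onnxruntime installed:

```python
# Check whether this onnxruntime build exposes the CUDA execution provider.
try:
    import onnxruntime as ort
except ImportError:
    ort = None  # onnxruntime not installed in this environment

if ort is not None:
    available = ort.get_available_providers()
    print("build exposes CUDA EP:", "CUDAExecutionProvider" in available)
else:
    available = []
    print("onnxruntime is not installed")
```

If `CUDAExecutionProvider` is absent from the list, the CPU-only `onnxruntime` package is probably shadowing `onnxruntime-gpu`, which is exactly what the uninstall/reinstall above is meant to fix.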


snnn commented Sep 14, 2023

You can also try our nightly build.

To get it, first uninstall the one you have installed:

python3 -m pip uninstall -y ort-nightly-gpu ort-nightly onnxruntime onnxruntime-gpu -qq

Then install a new one from our nightly feed:

python3 -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy
python3 -m pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ ort-nightly-gpu

Then you can use the ldd tool to figure out which library is missing.


jFkd1 commented Sep 14, 2023

Hi @tianleiwu, thanks for commenting. I tried reinstalling, but it does not seem to help. I have also tried this on a different server (with the exact same CUDA/Python environment) and am still getting the same error.

@snnn, thanks for the suggestion. The nightly build is not working properly either... The only difference I see is that the error output is now colored yellow.

Can you clarify what you mean by using ldd?


snnn commented Sep 14, 2023

In the directory where the package is installed, you should find some *.so files. For example, if you run

find . -name \*.so -exec ldd {} \;

in the directory where the ort-nightly-gpu Python package is installed, the ldd tool should print errors saying that something was not found.
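The same check can be scripted. This hypothetical helper walks a directory, runs ldd on every .so it finds, and collects the lines that report unresolved dependencies:

```python
import pathlib
import subprocess

def unresolved_deps(root: str) -> dict:
    """Map each .so file under root to its 'not found' ldd lines, if any."""
    missing = {}
    for so in pathlib.Path(root).rglob("*.so"):
        result = subprocess.run(["ldd", str(so)], capture_output=True, text=True)
        bad = [line.strip() for line in result.stdout.splitlines()
               if "not found" in line]
        if bad:
            missing[str(so)] = bad
    return missing
```

Running it over the `onnxruntime/capi` directory of the installed package should surface the same failures the manual `find ... -exec ldd` command prints, but filtered down to only the problems.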

tianleiwu (Contributor) commented:

@jFkd1, here is example ldd output for ort-nightly-gpu on Ubuntu 20.04 with CUDA 11.8:

/workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi$ find  . -name \*.so -exec ldd {} \;
        linux-vdso.so.1 (0x00007ffc8c363000)
        libcublasLt.so.11 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007fb91f713000)
        libcublas.so.11 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fb919ab5000)
        libcudnn.so.8 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007fb91988f000)
        libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007fb9133f9000)
        libcufft.so.10 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcufft.so.10 (0x00007fb90251e000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9024fb000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9024ef000)
        libcudart.so.11.0 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fb902248000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb901fda000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb901e8b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb901e66000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb901c74000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb95de91000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb901c6c000)
        linux-vdso.so.1 (0x00007ffca0965000)
        libcudnn.so.8 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007f3703763000)
        libcublas.so.11 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11 (0x00007f36fdb05000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f36fdaff000)
        libnvinfer-e0e3f4ae.so.8.6.1 => /workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi/./../../ort_nightly_gpu.libs/libnvinfer-e0e3f4ae.so.8.6.1 (0x00007f36ef250000)
        libnvinfer_plugin-e94f6113.so.8.6.1 => /workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi/./../../ort_nightly_gpu.libs/libnvinfer_plugin-e94f6113.so.8.6.1 (0x00007f36ecd64000)
        libnvonnxparser-216e92d7.so.8.6.1 => /workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi/./../../ort_nightly_gpu.libs/libnvonnxparser-216e92d7.so.8.6.1 (0x00007f36ec8a9000)
        libcudart.so.11.0 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f36ec602000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f36ec5e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f36ec5c3000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f36ec5b9000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f36ec34b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f36ec1fa000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f36ec1d5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f36ebfe3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3703c58000)
        libcublasLt.so.11 => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007f36c7a5d000)
        libcublas-6571bf83.so.12.0.2.224 => /workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi/./../../ort_nightly_gpu.libs/libcublas-6571bf83.so.12.0.2.224 (0x00007f36c11d0000)
        libcublasLt-5bae62ee.so.12.0.2.224 => /workspace/anaconda3/envs/sdxl/lib/python3.10/site-packages/onnxruntime/capi/./../../ort_nightly_gpu.libs/libcublasLt-5bae62ee.so.12.0.2.224 (0x00007f36a03b8000)
        linux-vdso.so.1 (0x00007ffebf1d3000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6a9ee4f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6a9ed00000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6a9ecdb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6a9eae9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6a9f2d8000)
        linux-vdso.so.1 (0x00007ffc83315000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5a5726d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5a57263000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5a57247000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5a57224000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5a56fb6000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5a56e67000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5a56e40000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5a56c4e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5a5825f000)


snnn commented Sep 14, 2023

It seems totally fine. As tianleiwu said, the `import torch` statement could be the reason. If you load PyTorch before loading onnxruntime, onnxruntime will probably pick up the CUDA and cuDNN libraries from PyTorch instead, and a version mismatch may occur. Can you try to create a simpler script that uses only onnxruntime?
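A minimal torch-free repro might look like the sketch below. The guards around the import and the model path are only there so the snippet can run outside the reporter's environment; `model.onnx` is the file from the issue, not something provided here:

```python
import os

try:
    import onnxruntime as ort
except ImportError:
    ort = None  # not installed here

session_providers = []
if ort is not None and os.path.exists("model.onnx"):
    sess = ort.InferenceSession("model.onnx",
                                providers=["CUDAExecutionProvider"])
    # get_providers() lists the providers actually created; when the CUDA EP
    # fails to load, the session silently falls back to CPUExecutionProvider.
    session_providers = sess.get_providers()
    print(session_providers)
else:
    print("skipped: onnxruntime or model.onnx not available")
```

If the printed list starts with `CPUExecutionProvider` despite requesting CUDA, the EP failed to initialize even though session creation succeeded.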


snnn commented Sep 14, 2023

The latest nightly package has some problems, but the latest release package is fine. I just tested it under RHEL 8 and it worked. The environment had the following packages:

cuda-toolkit-11-config-common-11.8.89-1.noarch
cuda-cudart-11-8-11.8.89-1.x86_64
cuda-nvrtc-11-8-11.8.89-1.x86_64
libnccl-2.15.5-1+cuda11.8.x86_64
libcudnn8-8.9.0.131-1.cuda11.8.x86_64
cuda-toolkit-config-common-12.1.105-1.noarch
cuda-toolkit-11-8-config-common-11.8.89-1.noarch
cuda-compat-11-8-520.61.05-1.x86_64
cuda-libraries-11-8-11.8.0-1.x86_64
cuda-nvtx-11-8-11.8.86-1.x86_64


jFkd1 commented Sep 14, 2023

Thanks for the responses.

I tried not importing torch, but that did not mitigate the issue. I was only importing torch because it was believed to solve the CUDA dependency issue with ORT for some people. Unfortunately, it did not work for me.

Here's what I get from running find . -name \*.so -exec ldd {} \; in my ort-nightly install:

/qitian/miniconda3/envs/roop/lib/python3.10/site-packages/onnxruntime/capi$ find  . -name \*.so -exec ldd {} \;
        linux-vdso.so.1 (0x00007ffff7ffb000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffff6bf4000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ffff69ec000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff67cf000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff65b0000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffff6227000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffff5e89000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffff5c71000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff5880000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
        linux-vdso.so.1 (0x00007ffff7ffb000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffff7848000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffff74aa000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffff7292000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff6ea1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!

A quick internet search suggests this may be related to #9754. I will try the suggested fix of reinstalling libcudnn8 and see if that helps.


snnn commented Sep 14, 2023

I will make a new nightly package for you to test, one that does not have the #9754 issue. But I will need a few days.

mahesh11T commented:

Following. I am hitting:

FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcurand.so.10: cannot open shared object file: No such file or directory
(/usr/local/lib/python3.8/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so)

Has this issue been solved?


snnn commented Sep 21, 2023

Right, as the message says, your operating system cannot find libcurand.so.10. Did you install CUDA and cuDNN?


snnn commented Sep 21, 2023

Feel free to create a new issue if the problem still exists. Please note that CUDA and cuDNN are commercial software owned by Nvidia, with End User License Agreements. Our team does not redistribute their software due to license restrictions and security concerns; even if we did, there would be no way to prove to you that the redistributed files are genuine. The latest ONNX Runtime release was built with CUDA 11.8 and cuDNN 8.9. Users of ONNX Runtime GPU packages need to get the dependent libraries from Nvidia. If any dependent library is missing, on Linux ONNX Runtime should print a detailed error message when loading its CUDA execution provider, telling you what was missing.

A side note: if you need multiple CUDA versions installed in parallel, it is better not to add any of them to ld.so's global cache (/etc/ld.so.cache); instead, add exactly one of them to the LD_LIBRARY_PATH environment variable.
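For example, a sketch assuming CUDA 11.8 is installed under /usr/local/cuda-11.8 (adjust the prefix to your system):

```shell
# Point the dynamic loader at exactly one CUDA installation, prepending it
# to any existing LD_LIBRARY_PATH without leaving a stray colon.
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH"
```

This scopes the library selection to the current shell session, so different terminals can target different CUDA versions without touching the system-wide cache.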
