
[Build] Issues with CUDA 11.4 and ONNX Runtime 1.11.0 #19631

Open
HShamimGEHC opened this issue Feb 23, 2024 · 13 comments
Assignees
Labels
build (build issues; typically submitted using template), ep:CUDA (issues related to the CUDA execution provider), platform:jetson (issues related to the NVIDIA Jetson platform)

Comments


HShamimGEHC commented Feb 23, 2024

Describe the issue

onnxruntime:Default, provider_bridge_ort.cc:1022 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.10: cannot open shared object file: No such file or directory

[W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
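The missing-library name in the error already pinpoints the mismatch: the soname major version is the CUDA major version the wheel was linked against. A small shell sketch (the `ldd` path in the comment is hypothetical; the extraction below just runs on the error text quoted above):

```shell
# libcublas.so.10 => wheel linked against CUDA 10.x; libcublas.so.11 => CUDA 11.x.
# On the device, the direct check would be (path is illustrative):
#   ldd /path/to/libonnxruntime_providers_cuda.so | grep -E 'libcublas|libcudnn'
err="Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.10: cannot open shared object file"
lib=$(echo "$err" | grep -oE 'libcublas\.so\.[0-9]+')
major=${lib##*.}   # soname suffix, e.g. 10
echo "wheel expects CUDA ${major}.x (missing ${lib})"
```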

Urgency

Urgent

Target platform

Docker on NVIDIA Jetson AGX Xavier

Build script

RUN wget https://nvidia.box.com/shared/static/2sv2fv1wseihaw8ym0d4srz41dzljwxh.whl -O onnxruntime_gpu-1.11.0-cp38-cp38-linux_aarch64.whl && \
    pip3 install onnxruntime_gpu-1.11.0-cp38-cp38-linux_aarch64.whl

Install CUDA toolkit

RUN apt-get update && apt-get install -y cuda-toolkit-11-4 && rm -rf /var/lib/apt/lists/*

I was provided a model.onnx that I am trying to load so that I can run inference; I was only given the model itself.

Error / output

onnxruntime:Default, provider_bridge_ort.cc:1022 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.10: cannot open shared object file: No such file or directory

[W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

Visual Studio Version

No response

GCC / Compiler Version

No response

@HShamimGEHC HShamimGEHC added the build build issues; typically submitted using template label Feb 23, 2024
@tianleiwu tianleiwu added the platform:jetson issues related to the NVIDIA Jetson platform label Feb 23, 2024
tianleiwu (Contributor) commented Feb 23, 2024

From the error message, the wheel was built with CUDA 10.

Please follow this guide to install the proper version of JetPack (which will install the matching version of CUDA):
https://elinux.org/Jetson_Zoo#ONNX_Runtime
For example, ONNX Runtime 1.11 matches JetPack 4.4 / 4.4.1 / 4.5 / 4.5.1 / 4.6 / 4.6.1.

To build from source, see: https://onnxruntime.ai/docs/build/eps.html#nvidia-jetson-tx1tx2nanoxavier
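The pairing described above can be captured in a small helper. This sketch encodes only the combinations stated in this thread; the Jetson Zoo page remains the authoritative table:

```shell
# JetPack -> onnxruntime-gpu pairing, limited to what is stated in this thread.
# Consult https://elinux.org/Jetson_Zoo#ONNX_Runtime for the full table.
ort_for_jetpack() {
  case "$1" in
    4.4|4.4.1|4.5|4.5.1|4.6|4.6.1) echo "onnxruntime-gpu 1.11 (CUDA 10.x wheel)" ;;
    5.1.2)                         echo "onnxruntime-gpu 1.16 or 1.17 (CUDA 11.x)" ;;
    *)                             echo "not listed in this thread - check Jetson Zoo" ;;
  esac
}
ort_for_jetpack 5.1.2
```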

HShamimGEHC (Author) commented Feb 23, 2024

Because this is in Docker, is the following acceptable: without changing the JetPack version, download CUDA 10.0 from within a Dockerfile?

My current JetPack version is 5.1.2 and my L4T version is 35.4.1, but I am trying to do all of this in a Docker container.

tianleiwu (Contributor) commented Feb 23, 2024

The doc mentions that CUDA 11.8 with JetPack 5.1.2 has been tested on Jetson when building ONNX Runtime 1.16.

I believe the r35.4.1 docker container has CUDA 11.4. In that case, you can try onnxruntime-gpu 1.16 or 1.17.

HShamimGEHC (Author) commented

I see. I decided to use 1.16 and was wondering: if my Dockerfile starts with

FROM nvcr.io/nvidia/l4t-base:35.4.1

and that image should contain CUDA 11.4, why did I still have to run

RUN apt-get update && apt-get install -y cuda-toolkit-11-4 && rm -rf /var/lib/apt/lists/*

to get past the first set of errors regarding libcublas?
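A likely explanation, stated here as an assumption worth verifying against NVIDIA's release notes: from the JetPack 5 / L4T r34+ releases onward, the l4t-base image no longer bundles the CUDA toolkit inside the container, so it must be installed (or mounted from the host). A quick presence check, written over an explicit root so it can be tried anywhere (inside the container you would pass `""`):

```shell
# Check whether a filesystem root actually ships the CUDA toolkit.
has_cuda_toolkit() {
  if [ -x "$1/usr/local/cuda/bin/nvcc" ]; then
    echo "CUDA toolkit present"
  else
    echo "CUDA toolkit missing"
  fi
}
has_cuda_toolkit "$(mktemp -d)"   # a bare directory stands in for a stripped base image
```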

HShamimGEHC (Author) commented

I should also have cuDNN 8.6.0, but my next error is this:

2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

jywu-msft (Member) commented

@yf711 can you advise?

davidlee8086 commented Feb 24, 2024

I used to follow this post to deploy a Docker container for Jetson. Please let me know if you are able to deploy an environment that fits your CUDA/cuDNN requirements. Thanks!

yf711 (Contributor) commented Feb 24, 2024

Hi @HShamimGEHC, there is a wide variety of containers designed for Jetson at https://github.com/dusty-nv/jetson-containers; feel free to pick one that works for your case.

HShamimGEHC (Author) commented

> Hi @HShamimGEHC, https://github.com/dusty-nv/jetson-containers there's a wide variety of containers designed for jetson, feel free to pick one which works on your case.

Hi @yf711, sure, I can give that a try. Since I need ONNX Runtime, CUDA, and cuDNN, how can I, after downloading them, use them in the Dockerfile that I am trying to create? If you could provide some insight on that, I would greatly appreciate it.

HShamimGEHC (Author) commented

> I used to follow this post to deploy docker container for jetson. Please let me know if you could deploy env that can fit your cuda/cudnn requirement. Thanks!

Hi @davidlee8086 - I was trying to follow this, but if I install each of the containers I need separately, I run out of storage on my Jetson AGX Xavier.

jywu-msft (Member) commented

> I should also have cuDNN 8.6.0 but my next set of error is this:
>
> 2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

Is libcudnn.so in your LD_LIBRARY_PATH?
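Two ways to answer that question, sketched in shell (the `path_has_lib` helper below is illustrative, not an ONNX Runtime utility):

```shell
# On the device, the dynamic-linker cache check is simply:
#   ldconfig -p | grep libcudnn
# The helper below does the LD_LIBRARY_PATH-style check explicitly: it walks a
# colon-separated directory list looking for the library file.
path_has_lib() {   # $1 = colon-separated dirs, $2 = library filename
  local IFS=':'
  for d in $1; do
    if [ -e "$d/$2" ]; then echo "found in $d"; return 0; fi
  done
  echo "not found"
}
# Demonstrated with a throwaway directory holding a dummy libcudnn.so.8;
# the real call would be: path_has_lib "$LD_LIBRARY_PATH" libcudnn.so.8
tmp=$(mktemp -d); touch "$tmp/libcudnn.so.8"
result=$(path_has_lib "/nonexistent:$tmp" libcudnn.so.8)
echo "$result"
```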

HShamimGEHC (Author) commented Feb 26, 2024

> > I should also have cuDNN 8.6.0 but my next set of error is this:
> > 2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
>
> Is libcudnn.so in your LD_LIBRARY_PATH?

I checked and it wasn't there. I scrolled through the NVIDIA NGC page and stumbled on this Dockerfile: https://gitlab.com/nvidia/container-images/l4t-jetpack/-/blob/master/Dockerfile.jetpack?ref_type=heads (it includes commands for installing CUDA and cuDNN). This solved my issue of not finding any CUDA- or cuDNN-related libraries.
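For reference, a minimal sketch of what that approach can look like, assuming the nvcr.io/nvidia/l4t-jetpack:r35.4.1 image is published on NGC (unlike l4t-base of the same release, it bundles CUDA and cuDNN); the wheel URL is deliberately left as a placeholder to be filled in from the Jetson Zoo page:

```dockerfile
# Sketch only - verify the image tag and wheel URL before use.
FROM nvcr.io/nvidia/l4t-jetpack:r35.4.1

# Take the wheel matching your JetPack version from
# https://elinux.org/Jetson_Zoo#ONNX_Runtime
RUN wget <onnxruntime-gpu wheel URL from Jetson Zoo> -O onnxruntime_gpu.whl && \
    pip3 install onnxruntime_gpu.whl && \
    rm onnxruntime_gpu.whl

# Make sure CUDA/cuDNN are visible to the dynamic loader at run time.
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
```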

The last follow-up I have is regarding onnxruntime. How do I know which onnxruntime wheel to download from the Jetson Zoo link (https://elinux.org/Jetson_Zoo#ONNX_Runtime)? Should I just use the one that corresponds to the JetPack SDK version that my Jetson Xavier is on?

jywu-msft (Member) commented

> How should I know which onnxruntime to download from the Jetson Zoo link: https://elinux.org/Jetson_Zoo#ONNX_Runtime? Should I just use the one that corresponds to the version of Jetpack SDK that my Jetson Xavier is on?

Yes, using the package corresponding to the JetPack version is the best option. Otherwise, you will need to bring in the appropriate dependencies yourself.

@sophies927 sophies927 added the ep:CUDA issues related to the CUDA execution provider label Feb 29, 2024