
Add CUDA12 support for Java's onnxruntime_gpu dependency #19960

Closed
davidecaroselli opened this issue Mar 17, 2024 · 13 comments
Labels
api:Java issues related to the Java API ep:CUDA issues related to the CUDA execution provider release:1.18.0


@davidecaroselli

Describe the issue

When trying to use Java's onnxruntime_gpu:1.17.1 runtime on a CUDA 12 system, the program fails to load the libonnxruntime_providers_cuda.so library because it searches for CUDA 11.x dependencies.

However, as far as I know this issue has already been solved for (nearly) all runtimes except Java: see Install ONNX Runtime.

Can this be ported to the Maven Central build too, please?

To reproduce

On a system with CUDA 12.3 installed:

$ nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
...

And a Java Maven project using the latest available version of onnxruntime_gpu:

<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
    <version>1.17.1</version>
</dependency>

You can reproduce the problem simply by running this Java main:

package org.example;

import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

public class App {

    public static void main(String[] args) throws OrtException {
        new OrtSession.SessionOptions().addCUDA(0);
    }

}

Resulting in the following error:

Exception in thread "main" ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

	at ai.onnxruntime.OrtSession$SessionOptions.addCUDA(Native Method)
	at ai.onnxruntime.OrtSession$SessionOptions.addCUDA(OrtSession.java:1009)
	at org.example.App.main(App.java:9)
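The loader message names the exact soname it could not resolve. As a hedged sketch of how one might diagnose this (the error text is inlined from the report above so the snippet is self-contained; on a real machine you would follow up with `ldconfig -p`):

```shell
# Sketch: extract the missing soname from the ORT loader error above.
# The error string is copied from the report; commands are illustrative.
err='Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory'

# Pull out "libcublasLt.so.11" -- the CUDA 11 cuBLAS library the build expects.
missing=$(printf '%s\n' "$err" | sed -n 's/.*error: \([^:]*\): cannot open.*/\1/p')
echo "missing: $missing"

# Drop the version suffix; on a real system, check what IS installed with:
#   ldconfig -p | grep "$base"
base=${missing%%.so*}
echo "base: $base"
```

On a CUDA 12 host that `ldconfig` check would typically show only libcublasLt.so.12, which is why the CUDA 11 build of the provider cannot load.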

Urgency

Development of an internal library is currently blocked: this issue makes it impossible to run any Java ONNX project on our new deployment with the newest NVIDIA GPUs (e.g. GH200), as they require the latest drivers and CUDA libraries.

Platform

Linux

OS Version

Ubuntu 20.04.6 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.3

@github-actions github-actions bot added api:Java issues related to the Java API ep:CUDA issues related to the CUDA execution provider labels Mar 17, 2024
@Craigacp
Contributor

You can compile it from source with CUDA 12 support.

@davidecaroselli
Author

Hi @Craigacp and thanks for the advice.

I was able to compile the library from source using the attached Dockerfile; however, there is an important caveat: it seems that ONNX Runtime only supports cuDNN v8, while the latest NVIDIA CUDA images all ship cuDNN v9.

If I try to compile FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04, I get multiple errors like:

error: ‘cudnnSetRNNDescriptor_v6’ was not declared in this scope; did you mean ‘cudnnSetRNNDescriptor_v8’?
error: ‘cudnnSetRNNMatrixMathType’ was not declared in this scope; did you mean ‘cudnnSetConvolutionMathType’?
[...]
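Since these errors come from a cuDNN header mismatch, it can help to confirm which cuDNN major version a base image actually ships before building. A hedged sketch that parses the version macros as one might from /usr/include/cudnn_version.h (a sample header fragment is inlined here so the snippet is self-contained; the real path varies by install):

```shell
# Sketch: read the cuDNN major version from cudnn_version.h before building.
# A sample fragment stands in for the real header.
hdr='#define CUDNN_MAJOR 9
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 0'

major=$(printf '%s\n' "$hdr" | awk '$2 == "CUDNN_MAJOR" {print $3}')
echo "cuDNN major: $major"

# ORT 1.17 builds against the cuDNN 8 API, so v9 headers trigger the
# cudnnSetRNNDescriptor_v6 errors shown above.
if [ "$major" -ge 9 ]; then echo "use a cudnn8 base image for ORT 1.17"; fi
```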

This is the Dockerfile I used:

FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends python3-dev ca-certificates g++ python3-numpy gcc make git python3-setuptools python3-wheel python3-packaging python3-pip aria2 unzip wget openjdk-17-jdk && \
    aria2c -q -d /tmp -o cmake-3.27.3-linux-x86_64.tar.gz https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
    tar -zxf /tmp/cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm /tmp/cmake-3.27.3-linux-x86_64.tar.gz && \
    wget -c https://services.gradle.org/distributions/gradle-8.6-bin.zip -P /tmp && unzip /tmp/gradle-8.6-bin.zip -d /opt/ && rm /tmp/gradle-8.6-bin.zip

ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}

COPY onnxruntime /onnxruntime

RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
    git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
    ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ \
               --use_cuda --config Release --build_shared_lib --build_java --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86'

So my follow-up questions are:

  1. Are there any plans to make this build available in the official Maven Central repository?
  2. Are there any plans to support cuDNN 9? And/or is there any option to build ONNX Runtime without the cuDNN dependency?

@Craigacp
Contributor

Craigacp commented Mar 18, 2024

cuDNN 9 came out after ORT 1.17 (#19419), so it probably won't be supported until at least the next feature release.

We're discussing what to do about CUDA 12 binaries for Java, whether to drop CUDA 11 completely or make two releases. It's not been decided yet.

@davidecaroselli
Author

Got it, thanks! I think cuDNN 9 would not be a huge problem for now as I can manually install cuDNN 8 in the docker file.

My two cents: a solution could be to create two different artifacts, like 1.17.1-cu11 and 1.17.1-cu12; you can always drop the former once you no longer want to support it.

One last problem I'm facing right now: I have just realized that the build I made on Ubuntu 22.04 won't work on Ubuntu 20.04 because of a different libc.so.6 version:

Caused by: java.lang.UnsatisfiedLinkError: /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so)

On my 20.04 machine I have /lib/x86_64-linux-gnu/libc-2.31.so. Just wondering: how did you solve this problem for the Java release? The same Maven JAR appears to work on both versions of Ubuntu.

Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?
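One way to see why a 22.04 build fails on 20.04 is to list the GLIBC symbol versions the shared object actually requires. A hedged sketch (sample `objdump -T` lines are inlined so it runs anywhere; on a real machine you would pipe in the actual objdump output):

```shell
# Sketch: find the highest GLIBC symbol version a shared object depends on.
# Sample lines stand in for real `objdump -T libonnxruntime.so` output.
dump='0000 DF *UND* GLIBC_2.2.5 memcpy
0000 DF *UND* GLIBC_2.32 pthread_getattr_np
0000 DF *UND* GLIBC_2.17 clock_gettime'

needed=$(printf '%s\n' "$dump" | grep -o 'GLIBC_[0-9.]*' | sort -V | tail -n 1)
echo "highest GLIBC required: $needed"
# Ubuntu 20.04 ships glibc 2.31, so any symbol above GLIBC_2.31 here explains
# the UnsatisfiedLinkError; building against an older glibc avoids it.
```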

@Craigacp
Contributor

Not that I'm aware of. I think the release is compiled on 20.04.

@davidecaroselli
Author

Thanks, I'll give it a try!

@snnn
Member

snnn commented Mar 18, 2024

Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?

No. If you still need to support Ubuntu 20.04, consider using RHEL/CentOS (or UBI8) with the "Red Hat Developer Toolset" to compile the code.

@davidecaroselli
Author

Hi @snnn and thanks for the hint!

I did try to build onnxruntime starting from the nvidia/cuda:12.1.1-cudnn8-devel-ubi8 image; however, I didn't expect it to be so painful 😅.

After a couple of hours of trial and error, I found several changes needed to overcome various compilation problems:

  1. Build protobuf from source and link it statically with ONNX_USE_PROTOBUF_SHARED_LIBS=OFF.
  2. Enforce C++17 standard with CMAKE_CXX_STANDARD=17 and CMAKE_CXX_STANDARD_REQUIRED=ON.
  3. Create a manual symbolic link ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu, as some dependencies have /usr/lib/x86_64-linux-gnu hardcoded in their CMake files.
  4. Skip unit tests build with onnxruntime_BUILD_UNIT_TESTS=OFF as many of them were failing to compile.

Despite all these precautions, I'm still not able to compile onnxruntime because of this error:

...
[ 61%] Linking CXX shared library libonnxruntime.so
[ 97%] Built target onnxruntime_providers_cuda
> Task :clean
> Task :spotlessInternalRegisterDependencies
libonnxruntime_providers.a(matmul_fpq4.cc.o): In function `onnxruntime::contrib::MatMulFpQ4::Compute(onnxruntime::OpKernelContext*) const':
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x4e2): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x773): undefined reference to `MlasQ4GemmBatch(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long, unsigned long, unsigned long, MLAS_Q4_GEMM_DATA_PARAMS const*, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_providers.a(matmul_nbits.cc.o): In function `onnxruntime::contrib::MatMulNBits::Compute(onnxruntime::OpKernelContext*) const':
matmul_nbits.cc:(.text._ZNK11onnxruntime7contrib11MatMulNBits7ComputeEPNS_15OpKernelContextE+0x1264): undefined reference to `void MlasDequantizeBlockwise<float, 4>(float*, unsigned char const*, float const*, unsigned char const*, int, bool, int, int, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_graph.a(contrib_defs.cc.o): In function `onnxruntime::contrib::matmulQ4ShapeInference(onnx::InferenceContext&, int, int, int, MLAS_BLK_QUANT_TYPE) [clone .constprop.883]':
contrib_defs.cc:(.text._ZN11onnxruntime7contribL22matmulQ4ShapeInferenceERN4onnx16InferenceContextEiii19MLAS_BLK_QUANT_TYPE.constprop.883+0x2e8): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
libonnxruntime_mlas.a(platform.cpp.o): In function `MLAS_PLATFORM::MLAS_PLATFORM()':
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x574): undefined reference to `MlasFpQ4GemmDispatchAvx512'
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x5b1): undefined reference to `MlasQ8Q4GemmDispatchAvx512vnni'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/onnxruntime.dir/build.make:172: libonnxruntime.so.1.17.1] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2113: CMakeFiles/onnxruntime.dir/all] Error 2
...

...and at this point I'm out of ideas on why it's failing...

Here's the Dockerfile I created so far:

FROM nvidia/cuda:12.1.1-cudnn8-devel-ubi8

ENV DEBIAN_FRONTEND=noninteractive

COPY onnxruntime /onnxruntime

RUN yum install -y zlib-devel python39-devel python39-numpy python39-setuptools python39-wheel python39-pip git unzip wget java-1.8.0-devel && \
    wget https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
    tar -zxf cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm -f cmake-3.27.3-linux-x86_64.tar.gz && \
    wget https://services.gradle.org/distributions/gradle-8.6-bin.zip && unzip gradle-8.6-bin.zip -d /opt/ && rm -f gradle-8.6-bin.zip

RUN git clone https://github.com/protocolbuffers/protobuf.git && cd protobuf && git checkout v21.12 && git submodule update --init --recursive && mkdir build_source && cd build_source && \
    cmake ../cmake  -DCMAKE_INSTALL_LIBDIR=lib64 -Dprotobuf_BUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_POSITION_INDEPENDENT_CODE=ON -Dprotobuf_BUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release && \
    make -j$(nproc) && make install

ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}

RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
    git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
    ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu && ./build.sh --allow_running_as_root --skip_submodule_sync --compile_no_warning_as_error --skip_tests \
    --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib64/ --config Release --build_java --update --build --parallel --cmake_extra_defines \
    ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) CMAKE_CUDA_ARCHITECTURES="52;60;61;70;75;86" CMAKE_CXX_STANDARD=17 CMAKE_CXX_STANDARD_REQUIRED=ON \
    ONNX_USE_PROTOBUF_SHARED_LIBS=OFF onnxruntime_BUILD_UNIT_TESTS=OFF

@davidecaroselli
Author

Update: I was (finally) able to build onnxruntime on the *-ubi8 image by:

  1. Removing the onnxruntime_mlas_q4dq target (it failed due to pthread problems) by replacing this line with a simple if (FALSE):

    if (NOT onnxruntime_ORT_MINIMAL_BUILD)

  2. The build script could not find the JNI headers even though JAVA_HOME was properly set, so I linked them in manually:

for f in $(find $JAVA_HOME -name "*.h"); do ln -s $f /usr/include/$(basename $f); done
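The header-symlink loop in step 2 can be exercised safely against scratch directories (stand-ins for $JAVA_HOME and /usr/include, so nothing system-wide is touched) to see what it does:

```shell
# Sketch of the JNI-header workaround above, pointed at scratch directories:
# $src stands in for $JAVA_HOME, $dst for /usr/include.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/include/linux"
touch "$src/include/jni.h" "$src/include/linux/jni_md.h"

# Same loop as in the workaround: link every header under $src into $dst,
# flattening the directory layout so the compiler finds jni.h and jni_md.h.
for f in $(find "$src" -name "*.h"); do ln -s "$f" "$dst/$(basename "$f")"; done

ls "$dst"
```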

This is the final Dockerfile used to build onnxruntime_gpu:1.17.1-cu12: Dockerfile.ubi8

Would you accept a PR for this? If yes, do you see a more proper way to skip onnxruntime_mlas_q4dq build?

@tianleiwu
Contributor

tianleiwu commented Mar 21, 2024

Would you accept a PR for this? If yes, do you see a more proper way to skip onnxruntime_mlas_q4dq build?

Feel free to contribute a PR. I think you can add a build flag like onnxruntime_BUILD_MLAS_Q4DQ (example). Then change the line to if (onnxruntime_BUILD_MLAS_Q4DQ).
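The suggested guard might look like the following in the MLAS CMake file (a sketch only; onnxruntime_BUILD_MLAS_Q4DQ is the flag name proposed above, and the default value is an assumption):

```cmake
# Hypothetical flag from the suggestion above; defaulting ON keeps today's behavior.
option(onnxruntime_BUILD_MLAS_Q4DQ "Build the onnxruntime_mlas_q4dq target" ON)
if (onnxruntime_BUILD_MLAS_Q4DQ)
  # ... existing onnxruntime_mlas_q4dq target definition ...
endif()
```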

@lanking520

Hi, do we have any updates for CUDA 12 support for ONNXRuntime Java?

@davidecaroselli
Author

Hi @lanking520! Unfortunately my PR (#20011) is blocked waiting for someone to review it. Still, you can build it directly from my fork: the code is tested and the build is currently running in production in my environment.

@snnn do you have any update on the PR? Is there anything I can do to facilitate its merge? Thank you!

@jchen351
Contributor

It was enabled with the completion of #20583, and will be released along with ONNX Runtime 1.18.
