ONNX crashes on GPU/CUDA in GoogleColab #19137

Closed
danilojsl opened this issue Jan 14, 2024 · 1 comment
Labels
api:Java (issues related to the Java API), ep:CUDA (issues related to the CUDA execution provider)

Comments

@danilojsl

Describe the issue

I have an issue while using spark-nlp with GPU in Google Colab notebooks. It always raises the following error:

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.10: cannot open shared object file: No such file or directory

Here's what I've tried so far:

  • Creating symbolic links for the missing libraries.
  • Downloading and installing cuDNN for CUDA 11.2.
  • Downloading and installing cuDNN for CUDA 10.2.
  • Installing CUDA 11.2 using Conda.

I also attempted combinations of these methods, such as installing cuDNN for CUDA 11.2 alongside creating the symbolic links, but the error persists. Additionally, I tried installing CUDA 10, since the error consistently references libcufft.so.10, but that library version does not appear to be available.
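
For reference, a minimal sketch (my own check, assuming the default Colab runtime) of whether the dynamic loader can resolve libcufft.so.10 at all:

# Hypothetical sanity check: can the dynamic loader find the library that the
# CUDA execution provider is asking for?
import ctypes

try:
    ctypes.CDLL("libcufft.so.10")
    print("libcufft.so.10 is resolvable")
except OSError as err:
    print("libcufft.so.10 is NOT resolvable:", err)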

To reproduce

import sparknlp

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pandas as pd
import os

# Start a Spark session with the GPU-enabled Spark NLP build.
spark = sparknlp.start(gpu=True)

print("Spark NLP version", sparknlp.version())
print("Apache Spark version:", spark.version)

# Display the Spark session (notebook cell output).
spark

# Downloading/loading the pretrained ONNX model is where the OrtException
# above is raised (PythonResourceDownloader.downloadModel).
embeddings = MPNetEmbeddings.pretrained() \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

Check this Google Notebook for more details.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2

@github-actions bot added the api:Java and ep:CUDA labels on Jan 14, 2024
@snnn (Member) commented Jan 14, 2024

libcufft.so.10 is not from CUDA 10.x; it is from CUDA 11.x.
CUDA 11.x ships libcufft 10.x, while CUDA 12.x ships libcufft 11.x. See the link below:
https://docs.nvidia.com/cuda/archive/12.0.0/cuda-toolkit-release-notes/index.html

The latest ONNX Runtime release, 1.16.3, was built with CUDA 11.x. If you need to use CUDA 12.x, you need to build it from source, wait for the next release (1.17.0), or use a nightly package from https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly .
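
As an illustration only (not part of the reply above), one way to confirm the mismatch on a Colab runtime is to list the installed CUDA toolkit version and the libcufft shared objects that are actually present; the search paths below are typical locations and may need adjusting for a different image:

# Hypothetical check: on a CUDA 12.x runtime you should see libcufft.so.11,
# not the libcufft.so.10 that ONNX Runtime 1.16.3 (built against CUDA 11.x) expects.
import glob
import subprocess

try:
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
except FileNotFoundError:
    print("nvcc not on PATH")

for pattern in ("/usr/local/cuda*/targets/x86_64-linux/lib/libcufft.so*",
                "/usr/lib/x86_64-linux-gnu/libcufft.so*"):
    for path in glob.glob(pattern):
        print(path)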
