ORTModelForCausalLM inference fails (after converting transformer to ONNX) #1678
Comments
Hi @ingo-m, thank you for the report. Locally, how did you install onnxruntime-gpu? Not sure it will work, but you can also try [...]. Regarding the [...]:
I'm not sure yet, will investigate.
@ingo-m I cannot reproduce the issue with:

```python
import torch
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "bigscience/bloomz-560m"
device_name = "cuda"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Export the PyTorch checkpoint to ONNX on the fly and load it on the CUDA
# execution provider.
ort_model = ORTModelForCausalLM.from_pretrained(
    base_model_name,
    use_io_binding=True,
    export=True,
    provider="CUDAExecutionProvider",
)

prompt = "i like pancakes"
inference_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(
    device_name
)

# Try to generate a prediction (fails in the original report).
output_ids = ort_model.generate(
    input_ids=inference_ids["input_ids"],
    attention_mask=inference_ids["attention_mask"],
    max_new_tokens=512,
    temperature=1e-8,
    do_sample=True,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

with CUDA 11.8, torch==2.1.2+cu118, optimum==1.16.2, onnxruntime-gpu==1.17.0, onnx==1.15.0.
@fxmarty thanks for looking into it. Locally, I installed directly from PyPI (with pipenv). In other words, I did not follow the specific instructions for CUDA 12, so that explains the problem. (However, it's strange that I had no problems with CUDA 12 when I was still using the older version [...].) On Google Colab, [...].
As you said, it looks like CUDA 12 is the culprit.
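A quick way to confirm this kind of mismatch (not part of the original exchange, just a diagnostic sketch added for context) is to check which CUDA version each package was built against and whether the installed onnxruntime exposes its CUDA provider:

```python
# Diagnostic sketch (not from the original report): check which CUDA version
# torch was built for and whether onnxruntime-gpu exposes the CUDA provider.
# A CUDA 11 onnxruntime-gpu wheel on a CUDA 12 system is the mismatch
# suspected above.
import torch
import onnxruntime as ort

print("torch:", torch.__version__, "built against CUDA", torch.version.cuda)
print("onnxruntime:", ort.__version__)
print("CUDAExecutionProvider available:",
      "CUDAExecutionProvider" in ort.get_available_providers())
```

Note that `get_available_providers()` only reports what the installed wheel was built with; the libcublasLt load error discussed in this issue only surfaces once a session is actually created on the CUDA provider.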
Regarding this error: [...]
Perhaps the [...]
System Info
The bug described below occurs locally on my system with the following specs, and on Google Colab (see below for a reproducible example):
Who can help?
@michaelbenayoun (error happens with a transformer model converted to ONNX)
@JingyaHuang (error seems to be related to ONNX runtime)
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction (minimal, reproducible, runnable)
The bug is described below; here is a reproducible example:
https://colab.research.google.com/drive/1QZ4_vttj-r5D3fwff49KZ0gzqwB5BRuM?usp=sharing
Expected behavior
I am trying to convert a transformer model ("bigscience/bloomz-560m") to ONNX format, and then perform inference with the ONNX model.
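For context, the workflow in question looks roughly like the sketch below. This is an illustrative snippet, not the exact notebook code; the output directory name is made up:

```python
# Minimal sketch of the export-then-infer workflow (illustrative only;
# the output directory name is hypothetical).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "bigscience/bloomz-560m"
onnx_dir = "bloomz-560m-onnx"  # hypothetical local directory

# Export the PyTorch checkpoint to ONNX and save the result to disk.
tokenizer = AutoTokenizer.from_pretrained(model_id)
ORTModelForCausalLM.from_pretrained(model_id, export=True).save_pretrained(onnx_dir)
tokenizer.save_pretrained(onnx_dir)

# Reload the exported model on the CUDA execution provider and generate.
ort_model = ORTModelForCausalLM.from_pretrained(onnx_dir, provider="CUDAExecutionProvider")
inputs = tokenizer("i like pancakes", return_tensors="pt").to("cuda")
output_ids = ort_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```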
I was previously able to do this, with the following library versions:
However, after upgrading to the latest versions, performing inference with the ONNX model fails. These are the versions I upgraded to:
Now, when trying to perform inference, I get this error:
When running locally, I additionally get this message in the error traceback (I don't get this on Colab):

The weird thing is that (when running locally) the respective virtual env does actually have libcublasLt.so.11 (in my case at ~/miniconda3/envs/py-onnx/lib/python3.10/site-packages/nvidia/cublas/lib):

So the CUDA library cannot be found, although it is there? And why does it want to use libcublasLt.so.11 (and not libcublasLt.so.12)? 🤔

According to this issue, onnxruntime 1.17.0 does support CUDA 12. My CUDA version is 12.0 (which I didn't change).
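As a side note (not from the original issue), a small diagnostic like the following lists the libcublasLt libraries that pip-installed nvidia-* wheels place under site-packages, matching the path above, and reports which execution providers the installed onnxruntime exposes:

```python
# Diagnostic sketch (added for illustration, not from the original issue):
# list the libcublasLt shared objects bundled in this environment's
# site-packages and report which execution providers onnxruntime exposes.
import glob
import os
import sysconfig

import onnxruntime as ort

site_packages = sysconfig.get_paths()["purelib"]
pattern = os.path.join(site_packages, "nvidia", "cublas", "lib", "libcublasLt.so*")

matches = sorted(glob.glob(pattern))
print("cuBLAS libraries bundled in this environment:")
for path in matches:
    print(" ", path)
if not matches:
    print("  (none found)")

print("onnxruntime version:", ort.__version__)
print("available providers:", ort.get_available_providers())
```

This only shows what is on disk; whether onnxruntime's CUDA provider can actually locate those libraries at load time depends on the dynamic loader's search path, which appears to be where the error above comes from.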