ONNX Runtime 1.18.1 CUDA 12.4 cuDNN 9.2 breaks inference with repeated inputs when enable_mem_reuse is enabled #21349
Labels
- `api`: issues related to all other APIs: C, C++, Python, etc.
- `ep:CUDA`: issues related to the CUDA execution provider
- `model:transformer`: issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.
- `platform:windows`: issues related to the Windows platform
- `stale`: issues that have not been addressed in a while; categorized by a bot
Describe the issue
When running inference on a model with the same input data two or more times in the same session (so that enable_mem_reuse takes effect), the following error is raised:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/Reshape_1' Status Message: D:\a\_work\1\s\onnxruntime\core/providers/cpu/tensor/reshape_helper.h:30 onnxruntime::ReshapeHelper::ReshapeHelper i < input_shape.NumDimensions() was false. The dimension with value zero exceeds the dimension size of the input tensor.
The error does not happen if enable_mem_reuse is disabled.
To reproduce
Download the following example ONNX model (the issue also occurs with other models) from the Piper repo: https://huggingface.co/rhasspy/piper-voices/resolve/main/ka/ka_GE/natia/medium/ka_GE-natia-medium.onnx
Then execute the following code to raise the error:
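The original snippet was not captured here. A minimal sketch along these lines should reproduce the error; the input names (`input`, `input_lengths`, `scales`) follow the usual Piper/VITS convention and are assumptions, not confirmed from the report:

```python
import os

import numpy as np

MODEL_PATH = "ka_GE-natia-medium.onnx"  # downloaded from the URL above

# Dummy phoneme-id sequence; the specific ids should not matter.
phoneme_ids = np.array([[1, 20, 30, 40, 2]], dtype=np.int64)
feeds = {
    "input": phoneme_ids,
    "input_lengths": np.array([phoneme_ids.shape[1]], dtype=np.int64),
    "scales": np.array([0.667, 1.0, 0.8], dtype=np.float32),
}

if os.path.exists(MODEL_PATH):
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.enable_mem_reuse = True  # the default; setting it False avoids the error

    sess = ort.InferenceSession(MODEL_PATH, so,
                                providers=["CPUExecutionProvider"])

    # The first run succeeds; a second run with the *same* feeds in the same
    # session raises the Reshape RUNTIME_EXCEPTION quoted above.
    for _ in range(2):
        sess.run(None, feeds)
```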
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU, CUDA
Execution Provider Library Version
CUDA 12.4, cuDNN 9.2