
ONNX Runtime 1.18.1 CUDA 12.4 cuDNN 9.2 breaks inference with repeated inputs when enable_mem_reuse is enabled #21349

Open
SystemPanic opened this issue Jul 14, 2024 · 1 comment
Labels
api issues related to all other APIs: C, C++, Python, etc. ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. platform:windows issues related to the Windows platform stale issues that have not been addressed in a while; categorized by a bot

Comments

@SystemPanic

Describe the issue

When running inference on a model with the same input data, in the same session, two or more times (so that enable_mem_reuse takes effect), the following error is raised:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/Reshape_1' Status Message: D:\a\_work\1\s\onnxruntime\core/providers/cpu/tensor/reshape_helper.h:30 onnxruntime::ReshapeHelper::ReshapeHelper i < input_shape.NumDimensions() was false. The dimension with value zero exceeds the dimension size of the input tensor.

The error does not happen if enable_mem_reuse is disabled.

To reproduce

  1. Download the following example ONNX model (the error happens with other models as well) from the Piper repo: https://huggingface.co/rhasspy/piper-voices/resolve/main/ka/ka_GE/natia/medium/ka_GE-natia-medium.onnx

  2. Then execute the following code to raise the error:

import onnxruntime
import numpy as np

session_options = onnxruntime.SessionOptions()
session_options.enable_mem_reuse = True

session = onnxruntime.InferenceSession(
    'ka_GE-natia-medium.onnx',
    sess_options=session_options,
    providers=["CUDAExecutionProvider"]
)

input = np.asarray([[1, 30, 0, 120, 0, 27, 0, 25, 0, 18, 0, 24, 0, 121, 0, 21, 0, 14, 0, 3, 0, 25, 0, 120, 0, 14, 0, 2]], dtype=np.int64)
input_lengths = np.asarray([28], dtype=np.int64)
scales = np.asarray([0.6669999957084656, 1.0, 0.800000011920929], dtype=np.float32)
sid = None

# First run succeeds.
audio = session.run(
    None,
    {
        "input": input,
        "input_lengths": input_lengths,
        "scales": scales,
        "sid": sid,
    },
)[0]
print(audio)

# Second run with identical inputs raises the RUNTIME_EXCEPTION shown above.
audio = session.run(
    None,
    {
        "input": input,
        "input_lengths": input_lengths,
        "scales": scales,
        "sid": sid,
    },
)[0]
print(audio)
  3. Finally, change session_options.enable_mem_reuse to False and execute the code again: both runs succeed.

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU, CUDA

Execution Provider Library Version

CUDA 12.4, cuDNN 9.2

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. platform:windows issues related to the Windows platform labels Jul 14, 2024
@sophies927 sophies927 added the api issues related to all other APIs: C, C++, Python, etc. label Jul 18, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Aug 18, 2024
2 participants