Session's inner variables not refreshed between 2 runs #18742
Comments
related error message:
Yes, but the error message happens only during the second run, as if idx hadn't been reset to 0 at the end of the first inference.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Still not solved
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
It's not solved. I'm running onnxruntime 1.18.0 dev with the TRT backend, and this error is still there, unfortunately.
I traced this error down: it happens exclusively with TensorrtExecutionProvider when trt_cuda_graph_enable is enabled, on the 1.18.0-dev version.
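For reference, enabling that option when creating the session looks roughly like this (a sketch; the model path is a placeholder, not the actual model):

```python
import onnxruntime as ort

# Sketch: enabling CUDA graphs in the TensorRT execution provider.
# "model.onnx" is a placeholder path.
providers = [
    ("TensorrtExecutionProvider", {"trt_cuda_graph_enable": True}),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
sess = ort.InferenceSession("model.onnx", providers=providers)
```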
Describe the issue
Running an ORT session in Python two times leads to an error, which is always an index-out-of-range error somewhere in the operations. This makes me think it is caused by a variable in a "for loop" inside the graph that is not reset between the two runs.
I tried with several networks, and I obtain this kind of error during the second run():
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'/sa1/Gather_136' Status Message: indices element out of data bounds, idx=100 must be within the inclusive range [-100,99]
The input for inference can even be the same as the one used for tracing.
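A minimal sketch of the pattern that triggers it (model path, input name, and shape are placeholders rather than the actual repro):

```python
import numpy as np
import onnxruntime as ort

# Two consecutive runs on the same session, with the same input that was
# used for tracing; the second run() is where the Gather index error appears.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = np.random.randn(1, 100, 3).astype(np.float32)  # placeholder shape

out_first = sess.run(None, {"input": x})   # works
out_second = sess.run(None, {"input": x})  # raises the error above
```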
For some networks, an interesting thing I noticed is that exporting them with dynamic_axes removes this problem, as if being exported this way allowed a kind of 'cache' in the model's inner variables to be emptied.
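For illustration, the two export variants look roughly like this (a sketch with a stand-in model and placeholder names, not the actual networks):

```python
import torch

# Stand-in model and input; the real networks are the ones mentioned above.
model = torch.nn.Linear(3, 3).eval()
dummy = torch.randn(1, 100, 3)

# Static export (the variant that shows the second-run error here)
torch.onnx.export(model, dummy, "model_static.onnx",
                  input_names=["input"], output_names=["output"])

# Export with dynamic_axes (reported to make the problem disappear)
torch.onnx.export(model, dummy, "model_dynamic.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
```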
To reproduce
Dockerfile
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
RUN pip install onnx
RUN pip install onnxruntime
Minimal code
Architecture
export / import code
"inner variables reset" with dynamic axes
Urgency
No response
Platform
Linux
OS Version
#1 SMP Thu Aug 31 10:29:22 EDT 2023
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response