Describe the issue
I'm using NVIDIA Triton to run inference on various detection models through the ONNX Runtime backend, and this has always worked fine. However, after upgrading ONNX Runtime from 1.13.1 to 1.16.0 I started occasionally getting errors like this:
2023-12-07 10:14:43.214806225 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_5/model_4/model_2/res2b_branch2c/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13622061778317179392
2023-12-07 10:14:50.385533092 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_7/model_6/res3d_branch2a/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4375604526317872384
2023-12-07 10:14:51.047598146 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_73/model_72/res2a_branch1/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13671442273597750016
The requested buffer sizes are obviously nonsensical (on the order of exabytes), so this looks like a bug in ONNX Runtime or Triton, but I don't know enough about their interaction to tell where exactly it is coming from.
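For a sense of scale, here is a quick sanity calculation on the sizes from the logs above (my own back-of-the-envelope check, not part of the original error output):

```python
# Allocation sizes reported in the FusedConv error messages above.
sizes = [13622061778317179392, 4375604526317872384, 13671442273597750016]
for s in sizes:
    # Each request is on the order of exabytes, far beyond any plausible tensor size.
    print(f"{s} bytes = {s / 2**60:.1f} EiB (hex: {s:#x})")
```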
I should also note that the problem is not reliably reproducible: it doesn't happen for every inference request, and restarting the inference server is sometimes enough to make it go away, without changing anything about the models or the Triton/ONNX Runtime versions.
To reproduce
So far I've only been able to reproduce this with two specific models, both CenterNet detection models.
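In case it helps, a standalone stress test outside Triton would look roughly like the sketch below. This is only a sketch: the model path and input shape are placeholders, and it uses the Python API rather than the Triton ONNX Runtime backend, so it may not exercise exactly the same code path.

```python
import numpy as np
import onnxruntime as ort

# Sketch of a standalone repro attempt (hypothetical model path; real models not attached here).
sess = ort.InferenceSession(
    "centernet_detector.onnx",  # placeholder path
    providers=["CUDAExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Replace any dynamic dimensions with 1 for this sketch; the real models may differ.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]

for _ in range(10000):
    data = np.random.rand(*shape).astype(np.float32)
    # The failure is intermittent, so many iterations may be needed to trigger it (if at all).
    sess.run(None, {inp.name: data})
```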
Urgency
Medium. We upgraded to a new version of Triton/ONNX Runtime to fix minor issues with some other models, so we'd prefer not to have to downgrade.
Platform
Linux
OS Version
Ubuntu 22.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.2.2
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.