Memory allocation failures due to incorrect requested buffer size #18743

Open
OvervCW opened this issue Dec 7, 2023 · 4 comments
Labels
ep:CUDA (issues related to the CUDA execution provider)

Comments


OvervCW commented Dec 7, 2023

Describe the issue

I'm using NVIDIA Triton to perform inference on various detection models with onnxruntime, and this has always worked fine. However, after upgrading onnxruntime from version 1.13.1 to 1.16.0, I occasionally started getting errors like this:

2023-12-07 10:14:43.214806225 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_5/model_4/model_2/res2b_branch2c/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13622061778317179392
2023-12-07 10:14:50.385533092 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_7/model_6/res3d_branch2a/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4375604526317872384
2023-12-07 10:14:51.047598146 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_73/model_72/res2a_branch1/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13671442273597750016

The requested buffer sizes are obviously nonsensical, so this looks like a bug in either onnxruntime or Triton, but I don't know enough about their interaction to tell where exactly it's coming from.
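For what it's worth, the numbers look like what you would get if a single dimension of an output shape were garbage by the time the allocation size is computed. A purely illustrative sketch (not ONNX Runtime code; the shapes below are made up):

```cpp
// Illustrative only, not ONNX Runtime source: an allocation request is
// essentially (number of elements) * (element size), so one corrupted
// dimension in an otherwise normal conv output shape is enough to turn a
// sub-megabyte request into a multi-terabyte one.
#include <cstdint>
#include <cstdio>

static int64_t RequestedBytes(const int64_t* dims, int rank, int64_t elem_size) {
  int64_t count = 1;
  for (int i = 0; i < rank; ++i) count *= dims[i];
  return count * elem_size;
}

int main() {
  const int64_t good[] = {1, 256, 28, 28};            // plausible conv output shape
  const int64_t bad[]  = {1, 256, 3326177280LL, 28};  // same shape with one garbage dim
  std::printf("good: %lld bytes\n", (long long)RequestedBytes(good, 4, sizeof(float)));
  std::printf("bad:  %lld bytes\n", (long long)RequestedBytes(bad, 4, sizeof(float)));
  return 0;
}
```

With the normal shape the request is well under a megabyte; with one corrupted dimension it balloons to roughly 95 TB, and a larger garbage value would put it in the exabyte range seen in the logs above.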

I should note that it is not reliably reproducible. It doesn't happen for every inference request, and restarting the inference server is sometimes enough to make the problem go away, without changing anything about the models or the Triton/onnxruntime versions.

To reproduce

I've only been able to reproduce this with two specific models so far, both CenterNet detection models.
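For reference, here's a rough sketch of a standalone loop that should exercise the same CUDA EP path outside of Triton (illustrative only; the model path, tensor names, and input shape are placeholders rather than the real models'):

```cpp
// Rough repro sketch: create a CUDA EP session and run the same input many
// times, since the failures above only show up intermittently.
// "centernet.onnx", "input", "detections", and the 1x3x512x512 shape are
// placeholders, not the actual model's values.
#include <onnxruntime_cxx_api.h>
#include <cstdio>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "repro");

  Ort::SessionOptions opts;
  OrtCUDAProviderOptions cuda_opts{};  // default CUDA EP options
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  Ort::Session session(env, "centernet.onnx", opts);  // placeholder path

  // Placeholder NCHW input filled with a constant value.
  std::vector<int64_t> shape{1, 3, 512, 512};
  std::vector<float> data(1 * 3 * 512 * 512, 0.5f);

  Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size());

  const char* input_names[] = {"input"};        // placeholder tensor names
  const char* output_names[] = {"detections"};

  for (int i = 0; i < 10000; ++i) {
    try {
      auto outputs = session.Run(Ort::RunOptions{nullptr},
                                 input_names, &input, 1, output_names, 1);
    } catch (const Ort::Exception& e) {
      std::printf("iteration %d failed: %s\n", i, e.what());
      return 1;
    }
  }
  return 0;
}
```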

Urgency

Medium. We upgraded to a new version of Triton/onnxruntime to fix minor issues with some other models, so we'd prefer not to have to downgrade.

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2.2

github-actions bot added the ep:CUDA label Dec 7, 2023
github-actions bot commented Jan 13, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Jan 13, 2024

OvervCW commented Jan 13, 2024

It is definitely still an issue we are dealing with.

github-actions bot removed the stale label Jan 14, 2024
github-actions bot commented Feb 13, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Feb 13, 2024

OvervCW commented Feb 13, 2024

Still an issue.

github-actions bot removed the stale label Feb 14, 2024