Describe the issue
I'm using NVIDIA Triton to run inference on various detection models through the ONNX Runtime backend, and this has always worked fine. However, after upgrading ONNX Runtime from 1.13.1 to 1.16.0 I started occasionally getting errors like this:
2023-12-07 10:14:43.214806225 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_5/model_4/model_2/res2b_branch2c/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13622061778317179392
2023-12-07 10:14:50.385533092 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_7/model_6/res3d_branch2a/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4375604526317872384
2023-12-07 10:14:51.047598146 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_73/model_72/res2a_branch1/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13671442273597750016
The requested buffer sizes are obviously nonsensical (on the order of exabytes), so this looks like a bug in ONNX Runtime or Triton, but I don't know enough about their interaction to tell where exactly it is coming from.
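For a sense of scale, here is a quick sanity calculation on the sizes from the logs above (my own back-of-the-envelope check, not part of the original error output):

```python
# Allocation sizes reported in the FusedConv error messages above.
sizes = [13622061778317179392, 4375604526317872384, 13671442273597750016]
for s in sizes:
    # Each request is on the order of exabytes, far beyond any plausible tensor size.
    print(f"{s} bytes = {s / 2**60:.1f} EiB (hex: {s:#x})")
```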
I should also note that the problem is not reliably reproducible: it doesn't happen for every inference request, and restarting the inference server is sometimes enough to make it go away, without changing anything about the models or the Triton/ONNX Runtime versions.
To reproduce
So far I've only been able to reproduce this with two specific models, both CenterNet detection models.
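In case it helps, a standalone stress test outside Triton would look roughly like the sketch below. This is only a sketch: the model path and input shape are placeholders, and it uses the Python API rather than the Triton ONNX Runtime backend, so it may not exercise exactly the same code path.

```python
import numpy as np
import onnxruntime as ort

# Sketch of a standalone repro attempt (hypothetical model path; real models not attached here).
sess = ort.InferenceSession(
    "centernet_detector.onnx",  # placeholder path
    providers=["CUDAExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Replace any dynamic dimensions with 1 for this sketch; the real models may differ.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]

for _ in range(10000):
    data = np.random.rand(*shape).astype(np.float32)
    # The failure is intermittent, so many iterations may be needed to trigger it (if at all).
    sess.run(None, {inp.name: data})
```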
Urgency
Medium. We upgraded to a new version of Triton/ONNX Runtime to fix minor issues with some other models, so we'd prefer not to have to downgrade.
Platform
Linux
OS Version
Ubuntu 22.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.2.2
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.