
GPU Usage 100% #17942

Closed
lfch opened this issue Oct 13, 2023 · 3 comments
Labels
ep:CUDA (issues related to the CUDA execution provider) · ep:TensorRT (issues related to the TensorRT execution provider) · stale (issues that have not been addressed in a while; categorized by a bot)

Comments

lfch commented Oct 13, 2023

Describe the issue

Issue Description

I wrote a Go (cgo) wrapper around onnxruntime-gpu for our model inference service. The onnxruntime shared library is loaded only once, when the service starts. A dedicated OrtSession object is created for each new model version and destroyed when a newer version arrives; the old OrtSession is destroyed under the guard of a read-write lock (a sketch of this locking pattern follows the list below).
After the service had run for several hours with continuous model version updates, we hit the following situation:

  1. GPU usage rises to 100% and stays there; see image 0.
  2. Two threads permanently occupy two CPU cores (200% total); see image 1. Their stacks are shown in the snippet below.
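
A minimal sketch of the read path of that locking pattern, under assumptions: ortSession, Run, and Release here are hypothetical stand-ins for our cgo bindings, not the real wrapper code.

```go
package main

import (
	"errors"
	"sync"
)

// ortSession is a hypothetical stand-in for the cgo wrapper around a
// C OrtSession*; in the real service, Run and Release call through cgo
// into libonnxruntime.so.
type ortSession struct{}

func (s *ortSession) Run(in []float32) ([]float32, error) { return in, nil } // placeholder
func (s *ortSession) Release()                            {}                // placeholder

// modelServer holds the session for the currently deployed model
// version, guarded by the read-write lock mentioned above.
type modelServer struct {
	mu      sync.RWMutex
	session *ortSession
}

// run performs one inference while holding the read lock, so the
// session cannot be released underneath an in-flight call.
func (m *modelServer) run(in []float32) ([]float32, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if m.session == nil {
		return nil, errors.New("no model version loaded")
	}
	return m.session.Run(in)
}
```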

Expected Behavior

The service runs normally and the situation above does not occur.

Versions

onnxruntime-gpu == 1.15.1 (downloaded from the GitHub releases page)
CUDA == 11.2
GPU device: A30

Files

image 0, GPU usage

[screenshot 2023-10-13 22:02:14]

image 1, CPU usage

[screenshot 2023-10-13 22:02:28]

Thread stacks

```
Thread 15 (Thread 0x7f175d7fa700 (LWP 69607)):
#0 0x00007f15bb484bec in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007f15bb6abd62 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007f15bb6ac879 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f15bb7e7450 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f15bb439ce3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007f15bb43a1d1 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007f15bb43b138 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007f15bb60d251 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007f17140f04e9 in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
#9 0x00007f17140ca9ed in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
#10 0x00007f171410ee96 in cudaMemcpyAsync () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
#11 0x00007f158b6a381d in onnxruntime::GPUDataTransfer::CopyTensorAsync(onnxruntime::Tensor const&, onnxruntime::Tensor&, onnxruntime::Stream&) const () from /root/go/src/lib/libonnxruntime_providers_cuda.so
#12 0x00007f16937fa5a3 in ?? () from /root/go/src/lib//libonnxruntime.so
#13 0x00007f1693091f48 in ?? () from /root/go/src/lib//libonnxruntime.so
#14 0x00007f158b891497 in onnxruntime::IDataTransfer::CopyTensors(std::vector<onnxruntime::IDataTransfer::SrcDstPair, std::allocator<onnxruntime::IDataTransfer::SrcDstPair> > const&) const () from /root/go/src/lib/libonnxruntime_providers_cuda.so
#15 0x00007f16937fb58c in ?? () from /root/go/src/lib//libonnxruntime.so
#16 0x00007f169389ddcb in ?? () from /root/go/src/lib//libonnxruntime.so
#17 0x00007f169389fa33 in ?? () from /root/go/src/lib//libonnxruntime.so
#18 0x00007f16938a01fc in ?? () from /root/go/src/lib//libonnxruntime.so
#19 0x00007f16930dff6c in ?? () from /root/go/src/lib//libonnxruntime.so
#20 0x00007f169306ef67 in ?? () from /root/go/src/lib//libonnxruntime.so
#21 0x0000000001673d56 in _cgo_de0f6483b9ae_Cfunc_RunOrtSession (v=0xc00cceb098) at cgo-gcc-prolog:431
#22 0x000000000047c624 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:821
#23 0x000000c031b244e0 in ?? ()
#24 0x0000000000000004 in ?? ()
#25 0x000000c00cceaff8 in ?? ()
#26 0x000000000047eb86 in time.now () at /usr/local/go/src/runtime/time_linux_amd64.s:52
#27 0x000000000a256179 in ?? ()
#28 0x00007f177f5b2f6f in ?? ()
#29 0x0000000000800000 in ?? () at :1
#30 0x0000000000000000 in ?? ()

Thread 68 (Thread 0x7f15517fe700 (LWP 312851)):
#0 0x00007f15bb70c823 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007f15bb469206 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007f15bb7ef2bf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f15bb7eff6f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f15bb484bf7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007f15bb518928 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007f15bb5b9063 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007f15d75c3717 in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#8 0x00007f15d75f3f15 in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#9 0x00007f15d6c87bfc in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#10 0x00007f15d6c89010 in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#11 0x00007f15d6c87539 in ?? () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#12 0x00007f15d6d4defa in cublasDestroy_v2 () from /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcublas.so.11
#13 0x00007f17381d6159 in onnxruntime::CudaStream::~CudaStream() () from /root/go/src/lib/libonnxruntime_providers_tensorrt.so
#14 0x00007f17381d624d in onnxruntime::CudaStream::~CudaStream() () from /root/go/src/lib/libonnxruntime_providers_tensorrt.so
#15 0x00007f169380d60a in ?? () from /root/go/src/lib//libonnxruntime.so
#16 0x00007f169380d811 in ?? () from /root/go/src/lib//libonnxruntime.so
#17 0x00007f16930eb959 in ?? () from /root/go/src/lib//libonnxruntime.so
#18 0x00007f16930ee804 in ?? () from /root/go/src/lib//libonnxruntime.so
#19 0x00007f16930eed7d in ?? () from /root/go/src/lib//libonnxruntime.so
#20 0x000000000047c624 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:821
#21 0x0000000000452aed in runtime.park_m (gp=0xc0001dc340) at /usr/local/go/src/runtime/proc.go:3336
#22 0x000000c0036aa4e0 in ?? ()
#23 0x000000c0001dc340 in ?? ()
#24 0x0000000000000000 in ?? ()
```

Questions

  1. Is onnxruntime 1.15.1 compatible with CUDA 11.2?
  2. Is releasing an OrtSession concurrency-safe? In our case, an old OrtSession may be released at the same time a newer OrtSession object is running inference (the swap pattern we use is sketched below).
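
Continuing the hypothetical sketch from the issue description above, this is the swap path question 2 refers to (again stand-ins, not our real bindings):

```go
// swap installs a new model version. Acquiring the write lock drains
// all readers first, so once it is held no run() is in flight on the
// old session. Releasing after Unlock is still safe: new readers can
// only observe `next`, and it keeps teardown (which, per the stacks
// above, tears down CUDA streams and cuBLAS handles) off the
// inference hot path.
func (m *modelServer) swap(next *ortSession) {
	m.mu.Lock()
	old := m.session
	m.session = next
	m.mu.Unlock()
	if old != nil {
		old.Release()
	}
}

func main() {
	srv := &modelServer{}
	srv.swap(&ortSession{}) // deploy version 1
	if _, err := srv.run([]float32{1, 2, 3}); err != nil {
		panic(err)
	}
	srv.swap(&ortSession{}) // deploy version 2; v1 is released only after in-flight runs drain
}
```

With this pattern a Release should never overlap a Run on the same session; question 2 is whether ONNX Runtime requires that guarantee, or also tolerates a release running concurrently with Run calls on a different session.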

To reproduce

No

Urgency

No

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

TensorRT 8.6.1, CUDA 11.2

Model File

No response

Is this a quantized model?

No

github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) and ep:TensorRT (issues related to the TensorRT execution provider) labels on Oct 13, 2023
tianleiwu (Contributor) commented Oct 17, 2023

@lfch, could you upgrade to CUDA 11.8 to see whether the issue is still there? We have not tested CUDA 11.2, which is quite old.

If you need help, please share some test code/scripts and a model/data that could reproduce the issue; otherwise, other people cannot troubleshoot it.

github-actions bot commented
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Nov 16, 2023
github-actions bot commented
This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.

github-actions bot closed this as not planned on Jan 11, 2024