Multithreaded GPU inference. #18209
Labels
ep:CUDA, platform:windows, stale
Describe the issue
I have an image with a size of 16384×50000.
I divide the image into 4 blocks and store them in a `std::vector<cv::Mat>`.
For each block I run a 2000×2000 sliding window, resize each ROI to 640×640, and execute an object detection session (yolov5.onnx).
When the 4 blocks are run serially, the execution time is 3535 ms.
When the 4 blocks are run in parallel (via `std::async`), the execution time is 2419 ms.
Is there a better method to launch 4 sessions in parallel?
To reproduce
Hardware used in the test: RTX 2070, Intel Core i7-8700 (12 logical cores).
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
Microsoft.ML.OnnxRuntime.Gpu 1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.7
Model File
No response
Is this a quantized model?
No