[Performance] Multiple Sessions on Same GPU is very slow #21365
Labels
ep:TensorRT, performance, stale
Describe the issue
Hello, I am trying to deploy onnxruntime with the TensorRT execution provider. When I deploy my session on GPU 0, everything works perfectly (~50 ms per inference), but when I try to run the same YOLO model with two sessions (one per subprocess), inference speed drastically slows down to 200-300 ms.
To reproduce
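A minimal sketch of the subprocess setup described above (hypothetical, not the reporter's exact code; the model path `yolo.onnx`, the 1x3x640x640 input shape, and the 50-iteration timing loop are assumptions):

```python
# Hypothetical repro sketch: two subprocesses, each creating its own
# ONNX Runtime session with the TensorRT EP on the same GPU (device 0).
import multiprocessing as mp
import time

import numpy as np


def run_session(proc_id: int, model_path: str) -> None:
    # Import inside the child so each subprocess initializes its own runtime.
    import onnxruntime as ort

    providers = [
        ("TensorrtExecutionProvider", {"device_id": 0}),
        ("CUDAExecutionProvider", {"device_id": 0}),
    ]
    session = ort.InferenceSession(model_path, providers=providers)

    # Assumed YOLO-style input; adjust the name/shape to the actual model.
    input_name = session.get_inputs()[0].name
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

    session.run(None, {input_name: dummy})  # warm-up / TRT engine build
    start = time.perf_counter()
    for _ in range(50):
        session.run(None, {input_name: dummy})
    elapsed_ms = (time.perf_counter() - start) / 50 * 1000
    print(f"process {proc_id}: {elapsed_ms:.1f} ms per inference")


if __name__ == "__main__":
    procs = [mp.Process(target=run_session, args=(i, "yolo.onnx")) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With one process, each run takes ~50 ms; with two processes sharing GPU 0, per-inference latency rises to 200-300 ms.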
Urgency
No response
Platform
Linux
OS Version
22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 8.6.1
Model File
No response
Is this a quantized model?
Unknown