[Performance] TensorRT EP produces different inference results compared to CUDA/CPU #22354
Labels
ep:CUDA (issues related to the CUDA execution provider)
ep:TensorRT (issues related to the TensorRT execution provider)
performance (issues related to performance regressions)
Describe the issue
Inference results for YOLOv8 in C++ differ between the TensorRT EP and the CUDA EP. I'm unable to share all of the code because most of it is proprietary and it would require a lot of refactoring to make it shareable.
We use YOLOv8 as an object detector and the results are satisfactory with the CUDA EP, but we would like to speed up inference using TensorRT.
#21457 may be related, but that issue concerns a newer version of TensorRT, not 8.6.1.6.
I've noticed slightly different results from the preprocessing steps when they run on the CPU versus the GPU, but nothing that should significantly affect inference. With TensorRT, YOLOv8 produces believable bounding boxes, but the confidence scores are terrible.
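For what it's worth, the divergence is easy to quantify by running the same preprocessed input through two otherwise identical sessions, one per EP, and diffing the raw outputs. A minimal sketch (session setup elided; `cuda_session`, `trt_session`, and the tensor names are stand-ins, not our actual code):

```cpp
#include <algorithm>
#include <cmath>

// Same model, same preprocessed input; only the EP differs.
auto out_cuda = cuda_session.Run(Ort::RunOptions{nullptr},
                                 input_names, &input_tensor, 1,
                                 output_names, 1);
auto out_trt = trt_session.Run(Ort::RunOptions{nullptr},
                               input_names, &input_tensor, 1,
                               output_names, 1);

const float* a = out_cuda[0].GetTensorData<float>();
const float* b = out_trt[0].GetTensorData<float>();
const size_t n = out_cuda[0].GetTensorTypeAndShapeInfo().GetElementCount();

// Max absolute element-wise difference over the raw detection tensor.
float max_abs_diff = 0.0f;
for (size_t i = 0; i < n; ++i) {
  max_abs_diff = std::max(max_abs_diff, std::abs(a[i] - b[i]));
}
// Two FP32 EPs should agree to roughly 1e-3 or better; a large gap
// here points at the EP itself rather than our postprocessing.
```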
To reproduce
Code snippet to initialize the environment:
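The original snippet is proprietary; below is a minimal sketch of the setup using the ORT 1.15.1 C++ API. The model path, workspace size, and cache path are illustrative stand-ins:

```cpp
#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "yolov8");
Ort::SessionOptions session_options;

// Register the TensorRT EP first; ORT falls back to CUDA, then CPU,
// for any nodes TensorRT cannot handle.
OrtTensorRTProviderOptions trt_options{};
trt_options.device_id = 0;
trt_options.trt_max_workspace_size = 1ULL << 30;  // 1 GiB, illustrative
trt_options.trt_fp16_enable = 0;          // stay in FP32 to match CUDA EP numerics
trt_options.trt_engine_cache_enable = 1;  // skip engine rebuilds on restart
trt_options.trt_engine_cache_path = "./trt_cache";
session_options.AppendExecutionProvider_TensorRT(trt_options);

OrtCUDAProviderOptions cuda_options{};
cuda_options.device_id = 0;
session_options.AppendExecutionProvider_CUDA(cuda_options);

// Path is a stand-in for our actual model file.
Ort::Session session(env, L"yolov8.onnx", session_options);
```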
Snippet to run inference:
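And a sketch of a single inference call, continuing from the setup above. It assumes the standard Ultralytics YOLOv8 ONNX export: a 1x3x640x640 float32 input named `images` and a raw detection output named `output0` (both names are assumptions):

```cpp
#include <array>
#include <vector>

// Preprocessed frame: NCHW float32, assumed normalized to [0, 1].
std::vector<float> input_data(1 * 3 * 640 * 640);
std::array<int64_t, 4> input_shape{1, 3, 640, 640};

Ort::MemoryInfo memory_info =
    Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
    memory_info, input_data.data(), input_data.size(),
    input_shape.data(), input_shape.size());

const char* input_names[] = {"images"};    // assumed input name
const char* output_names[] = {"output0"};  // assumed output name

auto outputs = session.Run(Ort::RunOptions{nullptr},
                           input_names, &input_tensor, 1,
                           output_names, 1);

// Raw detections; box decoding and NMS happen downstream.
float* raw = outputs[0].GetTensorMutableData<float>();
```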
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.2, cuDNN 8.9.2, TensorRT 8.6.1.6
Model File
No response
Is this a quantized model?
No