
Inference speed problem even when using high-end hardware #19865

Open
Deckard-2049 opened this issue Mar 12, 2024 · 3 comments

Comments

@Deckard-2049

Describe the issue

We trained an Ultralytics YOLOv8 model on 1024×1024, 3-channel images, converted it to ONNX, and ran the ONNX model from Visual Studio 2022 (C#, .NET Framework 4.8) with onnxruntime-gpu v1.16.3. Inference takes around 90 ms on an A5000 GPU.
We also tried different ONNX Runtime session options: graph optimization level, inter_op_num_threads, intra_op_num_threads, execution mode (ORT_PARALLEL and ORT_SEQUENTIAL), and memory-pattern optimization (enable_mem_pattern), but none of them made any difference to the inference time. A minimal sketch of the kind of session configuration we tried is shown below.
So can anyone suggest what we might be missing, or how we can reduce the time further, even a little?
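
For reference, a minimal sketch of the session configuration we experimented with (shown with the Python API for brevity; the C# SessionOptions properties we set are the equivalents). The model file name and thread counts are placeholders, not our production values:

```python
import onnxruntime as ort

# Session options we experimented with; none of them changed the latency.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL  # also tried ORT_PARALLEL
so.intra_op_num_threads = 4   # illustrative value
so.inter_op_num_threads = 1   # illustrative value
so.enable_mem_pattern = True

# CUDA execution provider (A5000, device 0), falling back to CPU.
session = ort.InferenceSession(
    "yolov8_1024.onnx",  # placeholder file name
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```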

To reproduce

Nothing to mention.

Urgency

Yes, it's urgent. Please help.

Platform

Windows

OS Version

10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform labels Mar 12, 2024
@yuslepukhin
Member

Can you share the converted model?

Also, anyone who uses the C# API would benefit from reading this.

@Deckard-2049
Author

Deckard-2049 commented Mar 13, 2024

Actually, we ran this converted model on an RTX 4090 and the inference time was 35 ms, but when we run it on an RTX A5000 we get somewhere around 90 ms. We are using the same CUDA version (11.2) on both devices. We want to deploy this model on the RTX A5000 with an acceptable inference time of 35-40 ms. What could be causing this difference in inference time?

Also, the ONNX model has NMS embedded in it. We wrote a script to embed the NMS inside the ONNX model; I have attached the script file here for reference, and a rough sketch of the approach follows below.
adding_nms.py-20240313T133344Z-001.zip
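
The attached script is the actual version we use; the following is only a simplified, self-contained sketch of the idea, assuming the exported graph already exposes "boxes" and "scores" tensors in the layout NonMaxSuppression expects (the real script also splits/transposes the raw YOLOv8 output into that form):

```python
import onnx
from onnx import TensorProto, helper

model = onnx.load("yolov8_1024.onnx")  # placeholder file name
graph = model.graph

# NMS thresholds as initializers; the values here are illustrative.
graph.initializer.extend([
    helper.make_tensor("max_output_boxes_per_class", TensorProto.INT64, [1], [300]),
    helper.make_tensor("iou_threshold", TensorProto.FLOAT, [1], [0.7]),
    helper.make_tensor("score_threshold", TensorProto.FLOAT, [1], [0.25]),
])

# ONNX NonMaxSuppression expects boxes of shape [batch, num_boxes, 4] and
# scores of shape [batch, num_classes, num_boxes]; "boxes" and "scores" are
# assumed to already exist in the graph in that layout.
graph.node.append(helper.make_node(
    "NonMaxSuppression",
    inputs=["boxes", "scores",
            "max_output_boxes_per_class", "iou_threshold", "score_threshold"],
    outputs=["selected_indices"],
))
graph.output.append(
    helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3])
)

onnx.checker.check_model(model)
onnx.save(model, "yolov8_with_nms.onnx")
```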

Unfortunately, I cannot share the model; our company policy restricts the sharing of proprietary models. So any suggestions or pointers on the questions above would be of great help.

@yuslepukhin
Member

https://onnxruntime.ai/docs/performance/tune-performance/

@sophies927 sophies927 removed the ep:CUDA issues related to the CUDA execution provider label Mar 14, 2024
@sophies927 sophies927 removed the platform:windows issues related to the Windows platform label Mar 21, 2024