Inference speed problem even when using high-end hardware. #19865
Comments
Can you share the converted model? Anyone using the C# API would also benefit from reading this.
Actually, we ran this converted model on an RTX 4090 and the inference time was 35 ms, but on an RTX A5000 we are getting around 90 ms. We are using the same CUDA version (11.2) on both devices. We want to deploy this model on the RTX A5000 with an acceptable inference time of 35-40 ms. What are the possible reasons for this gap? Also, the ONNX model has NMS embedded in it; we wrote a script to embed the NMS inside the ONNX model, and I have attached the script file here for reference. Unfortunately, I cannot share the model, as our company policy restricts sharing proprietary models. Any suggestions on these questions would be a great help.
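Since the comparison hinges on timing, it is worth making sure the measurement excludes warm-up: the first few `Run` calls include cuDNN algorithm selection and GPU memory allocation and can be far slower than steady state. Below is a minimal C# benchmarking sketch; the model path `yolov8_nms.onnx` and the input name `images` are placeholders, so adjust them to your model.

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class LatencyBenchmark
{
    static void Main()
    {
        // "yolov8_nms.onnx" and the input name "images" are placeholders;
        // substitute your model path and its actual input name.
        using (var options = SessionOptions.MakeSessionOptionWithCudaProvider(0))
        using (var session = new InferenceSession("yolov8_nms.onnx", options))
        {
            var input = new DenseTensor<float>(new[] { 1, 3, 1024, 1024 });
            var feeds = new[] { NamedOnnxValue.CreateFromTensor("images", input) };

            // Warm-up: the first runs include cuDNN algorithm search and
            // GPU memory allocation, which skew single-shot measurements.
            for (int i = 0; i < 10; i++)
            {
                using (session.Run(feeds)) { } // dispose outputs immediately
            }

            const int iters = 100;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                using (session.Run(feeds)) { }
            }
            sw.Stop();

            Console.WriteLine($"Mean latency: {sw.Elapsed.TotalMilliseconds / iters:F1} ms");
        }
    }
}
```

If the ~90 ms figure persists after warm-up on the A5000, note that the RTX 4090 has roughly three times the FP32 throughput of the RTX A5000, so some gap between the two cards is expected even with identical software.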
Describe the issue
We trained an Ultralytics YOLOv8 model on 1024×1024, 3-channel images, converted it to ONNX, and ran the ONNX model from Visual Studio 2022 (C#, .NET Framework 4.8) with onnxruntime-gpu v1.16.3. Inference takes around 90 ms on an A5000 GPU.
We also tried different ONNX Runtime session options: graph optimization level, inter_op_num_threads, intra_op_num_threads, execution mode (ORT_PARALLEL and ORT_SEQUENTIAL), and optimization options (enable_mem_pattern); see the configuration sketch after this section.
But none of them made any difference in the inference time.
Can anyone suggest whether we are missing something, or how we can reduce the time further, even a little?
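For reference, a minimal sketch of how those session options can be set through the C# API; the device id, thread counts, and model path below are illustrative placeholders, not tuned recommendations.

```csharp
using Microsoft.ML.OnnxRuntime;

static class SessionSetup
{
    // Sketch of the options mentioned above; the device id and thread
    // counts are illustrative placeholders, not tuned recommendations.
    public static InferenceSession Create()
    {
        var options = new SessionOptions();
        options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
        options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL; // or ORT_PARALLEL
        options.InterOpNumThreads = 1;
        options.IntraOpNumThreads = 4;
        options.EnableMemoryPattern = true;      // enable_mem_pattern
        options.AppendExecutionProvider_CUDA(0); // CUDA EP on device 0
        return new InferenceSession("yolov8_nms.onnx", options); // placeholder path
    }
}
```

With the CUDA execution provider, the heavy convolution layers run on the GPU, so the CPU thread-count and execution-mode settings mainly affect whatever nodes fall back to the CPU; that would be consistent with these options having no visible effect here.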
To reproduce
Nothing to mention.
Urgency
Yes, this is urgent; please help.
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
C#
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No