[Performance] RE: When using CUDA the first run is very slow -- cudnn_conv_algo_search #19838
Comments
To make this easy, a sentence could be added:
Thanks for the feedback, @hmaarrfk. It is good to document this. Please keep in mind, though, that the slowness of the first Run() may not be limited to just this. The allocations to grow the underlying memory pool could also cause the first Run() to be slow(er) than subsequent runs. Usually, a good practice is to do a few warm-up Runs with the session instance, using representative inputs, before the "real" Runs.
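The warm-up pattern described above can be sketched as follows. This is a hedged illustration: it uses a stand-in class instead of a real `onnxruntime.InferenceSession` so the snippet is self-contained; with ONNX Runtime installed, `session` would be an actual `InferenceSession` and `feeds` a dict of representative model inputs.

```python
class _StubSession:
    """Stand-in for onnxruntime.InferenceSession (illustrative only)."""
    def run(self, output_names, feeds):
        # A real session would execute the model graph here.
        return [sum(feeds["x"])]

session = _StubSession()
feeds = {"x": [1.0, 2.0, 3.0]}  # representative input

# Warm-up: the first Run() pays for cuDNN algorithm search and
# memory-pool growth, so run a few times before timing anything.
for _ in range(3):
    session.run(None, feeds)

# Only now do the "real", timed runs.
result = session.run(None, feeds)
print(result)
```

The key point is simply that timing measurements should exclude the first few calls, since they absorb one-time setup costs.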
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Not stale. The doc was not updated.
@hmaarrfk I faced a similar issue. What does the `cudnn_conv_algo_search` flag actually do? I can also see that it is slow for the first few runs (up to 100), and then `EXHAUSTIVE` is just a tiny bit faster.
@spoorgholi74 it tries various convolution algorithms to choose the fastest one; it needs to run each of them once to time them before choosing which to use overall. We found that of the three options (EXHAUSTIVE, DEFAULT, and HEURISTIC), HEURISTIC is the fastest and yields great results.
This solved a problem I was having, but it still leaves me wondering: shouldn't it be caching the results from the exhaustive search instead of performing it on every run? After reading #10746 I've tried setting
Describe the issue
I didn't want to reply to #10746 since it was mentioned that the issue is a placeholder.
I wanted to say that, in our work, we've found that issue to have omitted a critical piece of information regarding the effect of `cudnn_conv_algo_search` on the performance of the first run. The default value, `EXHAUSTIVE`, as mentioned in the C API and the Python documentation, seems to be a significant contributor to this effect.
It would be good if a small note were added in that placeholder issue to mention that users would have a choice in the session optimization strategy.
Thank you @davidmezzetti for bringing this to my attention in your blog post:
https://medium.com/neuml/debug-onnx-gpu-performance-c9290fe07459
cc: @jefromson
To reproduce
Start your ONNX Runtime session with the options below and switch between the different values of `cudnn_conv_algo_search`.
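A minimal sketch of how this option is passed, via the CUDA execution provider's options (valid values are `EXHAUSTIVE`, the default, plus `HEURISTIC` and `DEFAULT`). The model path is a placeholder; the session-creation line is shown as a comment so the snippet stands alone without ONNX Runtime installed.

```python
# Build the providers list with CUDA execution-provider options.
# Swapping "HEURISTIC" for "EXHAUSTIVE" or "DEFAULT" here is how to
# compare first-run latency between the search strategies.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"}),
    "CPUExecutionProvider",  # fallback
]

# With onnxruntime installed, this list is passed at session creation:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)

print(providers[0][1]["cudnn_conv_algo_search"])
```

Then time the first Run() against subsequent runs for each setting.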
Urgency
Just a small tip for others.
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.0
Model File
No response
Is this a quantized model?
No