
[Performance] RE: When using CUDA the first run is very slow -- cudnn_conv_algo_search #19838

Open
hmaarrfk opened this issue Mar 9, 2024 · 7 comments
Labels
documentation improvements or additions to documentation; typically submitted using template

Comments

@hmaarrfk

hmaarrfk commented Mar 9, 2024

Describe the issue

I didn't want to reply to #10746 since it was mentioned that the issue is a placeholder.

I wanted to say that in our work, we've found that issue to have omitted a critical piece of information: the effect of cudnn_conv_algo_search on the performance of the first run.

The default value, EXHAUSTIVE (as mentioned in the C API and the Python documentation), seems to be a significant contributor to this effect.
It would be good if a small note were added to that placeholder issue mentioning that users have a choice of convolution-algorithm search strategy at session creation.

Thank you @davidmezzetti for bringing this to my attention in your blog post:
https://medium.com/neuml/debug-onnx-gpu-performance-c9290fe07459

cc: @jefromson

To reproduce

Start your ONNX Runtime session with the following options, and switch between the different values of cudnn_conv_algo_search:

    import onnxruntime as ort

    providers = [
        ("CUDAExecutionProvider", {
            # "cudnn_conv_algo_search": "DEFAULT",
            # "cudnn_conv_algo_search": "HEURISTIC",
            "cudnn_conv_algo_search": "EXHAUSTIVE",
        }),
        # "CPUExecutionProvider",
    ]
    session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model path
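As a sketch (the helper and the placeholder session/feed names below are illustrative, not from the original report), the first-run penalty can be made visible by timing successive calls:

```python
import time


def time_calls(fn, n=3):
    """Time n successive calls of fn and return each wall time in seconds.
    With cudnn_conv_algo_search="EXHAUSTIVE", the first call typically
    dominates because the algorithm search happens inside it."""
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return times


# Against an ONNX Runtime session it could be used like (placeholder names):
#   feed = {session.get_inputs()[0].name: some_representative_input}
#   print(time_calls(lambda: session.run(None, feed)))
```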

Urgency

Just a small tip for others.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.0

Model File

No response

Is this a quantized model?

No

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Mar 9, 2024
@hmaarrfk
Author

hmaarrfk commented Mar 9, 2024

To make this easy, a sentence could be added:

Even if ONNX Runtime is pre-built with binary code for your GPU architecture, by default the CUDA execution provider performs an exhaustive search for the most performant cuDNN convolution algorithm. This is controlled by the cudnn_conv_algo_search parameter, which can be specified at session creation time. See LINK TO YOUR CHOSEN DOCUMENTATION for more information.

@hariharans29 hariharans29 added documentation improvements or additions to documentation; typically submitted using template and removed ep:CUDA issues related to the CUDA execution provider labels Mar 11, 2024
@hariharans29
Member

hariharans29 commented Mar 12, 2024

Thanks for the feedback, @hmaarrfk. It is good to document this. Please keep in mind, though, that the slowness of the first Run() may not be limited to just this. The allocations needed to grow the underlying memory pool can also make the first Run() slower than subsequent runs. Usually, a good practice is to do a few warm-up Runs with the session instance, using representative inputs, before the "real" Runs.
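The warm-up practice described above can be sketched as a small helper (the helper name and callable shape are mine, not ONNX Runtime API):

```python
def warm_up(run, feed, n_warmup=3):
    """Invoke run(feed) n_warmup times, discarding outputs, so that
    one-time costs (cuDNN algorithm search, memory-pool growth) are
    paid before the latency-sensitive "real" runs."""
    for _ in range(n_warmup):
        run(feed)


# With an ONNX Runtime session this might look like (placeholder names):
#   warm_up(lambda f: session.run(None, f), representative_feed)
```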

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 12, 2024
@hmaarrfk
Author

Not stale; the doc was not updated.

@github-actions github-actions bot removed the stale issues that have not been addressed in a while; categorized by a bot label Apr 13, 2024
@spoorgholi74

@hmaarrfk I faced a similar issue.

What does the "cudnn_conv_algo_search" flag actually do? I also notice it is slow for the first few runs (up to ~100), and then "EXHAUSTIVE" is just a tiny bit faster.

@jefromson

@spoorgholi74 it is trying various convolution algorithms to choose the fastest: it needs to run each of them once to time it before deciding which to use overall.

We found that of the three options (EXHAUSTIVE, DEFAULT, and HEURISTIC), HEURISTIC is the fastest and yields great results.

@slashedstar

slashedstar commented Aug 31, 2024

This solved a problem I was having, but it still leaves me wondering: shouldn't it cache the results of the exhaustive search instead of performing it on every run? After reading #10746, I tried setting os.environ["CUDA_CACHE_MAXSIZE"] = "4294967296" and CUDA_CACHE_PATH to a known path, but nothing worked (the default CUDA_CACHE_PATH was probably fine too, since it already contained files from other applications).
