
[Performance] RE: When using CUDA the first run is very slow -- cudnn_conv_algo_search #19838

Open
hmaarrfk opened this issue Mar 9, 2024 · 7 comments
Labels
documentation improvements or additions to documentation; typically submitted using template

Comments

@hmaarrfk

hmaarrfk commented Mar 9, 2024

Describe the issue

I didn't want to reply to #10746 since it was mentioned that the issue is a placeholder.

I wanted to say that in our work, we've found that issue to have omitted a critical piece of information: the effect of cudnn_conv_algo_search on the performance of the first run.

The default value, EXHAUSTIVE (as mentioned in the C API and the Python documentation), seems to be a significant contributor to this effect.
It would be good if a small note were added to that placeholder issue mentioning that users have a choice of convolution-algorithm search strategy at session creation.

Thank you @davidmezzetti for bringing this to my attention in your blog post:
https://medium.com/neuml/debug-onnx-gpu-performance-c9290fe07459

cc: @jefromson

To reproduce

Start your ONNX Runtime session with the following options, and switch between the different values of cudnn_conv_algo_search:

    import onnxruntime as ort

    providers = [
        ("CUDAExecutionProvider", {
            # "cudnn_conv_algo_search": "DEFAULT",
            # "cudnn_conv_algo_search": "HEURISTIC",
            "cudnn_conv_algo_search": "EXHAUSTIVE",
        }),
        # "CPUExecutionProvider",
    ]
    session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model path
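As a sketch (the helper and the placeholder session/feed names below are illustrative, not from the original report), the first-run penalty can be made visible by timing successive calls:

```python
import time


def time_calls(fn, n=3):
    """Time n successive calls of fn and return each wall time in seconds.
    With cudnn_conv_algo_search="EXHAUSTIVE", the first call typically
    dominates because the algorithm search happens inside it."""
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return times


# Against an ONNX Runtime session it could be used like (placeholder names):
#   feed = {session.get_inputs()[0].name: some_representative_input}
#   print(time_calls(lambda: session.run(None, feed)))
```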

Urgency

Just a small tip for others.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.0

Model File

No response

Is this a quantized model?

No

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Mar 9, 2024
@hmaarrfk
Author

hmaarrfk commented Mar 9, 2024

To make this easy, a sentence could be added:

Even if ONNX Runtime is pre-built with binary code for your GPU architecture, by default the CUDA execution provider performs an exhaustive search for the most performant cuDNN convolution algorithm. This is controlled by the cudnn_conv_algo_search parameter, which can be specified at session creation time. See LINK TO YOUR CHOSEN DOCUMENTATION for more information.

@hariharans29 hariharans29 added documentation improvements or additions to documentation; typically submitted using template and removed ep:CUDA issues related to the CUDA execution provider labels Mar 11, 2024
@hariharans29
Member

hariharans29 commented Mar 12, 2024

Thanks for the feedback, @hmaarrfk. It is good to document this. Please keep in mind, though, that the slowness of the first Run() may not be limited to just this. The allocations needed to grow the underlying memory pool can also make the first Run() slower than subsequent runs. Usually, a good practice is to do a few warm-up Runs with the session instance, using representative inputs, before the "real" Runs.
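The warm-up practice described above can be sketched as a small helper (the helper name and callable shape are mine, not ONNX Runtime API):

```python
def warm_up(run, feed, n_warmup=3):
    """Invoke run(feed) n_warmup times, discarding outputs, so that
    one-time costs (cuDNN algorithm search, memory-pool growth) are
    paid before the latency-sensitive "real" runs."""
    for _ in range(n_warmup):
        run(feed)


# With an ONNX Runtime session this might look like (placeholder names):
#   warm_up(lambda f: session.run(None, f), representative_feed)
```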

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 12, 2024
@hmaarrfk
Author

Not stale; the doc was not updated.

@github-actions github-actions bot removed the stale issues that have not been addressed in a while; categorized by a bot label Apr 13, 2024
@spoorgholi74

@hmaarrfk I faced a similar issue.

What does the "cudnn_conv_algo_search" flag actually do? I also notice it is slow for the first few runs (up to ~100), and then "EXHAUSTIVE" is just a tiny bit faster.

@jefromson

@spoorgholi74 it is trying various convolution algorithms to choose the fastest: it needs to run each of them once to time it before deciding which to use overall.

We found that of the three options (EXHAUSTIVE, DEFAULT, and HEURISTIC), HEURISTIC is the fastest and yields great results.

@slashedstar

slashedstar commented Aug 31, 2024

This solved a problem I was having, but it still leaves me wondering: shouldn't it cache the results of the exhaustive search instead of performing it on every run? After reading #10746, I tried setting os.environ["CUDA_CACHE_MAXSIZE"] = "4294967296" and CUDA_CACHE_PATH to a known path, but nothing worked (the default CUDA_CACHE_PATH was probably fine too, since it already contained files from other applications).
