
[Performance] First inference with CUDAExecutionProvider is slow #21541

Closed
RomRoc opened this issue Jul 29, 2024 · 2 comments
Labels: ep:CUDA (issues related to the CUDA execution provider), performance (issues related to performance regressions)

Comments


RomRoc commented Jul 29, 2024

Describe the issue

I run inference on an image using onnxruntime on a Colab T4 GPU.
The first inference is much slower than the following ones (about 12 sec), probably because of model loading.
If I run inference multiple times, subsequent inferences are very fast (about 0 sec).

With CPUExecutionProvider, the first inference is not as slow as the first CUDA one (about 4 sec).

Can I optimize the first inference?
Thanks

To reproduce

Here is my Colab code:

!pip install -U torch torchvision torchaudio
!pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

import onnxruntime
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

model_path = 'output/model.onnx'
image_path = 'img.jpg'

# Create the session on the CUDA execution provider (uncomment the second
# line to compare against the CPU provider instead).
session = onnxruntime.InferenceSession(model_path, providers=['CUDAExecutionProvider'])
#session = onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])

# Load the image, convert BGR -> RGB, reorder to CHW, normalize to [-1, 1],
# and add a batch dimension: (1, 3, H, W).
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = np.transpose(image, (2, 0, 1))
image = image.astype(np.float32) / 255.0
image -= 0.5
image /= 0.5
image = image[np.newaxis, ...]

# Run inference; 'x' is the model's input name.
result = session.run(None, {'x': image})[0][0]

cv2_imshow(result*255)
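
A minimal timing sketch (not part of my original snippet) that reuses the session and image from above; run it right after creating a fresh session to see the first-run vs. warm-run gap:

import time

# On a fresh session, the first run pays the one-time cuDNN algorithm
# search and model setup; later runs of the same input are fast.
for i in range(4):
    start = time.time()
    session.run(None, {'x': image})
    print(f'inference {i + 1}: {time.time() - start:.2f} s')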

Urgency

No response

Platform

Linux

OS Version

Ubuntu

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2

Model File

No response

Is this a quantized model?

Unknown

RomRoc added the performance label on Jul 29, 2024
github-actions bot added the ep:CUDA label on Jul 29, 2024
tianleiwu (Contributor) commented Jul 29, 2024

Image models usually contain convolutions, and cuDNN needs time to find the best algorithm for them. You can try setting cudnn_conv_algo_search to HEURISTIC (see examples) in the CUDA provider options to see whether it reduces the latency.

To improve this further, the algorithm search result would need to be serialized to a file so that it can be reused in the next session. Right now, that is not supported yet.
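
For example, a minimal sketch of passing this provider option (the model path and the CPU fallback here are illustrative, not taken from the issue):

import onnxruntime

# Ask the CUDA EP to use the lighter cuDNN heuristic search instead of the
# exhaustive benchmark; the other accepted values for this option are
# EXHAUSTIVE (the default) and DEFAULT.
providers = [
    ('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'HEURISTIC'}),
    'CPUExecutionProvider',
]
session = onnxruntime.InferenceSession('output/model.onnx', providers=providers)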

RomRoc (Author) commented Jul 31, 2024

Thanks!
Setting HEURISTIC or DEFAULT definitely solves the issue; the first inference now takes < 1 sec.
Here is the code:
session = onnxruntime.InferenceSession(model_path, providers=[('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'DEFAULT'})])

As proposed in issue #19839, it could be useful to make this clear in the documentation, or to make it the default option.

Thanks for the quick and reliable answer.

RomRoc closed this as completed Jul 31, 2024