
[Performance] slower inference after upgrade from 1.19.2 to 1.20.1 #23006

Open · iperov opened this issue Dec 4, 2024 · 8 comments
Labels: ep:CUDA (issues related to the CUDA execution provider), performance (issues related to performance regressions)

Comments

@iperov (Contributor) commented Dec 4, 2024

Describe the issue

Inference is ~15× slower after upgrading from 1.19.2 to 1.20.1.

The problem arises when input tensor resolutions are interleaved across successive calls, for example when running pyramid images for object detection.

CUDA EP: ~15× slower
CPU EP: ~2-3× slower

To reproduce

```python
import time

import numpy as np
import onnxruntime as rt


class timeit:
    """Context manager that prints the wall-clock time of its body."""
    def __init__(self, msg: str = ''):
        self._msg = msg
    def __enter__(self):
        self.t = time.perf_counter()
        return self
    def __exit__(self, exc_type, exc, tb):
        print(f'Time of {self._msg}: {time.perf_counter() - self.t}')


sess = rt.InferenceSession('YoloV7Face.onnx',
                           providers=[('CUDAExecutionProvider', {'device_id': 0})])

input_name = sess.get_inputs()[0].name

# Inputs of varying resolution (e.g. pyramid images): 256, 288, ..., 384
imgs = [np.zeros((1, 3, 256 + i * 32, 256 + i * 32), np.uint8) for i in range(5)]

while True:
    with timeit():
        for img in imgs:
            sess.run(None, {input_name: img})

# 1.19.2: ~30 ms per loop
# 1.20.1: ~500 ms per loop
```
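To help localize a regression like this, it can be useful to time each input shape in isolation rather than the whole interleaved loop: if the cost concentrates in the first run per shape, that points at per-shape setup work rather than kernel time. The sketch below is a stdlib-only helper I am adding for illustration; `run_once` is a hypothetical stand-in for a call like `sess.run(None, {input_name: img})`.

```python
import time

def time_per_shape(run_once, shapes, warmup=1, iters=10):
    """Return {shape: average seconds} with `warmup` untimed calls per shape.

    `run_once(shape)` stands in for a real inference call on an input of
    that shape; timing each shape separately distinguishes one-time
    per-shape setup cost from steady-state run time.
    """
    results = {}
    for shape in shapes:
        for _ in range(warmup):   # absorb any per-shape setup cost
            run_once(shape)
        t0 = time.perf_counter()
        for _ in range(iters):
            run_once(shape)
        results[shape] = (time.perf_counter() - t0) / iters
    return results
```

With the real session this would be called as, e.g., `time_per_shape(lambda s: sess.run(None, {input_name: np.zeros(s, np.uint8)}), [img.shape for img in imgs])`.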

Urgency

Not urgent; I will stay on 1.19.2.

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA, CPU

Execution Provider Library Version

CUDA 12.4

Model File

YoloV7Face.zip

Is this a quantized model?

No

@iperov iperov added the performance issues related to performance regressions label Dec 4, 2024
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Dec 4, 2024
@tianleiwu tianleiwu self-assigned this Dec 5, 2024
@tianleiwu (Contributor)
Initial finding: in 1.20.1 there is a long CPU time before the Conv kernel launches (the kernel time itself is very close to 1.19.2). It is likely caused by the cuDNN frontend introduced in #19470.
Another issue is that the cuDNN algorithm search seems to run only once across different input shapes in 1.20.1. Needs further investigation.
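If the slowdown is tied to cuDNN convolution algorithm search, one knob worth experimenting with is the CUDA EP's documented `cudnn_conv_algo_search` provider option; whether it changes anything on the 1.20.x cuDNN-frontend path is untested here, so treat this purely as a sketch of where the option goes.

```python
# Sketch: pin the cuDNN conv algorithm search mode via CUDA EP options.
# 'cudnn_conv_algo_search' accepts 'EXHAUSTIVE', 'HEURISTIC', or 'DEFAULT';
# 'HEURISTIC' is cheaper than an exhaustive search on each new input shape.
cuda_options = {
    'device_id': 0,
    'cudnn_conv_algo_search': 'HEURISTIC',
}
providers = [('CUDAExecutionProvider', cuda_options)]

try:
    import onnxruntime as rt
    sess = rt.InferenceSession('YoloV7Face.onnx', providers=providers)
except Exception:
    sess = None  # onnxruntime or the model file unavailable; options above still show the shape
```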

@iperov (Contributor, Author) commented Dec 6, 2024

@tianleiwu but the CPU EP is also slower. Perhaps the problem is not in CUDA?

@henryruhs
@iperov #22705 (comment)

@iperov (Contributor, Author) commented Dec 13, 2024

@henryruhs what is this? As I said, the CPU EP is also slower, so the problem is not CUDA-specific.

@henryruhs
Then change the overall report; most of it is CUDA-based. Also, this reaction is kind of rude.

@iperov (Contributor, Author) commented Dec 13, 2024

"CPU ep : also x2-3 slower" are in two places: topic and in the comment above.

Flooding useless messages is rude.

@henryruhs
Probably useless for those who are not smart enough: update the opset version of the models.
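For reference, opset conversion, if one wanted to try it, is done with `onnx.version_converter.convert_version`; the inspection helper below is a hypothetical convenience for reading the default-domain opset off `model.opset_import`, not part of any API, and `bump_opset` is an illustrative wrapper that requires the `onnx` package.

```python
def max_default_opset(opset_imports):
    """Pick the default-domain (ai.onnx) opset from (domain, version) pairs,
    mirroring what one would read off `model.opset_import`."""
    return max(v for d, v in opset_imports if d in ('', 'ai.onnx'))

def bump_opset(path_in, path_out, target):
    """Rewrite a model at a newer opset (requires the `onnx` package)."""
    import onnx
    from onnx import version_converter
    model = onnx.load(path_in)
    onnx.save(version_converter.convert_version(model, target), path_out)
```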

@iperov (Contributor, Author) commented Dec 13, 2024

@henryruhs Updating the opset version is not required by the specification; otherwise onnxruntime, for example, would require it explicitly and refuse to run the model. That is a superficially plausible idea that does not hold up. And you keep clogging up the thread and insulting people.

@iperov iperov changed the title [Performance] [CUDA] x15 slower inference after upgrade from 1.19.2 to 1.20.1 [Performance] slower inference after upgrade from 1.19.2 to 1.20.1 Dec 13, 2024