
[Performance] slower inference after upgrade from 1.19.2 to 1.20.1 #23006

Open · iperov opened this issue Dec 4, 2024 · 8 comments
Labels: ep:CUDA (issues related to the CUDA execution provider), performance (issues related to performance regressions)

Comments

@iperov (Contributor) commented Dec 4, 2024

Describe the issue

Inference is ~15× slower after upgrading from 1.19.2 to 1.20.1.

The problem arises when input tensor resolutions are interleaved across successive calls, for example when running pyramid images for object detection.

CUDA EP: ~15× slower
CPU EP: ~2-3× slower

To reproduce

```python
import time

import numpy as np
import onnxruntime as rt


class timeit:
    """Context manager that prints the wall-clock time of its body."""
    def __init__(self, msg: str = ''):
        self._msg = msg
    def __enter__(self):
        self.t = time.perf_counter()
        return self
    def __exit__(self, exc_type, exc, tb):
        print(f'Time of {self._msg}: {time.perf_counter() - self.t}')


sess = rt.InferenceSession('YoloV7Face.onnx',
                           providers=[('CUDAExecutionProvider', {'device_id': 0})])

input_name = sess.get_inputs()[0].name

# Inputs of varying resolution (e.g. pyramid images): 256, 288, ..., 384
imgs = [np.zeros((1, 3, 256 + i * 32, 256 + i * 32), np.uint8) for i in range(5)]

while True:
    with timeit():
        for img in imgs:
            sess.run(None, {input_name: img})

# 1.19.2: ~30 ms per loop
# 1.20.1: ~500 ms per loop
```
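To help localize a regression like this, it can be useful to time each input shape in isolation rather than the whole interleaved loop: if the cost concentrates in the first run per shape, that points at per-shape setup work rather than kernel time. The sketch below is a stdlib-only helper I am adding for illustration; `run_once` is a hypothetical stand-in for a call like `sess.run(None, {input_name: img})`.

```python
import time

def time_per_shape(run_once, shapes, warmup=1, iters=10):
    """Return {shape: average seconds} with `warmup` untimed calls per shape.

    `run_once(shape)` stands in for a real inference call on an input of
    that shape; timing each shape separately distinguishes one-time
    per-shape setup cost from steady-state run time.
    """
    results = {}
    for shape in shapes:
        for _ in range(warmup):   # absorb any per-shape setup cost
            run_once(shape)
        t0 = time.perf_counter()
        for _ in range(iters):
            run_once(shape)
        results[shape] = (time.perf_counter() - t0) / iters
    return results
```

With the real session this would be called as, e.g., `time_per_shape(lambda s: sess.run(None, {input_name: np.zeros(s, np.uint8)}), [img.shape for img in imgs])`.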

Urgency

Not urgent; I will stay on 1.19.2.

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA, CPU

Execution Provider Library Version

CUDA 12.4

Model File

YoloV7Face.zip

Is this a quantized model?

No

@iperov iperov added the performance issues related to performance regressions label Dec 4, 2024
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Dec 4, 2024
@tianleiwu tianleiwu self-assigned this Dec 5, 2024
@tianleiwu (Contributor)
Initial finding: in 1.20.1 there is a long CPU time before the Conv kernel launches (the kernel time itself is very close to 1.19.2). It is likely caused by the cuDNN frontend introduced in #19470.
Another issue is that the cuDNN algorithm search seems to run only once across different input shapes in 1.20.1. Needs further investigation.
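If the slowdown is tied to cuDNN convolution algorithm search, one knob worth experimenting with is the CUDA EP's documented `cudnn_conv_algo_search` provider option; whether it changes anything on the 1.20.x cuDNN-frontend path is untested here, so treat this purely as a sketch of where the option goes.

```python
# Sketch: pin the cuDNN conv algorithm search mode via CUDA EP options.
# 'cudnn_conv_algo_search' accepts 'EXHAUSTIVE', 'HEURISTIC', or 'DEFAULT';
# 'HEURISTIC' is cheaper than an exhaustive search on each new input shape.
cuda_options = {
    'device_id': 0,
    'cudnn_conv_algo_search': 'HEURISTIC',
}
providers = [('CUDAExecutionProvider', cuda_options)]

try:
    import onnxruntime as rt
    sess = rt.InferenceSession('YoloV7Face.onnx', providers=providers)
except Exception:
    sess = None  # onnxruntime or the model file unavailable; options above still show the shape
```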

@iperov (Contributor, Author) commented Dec 6, 2024

@tianleiwu but the CPU EP is also slower. Perhaps the problem is not in CUDA?

@henryruhs
@iperov #22705 (comment)

@iperov (Contributor, Author) commented Dec 13, 2024

@henryruhs what is this? As I said, the CPU EP is also slower, so the problem is not CUDA-specific.

@henryruhs
Then change the overall report; most of it is CUDA-based. Also, this reaction is kind of rude.

@iperov (Contributor, Author) commented Dec 13, 2024

"CPU ep : also x2-3 slower" are in two places: topic and in the comment above.

Flooding useless messages is rude.

@henryruhs
Probably useless for those who are not smart enough: update the opset version of the models.
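For reference, opset conversion, if one wanted to try it, is done with `onnx.version_converter.convert_version`; the inspection helper below is a hypothetical convenience for reading the default-domain opset off `model.opset_import`, not part of any API, and `bump_opset` is an illustrative wrapper that requires the `onnx` package.

```python
def max_default_opset(opset_imports):
    """Pick the default-domain (ai.onnx) opset from (domain, version) pairs,
    mirroring what one would read off `model.opset_import`."""
    return max(v for d, v in opset_imports if d in ('', 'ai.onnx'))

def bump_opset(path_in, path_out, target):
    """Rewrite a model at a newer opset (requires the `onnx` package)."""
    import onnx
    from onnx import version_converter
    model = onnx.load(path_in)
    onnx.save(version_converter.convert_version(model, target), path_out)
```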

@iperov (Contributor, Author) commented Dec 13, 2024

@henryruhs Updating the opset version is not required by the specification; otherwise onnxruntime, for example, would require it explicitly and refuse to run the model. That is a superficially plausible idea that does not hold up. And you keep clogging up the thread and insulting people.

@iperov iperov changed the title [Performance] [CUDA] x15 slower inference after upgrade from 1.19.2 to 1.20.1 [Performance] slower inference after upgrade from 1.19.2 to 1.20.1 Dec 13, 2024