
[Performance] 2x Regression in 1st Inference time cost #18957

Open
A-Satti opened this issue Dec 29, 2023 · 2 comments
Labels: ep:DML (DirectML execution provider), platform:windows, quantization

Comments

@A-Satti
Contributor

A-Satti commented Dec 29, 2023

Describe the issue

Comparing 1st inference time costs between ORT 1.16.3 and 1.14.0, a number of public models show a significant regression in the "Session Creation" and "Evaluate" time costs. This issue is reproducible on both the MLAS and DirectML EPs, with both the WinML and ORT APIs.

Average Evaluate times over many iterations are not affected; only the first inference is.

The regression is also observed with 1.15 binaries, indicating it was introduced between 1.14.0 and 1.15.0.

To reproduce

Using onnxruntime_perf_test or MicrosoftMLRunner with binaries from 1.16.3 and 1.14.0, note the "Session Creation" and "Evaluate" times (for example, deeplabv3 shows a 2x regression).

cmd: MicrosoftMLRunner.exe -CPU -Perf -model deeplabv3_u8s8.onnx
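For a self-contained comparison outside of MicrosoftMLRunner, here is a minimal sketch using the ONNX Runtime Python API (assuming the onnxruntime package is installed; the model path and the float32 dummy input are illustrative and should be adjusted to the actual model):

```python
import time
import numpy as np
import onnxruntime as ort

# Time "Session Creation" (this is where graph optimizations run).
t0 = time.perf_counter()
sess = ort.InferenceSession("deeplabv3_u8s8.onnx",
                            providers=["CPUExecutionProvider"])
t1 = time.perf_counter()
print(f"Session creation: {(t1 - t0) * 1000:.1f} ms")

# Build a dummy input for the model's first input (free dims set to 1).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.zeros(shape, dtype=np.float32)  # adjust dtype to the model

# Time the first "Evaluate".
t2 = time.perf_counter()
sess.run(None, {inp.name: dummy})
t3 = time.perf_counter()
print(f"First inference: {(t3 - t2) * 1000:.1f} ms")
```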

Urgency

This issue affects scenarios where the 1st inference time is important, and it discourages software vendors from upgrading to ORT 1.16.3.

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

WinML

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

github-actions bot added the ep:DML, platform:windows, and quantization labels Dec 29, 2023
@cbourjau
Contributor

Could you provide a flame graph or a profile that may shed some light on where the regression is located?
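(For reference, a per-operator JSON trace of this kind can be captured with ORT's built-in profiler. A minimal sketch, assuming the onnxruntime Python package; the model path and dummy input are illustrative:)

```python
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True  # emit a chrome://tracing-compatible JSON trace

sess = ort.InferenceSession("deeplabv3_u8s8.onnx", so,
                            providers=["CPUExecutionProvider"])

# Run one inference so the trace covers the first "Evaluate".
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
sess.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})

print(sess.end_profiling())  # path of the written *.json trace
```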

@A-Satti
Contributor Author

A-Satti commented Jan 5, 2024

ort14_deeplabv3_-u8s8.json.json
ort16_deeplabv3-u8s8.json.json

I was not able to see a sizeable regression in the individual operator times.

We did find that disabling the optimization_level recovers the Session Creation time cost; with optimizations enabled (the default), this cost regresses for a number of models.
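For clarity, the workaround amounts to turning off graph optimizations when the session is created. A minimal sketch with the onnxruntime Python API (the model path is illustrative):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Disable graph optimizations entirely; ORT_ENABLE_BASIC is a middle ground.
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

sess = ort.InferenceSession("deeplabv3_u8s8.onnx", so,
                            providers=["CPUExecutionProvider"])
```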
