[Performance] 2x Regression in 1st Inference time cost #18957
Labels
- ep:DML (issues related to the DirectML execution provider)
- platform:windows (issues related to the Windows platform)
- quantization (issues related to quantization)
Describe the issue
Comparing 1st inference time costs between ORT 1.16.3 and 1.14.0, a number of public models show a significant regression in "Session Creation" and "Evaluate" times. The issue is reproducible on both the MLAS and DirectML EPs, with both the WinML and ORT APIs.
Average Evaluate times over many iterations are unaffected; only the 1st inference is slower.
The regression is also observed with the 1.15 binaries, indicating it was introduced between 1.14.0 and 1.15.0.
To reproduce
Using onnxruntime_perf_test or MicrosoftMLRunner with binaries from 1.16.3 and 1.14.0, note the "Session Creation" and "Evaluate" times (for example, deeplabv3 shows a 2x regression).
cmd: MicrosoftMLRunner.exe -CPU -Perf -model deeplabv3_u8s8.onnx
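For clarity on what is being measured, here is a minimal timing sketch of the two phases in question ("Session Creation" vs. the 1st "Evaluate"). The stand-in callables and the model/input names in the comments are assumptions for illustration; with the real ORT Python API they would be replaced by `onnxruntime.InferenceSession` and `session.run` calls.

```python
import time

def time_first_inference(create_session, run_once):
    """Measure 'Session Creation' and 1st 'Evaluate' wall-clock times.

    create_session: callable returning a session object
    run_once: callable taking the session and performing one inference
    """
    t0 = time.perf_counter()
    session = create_session()           # "Session Creation" phase
    t1 = time.perf_counter()
    run_once(session)                    # 1st "Evaluate" phase
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

# Stand-in callables so the sketch runs without onnxruntime installed.
# With ORT these would be, e.g. (names are illustrative assumptions):
#   create_session = lambda: onnxruntime.InferenceSession("deeplabv3_u8s8.onnx")
#   run_once = lambda s: s.run(None, {"input": input_tensor})
creation_s, first_eval_s = time_first_inference(lambda: object(), lambda s: None)
print(f"Session Creation: {creation_s:.6f}s, 1st Evaluate: {first_eval_s:.6f}s")
```

Note that averaging Evaluate over many iterations hides the regression, which is why the 1st-run numbers must be recorded separately.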
Urgency
This issue affects scenarios where the 1st inference time matters and discourages software vendors from upgrading to ORT 1.16.3.
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
WinML
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes