[Performance] fp16 model performance decreases when the "inter op threads" setting is greater than 1. #18822
Labels
quantization: issues related to quantization
stale: issues that have not been addressed in a while; categorized by a bot
Describe the issue
I converted an fp32 model to fp16 with `convert_float_to_float16` and measured inference time on the same data. For the fp32 model, the cost time meets expectations at intra_op_threads = 1, 4, and 8. For the fp16 model, the cost time meets expectations at intra_op_threads = 1, but not at 4 or 8.
To reproduce
Run inference with both the fp32 and fp16 models at different values of intra_op_threads and compare the measured latencies.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes