[Performance] fp16 support and performance #22242
Comments
Please share more information about your question.
I've encountered the same issue when running inference on an ARM64 CPU: the FP16 model is slower than FP32, even after converting the model to the FP16 format. Given that CPUs implementing Armv8.2-A and later do support FP16 computation, I would greatly appreciate it if ONNX Runtime could prioritize ARM-CPU FP16 support. This feature would be highly beneficial for many users and would significantly improve the efficiency of mobile models.
We're working on adding more fp16 support on arm64, as well as GPU support (which would handle fp16 models as well).
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
+1 to this |
+1 |
Describe the issue
FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 sufficient?
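One quick sanity check (a minimal sketch; the model path `model_fp16.onnx` is a placeholder) is to query the converted model's input element type through the C++ API. If the conversion succeeded, input 0 should report float16, and the application must then feed `Ort::Float16_t` data rather than `float`:

```cpp
// Sketch: confirm what element type the converted model actually expects.
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-check");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model_fp16.onnx", opts);  // placeholder path

  Ort::TypeInfo info = session.GetInputTypeInfo(0);
  auto tensor_info = info.GetTensorTypeAndShapeInfo();
  ONNXTensorElementDataType elem = tensor_info.GetElementType();

  if (elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16) {
    std::cout << "Model input 0 expects float16 data\n";
  } else if (elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT) {
    std::cout << "Model input 0 expects float32 data\n";
  }
  return 0;
}
```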
To reproduce
1. Convert the ONNX model from FP32 to FP16 using onnxmltools.
2. Run inference with the ONNX Runtime C++ library, converting the input and output data from FP32 to FP16 (see the sketch below).
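A rough sketch of step 2 under stated assumptions: the model path `model_fp16.onnx`, the tensor names `input`/`output`, and the 1x3x224x224 shape are placeholders, not values from the original report. Recent ORT C++ headers (1.16+) provide a float constructor and `ToFloat()` on `Ort::Float16_t`; with older headers you would need your own float-to-half conversion.

```cpp
// Sketch: feed fp16 input data to a converted model and read fp16 output back.
#include <onnxruntime_cxx_api.h>
#include <cstdio>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-infer");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model_fp16.onnx", opts);  // placeholder path

  // Example input: a 1x3x224x224 tensor, converted element-wise to fp16.
  const std::vector<int64_t> shape{1, 3, 224, 224};
  std::vector<float> input_fp32(1 * 3 * 224 * 224, 0.5f);
  std::vector<Ort::Float16_t> input_fp16;
  input_fp16.reserve(input_fp32.size());
  for (float v : input_fp32) input_fp16.emplace_back(Ort::Float16_t(v));

  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<Ort::Float16_t>(
      mem_info, input_fp16.data(), input_fp16.size(), shape.data(), shape.size());

  // "input" and "output" are placeholder tensor names; use the model's real names.
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, 1,
                             output_names, 1);

  // Read the fp16 output back as floats.
  const Ort::Float16_t* out = outputs[0].GetTensorData<Ort::Float16_t>();
  size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
  std::printf("output has %zu elements; first = %f\n", n, out[0].ToFloat());
  return 0;
}
```

Note that converting the inputs and outputs this way only changes the data type at the API boundary; whether the fp16 kernels are actually faster depends on the CPU EP's fp16 coverage on arm64, which is the subject of this issue.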
Urgency
No response
Platform
Android
OS Version
34
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes