[Performance] fp16 support and performance #22242
Comments
Please share more information about your question.
I've encountered the same issue when running inference on an ARM64 CPU: the FP16 model is slower than FP32, even after converting the model to the FP16 format. Given that CPUs implementing Armv8.2-A and later do support FP16 computation, I would greatly appreciate it if ONNX Runtime could prioritize ARM-CPU FP16 support. This feature would be highly beneficial for many users and would significantly improve the efficiency of mobile models.
We're working on adding more fp16 support on arm64, as well as GPU support (which would handle fp16 models as well).
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
+1 to this |
+1 |
Describe the issue
FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 sufficient?
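One quick sanity check (a minimal sketch; the model path `model_fp16.onnx` is a placeholder) is to query the converted model's input element type through the C++ API. If the conversion succeeded, input 0 should report float16, and the application must then feed `Ort::Float16_t` data rather than `float`:

```cpp
// Sketch: confirm what element type the converted model actually expects.
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-check");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model_fp16.onnx", opts);  // placeholder path

  Ort::TypeInfo info = session.GetInputTypeInfo(0);
  auto tensor_info = info.GetTensorTypeAndShapeInfo();
  ONNXTensorElementDataType elem = tensor_info.GetElementType();

  if (elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16) {
    std::cout << "Model input 0 expects float16 data\n";
  } else if (elem == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT) {
    std::cout << "Model input 0 expects float32 data\n";
  }
  return 0;
}
```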
To reproduce
1. Convert the ONNX model from FP32 to FP16 using onnxmltools.
2. Run inference with the ONNX Runtime C++ library, converting the input and output data from FP32 to FP16 (see the sketch below).
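A rough sketch of step 2 under stated assumptions: the model path `model_fp16.onnx`, the tensor names `input`/`output`, and the 1x3x224x224 shape are placeholders, not values from the original report. Recent ORT C++ headers (1.16+) provide a float constructor and `ToFloat()` on `Ort::Float16_t`; with older headers you would need your own float-to-half conversion.

```cpp
// Sketch: feed fp16 input data to a converted model and read fp16 output back.
#include <onnxruntime_cxx_api.h>
#include <cstdio>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-infer");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model_fp16.onnx", opts);  // placeholder path

  // Example input: a 1x3x224x224 tensor, converted element-wise to fp16.
  const std::vector<int64_t> shape{1, 3, 224, 224};
  std::vector<float> input_fp32(1 * 3 * 224 * 224, 0.5f);
  std::vector<Ort::Float16_t> input_fp16;
  input_fp16.reserve(input_fp32.size());
  for (float v : input_fp32) input_fp16.emplace_back(Ort::Float16_t(v));

  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<Ort::Float16_t>(
      mem_info, input_fp16.data(), input_fp16.size(), shape.data(), shape.size());

  // "input" and "output" are placeholder tensor names; use the model's real names.
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, 1,
                             output_names, 1);

  // Read the fp16 output back as floats.
  const Ort::Float16_t* out = outputs[0].GetTensorData<Ort::Float16_t>();
  size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
  std::printf("output has %zu elements; first = %f\n", n, out[0].ToFloat());
  return 0;
}
```

Note that converting the inputs and outputs this way only changes the data type at the API boundary; whether the fp16 kernels are actually faster depends on the CPU EP's fp16 coverage on arm64, which is the subject of this issue.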
Urgency
No response
Platform
Android
OS Version
34
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes