Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config to disable FP16 optimizations on aarch64 UNIX [Build] #18744

Closed
oddfacade opened this issue Dec 7, 2023 · 10 comments
Closed

Add config to disable FP16 optimizations on aarch64 UNIX [Build] #18744

oddfacade opened this issue Dec 7, 2023 · 10 comments
Labels
build build issues; typically submitted using template stale issues that have not been addressed in a while; categorized by a bot

Comments

@oddfacade
Copy link

Describe the issue

I'm unable to build for aarch64 targets that don't support FP16 instructions, due to the configuration here, which includes sources compiled with -march=armv8.2-a+fp16 in all aarch64 builds. (The issue is specifically that the optimized methods fail to inline when combined with the rest of the source, which is compiled without +fp16.)

I have been able to work around this by patching the source to prevent MLAS_F16VEC_INTRINSICS_SUPPORTED from being set here. I would like to make this option directly configurable from CMake.

I made a discussion post a few days ago with some context.

If there's support for this feature, I have an implementation ready to contribute.

Urgency

No response

Target platform

aarch64

Build script

CMake build targeting Linux with -march=armv8.2-a.

Error / output

Misc. methods in fp16_common.h fail to inline.

Visual Studio Version

No response

GCC / Compiler Version

11.2, 13.2

@oddfacade oddfacade added the build build issues; typically submitted using template label Dec 7, 2023
@YUNQIUGUO YUNQIUGUO removed the build build issues; typically submitted using template label Dec 7, 2023
@YUNQIUGUO
Copy link
Contributor

@chenfucn

@YUNQIUGUO YUNQIUGUO added the build build issues; typically submitted using template label Dec 7, 2023
@snnn
Copy link
Member

snnn commented Dec 8, 2023

They are compile time flags which only impose requirements on compilers, not hardware. For example, in the same file you will see a lot of references to AVX512, but today most PC hardware do not have AVX512. It doesn't cause any problem.

@oddfacade
Copy link
Author

This is the kind of build error I get when I build using arm-gnu 11.2 with -march=armv8.2-a.

In file included from /var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/mlasi.h:50,                                                                                                                                                  
                 from /var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/fp16_common.h:20,                                                                                                                                            
                 from /var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/activate_fp16.cpp:17:                                                                                                                                        
/opt/toolchains/arm-gnu-toolchain-11.2-2022.02-x86_64-aarch64-none-linux-gnu/lib/gcc/aarch64-none-linux-gnu/11.2.1/include/arm_neon.h: In function ‘MLAS_FLOAT16X8 MlasAddFloat16x8(MLAS_FLOAT16X8, MLAS_FLOAT16X8)’:                         
/opt/toolchains/arm-gnu-toolchain-11.2-2022.02-x86_64-aarch64-none-linux-gnu/lib/gcc/aarch64-none-linux-gnu/11.2.1/include/arm_neon.h:31367:1: error: inlining failed in call to ‘always_inline’ ‘float16x8_t vaddq_f16(float16x8_t, float16x8
_t)’: target specific option mismatch                                                                                                                                                                                                         
31367 | vaddq_f16 (float16x8_t __a, float16x8_t __b)                                                                                                                                                                                          
      | ^~~~~~~~~                                                                                                                                                                                                                             
In file included from /var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/activate_fp16.cpp:17:                                                                                                                                        
/var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/fp16_common.h:113:21: note: called from here
  113 |     return vaddq_f16(Vector1, Vector2);
      |            ~~~~~~~~~^~~~~~~~~~~~~~~~~~
gmake[2]: *** [_deps/onnxruntime-build/CMakeFiles/onnxruntime_mlas.dir/build.make:716: _deps/onnxruntime-build/CMakeFiles/onnxruntime_mlas.dir/var/local/workspace/onnxruntime/onnxruntime/core/mlas/lib/activate_fp16.cpp.o] Error 1

When I build with add_compile_options(-march=armv8.2-a+fp16) set globally, the build succeeds, but the resulting binary fails to run on my Cortex A57 VM (I get an illegal instruction error).

@chenfucn
Copy link
Contributor

These are two different issues. It's great to see that with add_compile_options(-march=armv8.2-a+fp16) set globally, the build succeeds. The problem is the illegal instruction error instead of building problem then. We do have code to detect fp16 hardware feature in ARM. We need to fix the bug causing the illegal instruction, instead of using build options to get around it. Can you build in Debug and locate where the illegal instruction error was thrown? Thanks!

@snnn
Copy link
Member

snnn commented Dec 11, 2023

Have you tried GCC 13? I believe our code can pass the build with GCC 13 without problems.

@chenfucn
Copy link
Contributor

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/mlas/lib/halfgemm.cpp#L29
The hardware feature detection code is here. Unfortunately I don't have hardware to test this at this moment. Would you see whether this function is called before the illegal instruction error is thrown?

@oddfacade
Copy link
Author

@chenfucn Thank you for the clarification and guidance. I should have some time this week to look into this. I will update here with whatever I find.

@snnn I definitely tried building with GCC 13.2, but I don't think I ran that code on the target to see if the hardware feature detection works. I'll check that as well.

@snnn
Copy link
Member

snnn commented Dec 12, 2023

When I build with add_compile_options(-march=armv8.2-a+fp16) set globally, the build succeeds, but the resulting binary fails to run on my Cortex A57 VM

It is expected.

Copy link
Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Jan 13, 2024
Copy link
Contributor

This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build issues; typically submitted using template stale issues that have not been addressed in a while; categorized by a bot
Projects
None yet
Development

No branches or pull requests

4 participants