Strategy on non-SSE intrinsics #82
Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their |
The only downside of the SIMDe project I see is the inability to do drop-in replacements, as all intrinsics are prefixed with |
Doesn't that completely defeat the purpose of a translator/mapping?
Danila Kutenin pointed out:

> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch would drop the existing FMA implementation.

Related: #82
Danila Kutenin pointed out:

> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch would drop the existing FMA implementation.

The intrinsic vfmaq_f32, standing for "fused floating-point multiply-accumulate", is only available for VFPv4+. Thus, for Armv7-A targets, we have to take the following cases into consideration:

* VFPv3, which is implemented on Cortex-R4, R5, Cortex-A9
* VFPv4, which is implemented on the A15 and Cortex-A7, or later

According to the ACLE spec[1], "__ARM_FEATURE_FMA" is defined to 1 if the hardware floating-point architecture supports fused floating-point multiply-accumulate.

Related: #82

[1] https://developer.arm.com/architectures/system-architectures/software-standards/acle
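The VFPv3/VFPv4 distinction described in the commit message could be handled with a compile-time guard along the following lines. This is only a sketch, not the code from the actual patch: the helper names `madd4_f32` and `madd4_f32_selfcheck` are hypothetical, and the scalar branch is an assumption added so the example also builds on non-Arm hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of sse2neon): out[i] = a[i] * b[i] + c[i]
 * for four floats, picking the best available path at compile time. */
static inline void madd4_f32(const float a[4], const float b[4],
                             const float c[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vc = vld1q_f32(c);
#if defined(__ARM_FEATURE_FMA)
    /* VFPv4 or later: fused multiply-accumulate, vc + va * vb,
     * with a single rounding step. */
    float32x4_t r = vfmaq_f32(vc, va, vb);
#else
    /* VFPv3: no fused instruction, so multiply and add separately
     * (two rounding steps). */
    float32x4_t r = vaddq_f32(vc, vmulq_f32(va, vb));
#endif
    vst1q_f32(out, r);
#else
    /* Assumed fallback for non-NEON hosts: plain scalar arithmetic. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] * b[i] + c[i];
#endif
}

/* Usage example with inputs whose results are exactly representable,
 * so the fused and unfused paths agree bit for bit. */
static inline void madd4_f32_selfcheck(void)
{
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {2.0f, 2.0f, 2.0f, 2.0f};
    const float c[4] = {0.5f, 0.5f, 0.5f, 0.5f};
    float out[4];

    madd4_f32(a, b, c, out);
    assert(out[0] == 2.5f && out[3] == 8.5f);
}
```

Note that the fused and unfused paths can differ in the last bit for general inputs, because the fused form rounds once instead of twice. That difference is one reason an SSE translation layer may prefer to keep FMA behind an explicit feature check.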
from the readme:
Finally, regarding the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, we have achieved 100% coverage, meaning that SSE2NEON in some sense outperforms SIMDe with respect to SSE intrinsics. It is time to pursue translating AVX. The header avx2neon.h of Intel Embree is an excellent starting point for translating more AVX intrinsics. avx2intrin-emu.h, avxintrin-emu.h, and avxintrin-neon.h from jsource can be checked as well.
Any progress on avx2? @jserv
sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions; AVX intrinsics would be excluded.

@danlark1 pointed out:
We do need to think of the strategy on non-SSE intrinsics to ease the platform transition efforts.