Strategy on non-SSE intrinsics #82

Open
jserv opened this issue Jul 21, 2020 · 7 comments
Comments

@jserv
Member

jserv commented Jul 21, 2020

sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions; AVX intrinsics are excluded.

@danlark1 pointed out:

Technically speaking, _mm_fmadd_ps is not an SSE intrinsic; it was introduced with the FMA extension, which came even after AVX.

We need to settle on a strategy for non-SSE intrinsics to ease platform transition efforts.

@jasonliu--

Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.

@danlark1
Collaborator

danlark1 commented May 24, 2021

The only downside of the SIMDe project I see is the inability to do drop-in replacements, as all intrinsics are prefixed with simde_. I know users who are much more willing to update headers for all their code and dependencies than to update all call sites. If we can collaborate with SIMDe on that, that would be great.

@jserv
Member Author

jserv commented May 24, 2021

> Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.

SIMDe has already merged most of the SSE2NEON work; see simde #499 for details. Here, we focus on Arm/AArch64-specific tweaks, which can eventually be merged into SIMDe.

@jasonliu--

> The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_

Doesn't that completely defeat the purpose of a translator/mapping?

jserv added a commit that referenced this issue Jun 4, 2021
Danila Kutenin pointed out:
> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch drops the existing
FMA implementation.

Related: #82
jserv added a commit that referenced this issue Jun 5, 2021
Danila Kutenin pointed out:
> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch drops the existing
FMA implementation.

The intrinsic vfmaq_f32 ("fused floating-point multiply-accumulate")
is only available on VFPv4 and later. Thus, for Armv7-A targets, we
have to take the following cases into consideration:
* VFPv3, implemented on Cortex-R4, Cortex-R5, and Cortex-A9
* VFPv4, implemented on Cortex-A15, Cortex-A7, and later cores

According to the ACLE spec[1], "__ARM_FEATURE_FMA" is defined to 1 if
the hardware floating-point architecture supports fused floating-point
multiply-accumulate.

Related: #82

[1] https://developer.arm.com/architectures/system-architectures/software-standards/acle
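The guard described in the commit above could look roughly like the following minimal sketch. It is not sse2neon's actual code: the helper name fmadd4 and the scalar fallback for non-Arm hosts are illustrative assumptions. On Arm it uses vfmaq_f32 when __ARM_FEATURE_FMA is defined and otherwise falls back to vmlaq_f32, which computes the same expression but is not fused (the product is rounded before the addition).

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/*
 * Hypothetical helper (not part of sse2neon): out[i] = a[i] * b[i] + c[i]
 * for four floats, i.e. the arithmetic that _mm_fmadd_ps performs.
 */
static void fmadd4(const float *a, const float *b, const float *c, float *out)
{
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FEATURE_FMA)
    /* VFPv4+ and AArch64: vfmaq_f32(acc, x, y) returns acc + x * y, fused. */
    vst1q_f32(out, vfmaq_f32(vld1q_f32(c), vld1q_f32(a), vld1q_f32(b)));
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* VFPv3-class cores: vmlaq_f32 computes acc + x * y with a rounded
     * intermediate product, so results may differ from a true fused
     * multiply-add in the last bit. */
    vst1q_f32(out, vmlaq_f32(vld1q_f32(c), vld1q_f32(a), vld1q_f32(b)));
#else
    /* Non-Arm hosts (assumption, only so the sketch builds anywhere). */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

With a = {1, 2, 3, 4}, b = {5, 6, 7, 8}, and c = {1, 1, 1, 1}, every path yields {6, 13, 22, 33}, because all intermediate values are exactly representable and the fused/unfused distinction never shows up. Inputs whose products need rounding are where the VFPv3 path can diverge from x86 FMA results.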
@aqrit
Contributor

aqrit commented Aug 17, 2021

> > The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_
>
> Doesn't that completely defeat the purpose of a translator/mapping?

From the README:

> If you define SIMDE_ENABLE_NATIVE_ALIASES before including SIMDe you can use the same names as the native functions.

@jserv
Member Author

jserv commented Dec 26, 2022

We have finally achieved 100% coverage of the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, meaning that SSE2NEON now surpasses SIMDe in its coverage of SSE intrinsics. It is time to pursue translating AVX. The avx2neon.h header from Intel Embree is an excellent starting point for translating more AVX intrinsics.

The avx2intrin-emu.h, avxintrin-emu.h, and avxintrin-neon.h headers from jsource are also worth checking.
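One common way to map 256-bit AVX intrinsics onto 128-bit NEON is to model each 256-bit vector as a pair of 128-bit halves and apply the NEON operation to each half. The minimal sketch below assumes that approach; the type hypo_m256, the macro SKETCH_HAVE_NEON, and the hypo_mm256_* functions are hypothetical names chosen to avoid clashing with real headers, and this is not claimed to be how avx2neon.h or jsource implement it. A scalar fallback lets it build on non-Arm hosts.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical stand-in for __m256: lanes 0-3 in lo, lanes 4-7 in hi. */
typedef struct {
#if SKETCH_HAVE_NEON
    float32x4_t lo, hi;
#else
    float lo[4], hi[4];
#endif
} hypo_m256;

/* Analogue of _mm256_loadu_ps: load 8 floats starting at p. */
static hypo_m256 hypo_mm256_loadu_ps(const float *p)
{
    hypo_m256 r;
#if SKETCH_HAVE_NEON
    r.lo = vld1q_f32(p);
    r.hi = vld1q_f32(p + 4);
#else
    for (int i = 0; i < 4; i++) {
        r.lo[i] = p[i];
        r.hi[i] = p[i + 4];
    }
#endif
    return r;
}

/* Analogue of _mm256_storeu_ps: store 8 floats starting at p. */
static void hypo_mm256_storeu_ps(float *p, hypo_m256 v)
{
#if SKETCH_HAVE_NEON
    vst1q_f32(p, v.lo);
    vst1q_f32(p + 4, v.hi);
#else
    for (int i = 0; i < 4; i++) {
        p[i] = v.lo[i];
        p[i + 4] = v.hi[i];
    }
#endif
}

/* Analogue of _mm256_add_ps: one 128-bit add per half. */
static hypo_m256 hypo_mm256_add_ps(hypo_m256 a, hypo_m256 b)
{
    hypo_m256 r;
#if SKETCH_HAVE_NEON
    r.lo = vaddq_f32(a.lo, b.lo);
    r.hi = vaddq_f32(a.hi, b.hi);
#else
    for (int i = 0; i < 4; i++) {
        r.lo[i] = a.lo[i] + b.lo[i];
        r.hi[i] = a.hi[i] + b.hi[i];
    }
#endif
    return r;
}
```

The trade-off of this representation is that each element-wise 256-bit operation costs two NEON instructions, and operations that mix data between the two halves need additional shuffling.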

@yiguolei

Any progress on AVX2? @jserv


6 participants