Strategy on non-SSE intrinsics #82

jserv · 2020-07-21T09:34:44Z

sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions; AVX intrinsics are excluded.

@danlark1 pointed out:

Technically speaking, _mm_fmadd_ps is not an SSE intrinsic; it was introduced with the FMA extension, which arrived even after AVX.

We need to decide on a strategy for non-SSE intrinsics to ease platform transition efforts.

jasonliu-- · 2021-05-24T00:25:33Z

Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.

danlark1 · 2021-05-24T00:53:35Z

The only downside of the SIMDe project I see is the inability to do drop-in replacements, as all intrinsics are prefixed with simde_. I know users who are much more willing to update headers for all their code and dependencies than to update all call sites. If we can collaborate with SIMDe on that, that would be great.

jserv · 2021-05-24T01:08:03Z

Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.

SIMDe has already merged most of the SSE2NEON efforts; see simde #499 for details. Here, we focus on Arm/AArch64-specific tweaks, which can eventually be merged into SIMDe.

jasonliu-- · 2021-05-24T03:19:57Z

The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_

Doesn't that completely defeat the purpose of a translator/mapping?

Danila Kutenin pointed out:
> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch would drop the existing FMA implementation.

Related: #82

Danila Kutenin pointed out:
> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch would drop the existing FMA implementation.

The intrinsic vfmaq_f32, standing for "fused floating-point multiply-accumulate", is only available for VFPv4+. Thus, for Armv7-A targets, we have to take the following cases into consideration:

* VFPv3, which is implemented on Cortex-R4, R5, Cortex-A9
* VFPv4, which is implemented on the A15 and Cortex-A7, or later

According to the ACLE spec[1], "__ARM_FEATURE_FMA" is defined to 1 if the hardware floating-point architecture supports fused floating-point multiply-accumulate.

Related: #82

[1] https://developer.arm.com/architectures/system-architectures/software-standards/acle
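The VFPv3/VFPv4 distinction in the commit message above can be sketched as a compile-time guard on __ARM_FEATURE_FMA. This is only an illustration under stated assumptions: the helper name madd4 is hypothetical and is not part of sse2neon, and the scalar branch exists only so the sketch also builds on non-Arm hosts.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical helper (not from sse2neon): out[i] = a[i] * b[i] + c[i]
 * for four floats. */
static void madd4(const float *a, const float *b, const float *c, float *out)
{
#if SKETCH_HAVE_NEON
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vc = vld1q_f32(c);
#if defined(__ARM_FEATURE_FMA)
    /* VFPv4 or later (and AArch64): fused multiply-accumulate,
     * vfmaq_f32(acc, x, y) computes acc + x * y with a single rounding. */
    vst1q_f32(out, vfmaq_f32(vc, va, vb));
#else
    /* VFPv3: no fused form, so multiply and add separately
     * (two roundings). */
    vst1q_f32(out, vaddq_f32(vc, vmulq_f32(va, vb)));
#endif
#else
    /* Non-Arm host: plain scalar stand-in for the NEON code above. */
    int i;
    for (i = 0; i < 4; i++)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

Note that the fused and unfused paths can differ in the last bit of the result, which is one reason an FMA intrinsic cannot always be treated as a plain multiply followed by an add.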

aqrit · 2021-08-17T20:08:14Z

The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_

Doesn't that completely defeat the purpose of a translator/mapping?

from the readme:

If you define SIMDE_ENABLE_NATIVE_ALIASES before including SIMDe you can use the same names as the native functions.
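To make the drop-in question concrete, here is a minimal sketch of how an opt-in alias macro can map native intrinsic names onto prefixed implementations. It is not SIMDe's actual code: every mylib_ name and the MYLIB_ENABLE_NATIVE_ALIASES switch are hypothetical, and the vector type is a plain struct so the sketch compiles without any SIMD header.

```c
/* The user opts in before including the (hypothetical) library header. */
#define MYLIB_ENABLE_NATIVE_ALIASES

/* ---- what a prefixed portability header might contain ---- */

/* Hypothetical stand-in for a 128-bit vector of four floats. */
typedef struct { float v[4]; } mylib_m128;

static mylib_m128 mylib_mm_set1_ps(float x)
{
    mylib_m128 r = {{x, x, x, x}};
    return r;
}

static mylib_m128 mylib_mm_add_ps(mylib_m128 a, mylib_m128 b)
{
    mylib_m128 r;
    int i;
    for (i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

static float mylib_mm_cvtss_f32(mylib_m128 a)
{
    return a.v[0]; /* lowest lane */
}

#ifdef MYLIB_ENABLE_NATIVE_ALIASES
/* Map the native names onto the prefixed ones so existing call sites
 * compile unchanged. */
#define __m128 mylib_m128
#define _mm_set1_ps(x) mylib_mm_set1_ps(x)
#define _mm_add_ps(a, b) mylib_mm_add_ps((a), (b))
#define _mm_cvtss_f32(a) mylib_mm_cvtss_f32(a)
#endif

/* ---- unmodified "legacy" call site written against native names ---- */
static float legacy_lane0_sum(float x, float y)
{
    __m128 r = _mm_add_ps(_mm_set1_ps(x), _mm_set1_ps(y));
    return _mm_cvtss_f32(r);
}
```

With the switch undefined, only the prefixed names exist and legacy_lane0_sum would have to be rewritten, which is the call-site cost raised earlier in the thread.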

jserv · 2022-12-26T05:47:23Z

For the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, we have finally achieved 100% coverage, meaning that SSE2NEON in some respects outperforms SIMDe with regard to SSE intrinsics. It is time to pursue translating AVX. The avx2neon.h header from Intel Embree is an excellent starting point for translating more AVX intrinsics.

avx2intrin-emu.h, avxintrin-emu.h, and avxintrin-neon.h from jsource can be checked as well.
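As a rough illustration of what translating AVX onto 128-bit NEON can involve, one common approach is to model a 256-bit vector as two 128-bit halves and apply each operation to both. The sketch below assumes that approach; the thread does not say whether the headers named above work this way, and all my_ and half_ names are hypothetical. A scalar stand-in keeps it buildable on non-Arm hosts.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
/* One 128-bit NEON register holding four floats. */
typedef float32x4_t half128;
static half128 half_load(const float *p) { return vld1q_f32(p); }
static void half_store(float *p, half128 h) { vst1q_f32(p, h); }
static half128 half_add(half128 a, half128 b) { return vaddq_f32(a, b); }
#else
/* Scalar stand-in so the sketch also builds on non-Arm hosts. */
typedef struct { float v[4]; } half128;
static half128 half_load(const float *p)
{
    half128 h;
    int i;
    for (i = 0; i < 4; i++) h.v[i] = p[i];
    return h;
}
static void half_store(float *p, half128 h)
{
    int i;
    for (i = 0; i < 4; i++) p[i] = h.v[i];
}
static half128 half_add(half128 a, half128 b)
{
    half128 r;
    int i;
    for (i = 0; i < 4; i++) r.v[i] = a.v[i] + b.v[i];
    return r;
}
#endif

/* Hypothetical 256-bit vector: low four floats in lo, high four in hi. */
typedef struct { half128 lo, hi; } my_m256;

/* Counterpart of _mm256_loadu_ps: load eight floats as two halves. */
static my_m256 my_mm256_loadu_ps(const float *p)
{
    my_m256 r;
    r.lo = half_load(p);
    r.hi = half_load(p + 4);
    return r;
}

/* Counterpart of _mm256_add_ps: add each half independently. */
static my_m256 my_mm256_add_ps(my_m256 a, my_m256 b)
{
    my_m256 r;
    r.lo = half_add(a.lo, b.lo);
    r.hi = half_add(a.hi, b.hi);
    return r;
}

/* Counterpart of _mm256_storeu_ps: store both halves back to memory. */
static void my_mm256_storeu_ps(float *p, my_m256 a)
{
    half_store(p, a.lo);
    half_store(p + 4, a.hi);
}
```

Lane-wise operations such as addition split cleanly this way; intrinsics that move data across the 128-bit boundary need extra handling that this sketch does not cover.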

yiguolei · 2024-10-24T10:27:51Z

Any progress on avx2? @jserv

jserv mentioned this issue Jul 26, 2020

Added more FMA functions with tests #89

Closed

marktwtn self-assigned this Sep 14, 2020

jserv mentioned this issue Dec 28, 2020

feat: Implement FMA function _mm_fmadd_pd #245

Closed

jserv mentioned this issue Jun 4, 2021

Drop FMA intrinsic #445

Merged
