I'm having issues with clang's optimizer messing up this behaviour, with NaNs still propagating.
The NEON min/max instructions propagate NaNs and the SSE2 ones don't (ish), so I've been defining SSE2NEON_PRECISE_MINMAX 1.
With that defined, the _mm_max_ps intrinsic becomes
vbslq_f32(vcgtq_f32(a, b), a, b);
This looks perfectly correct to me, but clang is optimizing it to the fmaxnm instruction. The fmaxnm instruction only deals with quiet NaNs; signalling NaNs still propagate. :(
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
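To make the difference concrete, here is a minimal scalar sketch contrasting the compare-and-select pattern with the quiet-NaN rule quoted above. The helper names `select_max` and `maxnum_model` are hypothetical and only model the behaviour in plain C; they are not sse2neon or NEON APIs, and signalling NaNs are not modelled.

```c
#include <math.h>   /* NAN and isnan macros only; no libm functions are called */

/* Compare-and-select maximum: a scalar analogue of
   vbslq_f32(vcgtq_f32(a, b), a, b). Every comparison involving NaN is
   false, so the second operand is returned whenever either input is NaN. */
float select_max(float a, float b)
{
    return (a > b) ? a : b;
}

/* Hypothetical scalar model of the quiet-NaN rule: if exactly one input
   is NaN, the numeric input is returned instead. */
float maxnum_model(float a, float b)
{
    if (a != a) return b;   /* a is NaN */
    if (b != b) return a;   /* b is NaN */
    return (a > b) ? a : b;
}
```

With a NaN in the second operand the two disagree: `select_max(1.0f, NAN)` yields NaN, while `maxnum_model(1.0f, NAN)` yields `1.0f`. Substituting one for the other therefore changes results whenever NaNs are present.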
From my personal point of view, I think this may be an issue in Clang.
With GCC and the -O3 flag, the generated code uses the fcmgt, and, and bsl instructions.
Here is a small program (modified from your example) for illustration: https://godbolt.org/z/sfrKbx1e8
One more thing: kindly leave a link to the discussion on the Clang forum if possible.
I am closing this issue since SSE2NEON recently added warning alerts for potential compiler misoptimizations. Unless we can find a better way to overcome these misoptimizations, no further action will be taken.
This might be a bug in clang, but I figured I'd report it here first.
I have a technique I use to clamp NaN values to zero.
It's pretty simple: you exploit the fact that nan > 0.0f == false.
The MAX is done first on purpose. The SSE2 code is this:
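As a rough illustration of the trick (a scalar sketch, not the code from the report), the following models the SSE2 selection rule in plain C. The helper names and the upper bound of 1.0f are assumptions made for the example; the only thing taken from the description above is that the MAX against zero happens first.

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Scalar models of the SSE2 max/min selection rule: the second operand
   is returned whenever the comparison is false, which includes every
   comparison involving NaN. */
float sse_max(float a, float b) { return (a > b) ? a : b; }
float sse_min(float a, float b) { return (a < b) ? a : b; }

/* Hypothetical helper: clamp x to [0, 1] (assumed range), mapping NaN to 0.
   MAX runs first: NaN > 0.0f is false, so a NaN input becomes 0.0f. */
float clamp01_nan_to_zero(float x)
{
    float lo = sse_max(x, 0.0f);
    return sse_min(lo, 1.0f);
}

/* Same clamp with MIN first, for contrast: NaN < 1.0f is false,
   so a NaN input becomes 1.0f instead of 0.0f. */
float clamp01_min_first(float x)
{
    float hi = sse_min(x, 1.0f);
    return sse_max(hi, 0.0f);
}
```

In this model `clamp01_nan_to_zero(NAN)` gives `0.0f`, while `clamp01_min_first(NAN)` gives `1.0f`, which is why the order of the two operations matters.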
I'm having issues with clang's optimizer messing up this behaviour, with NaNs still propagating.
The NEON min/max instructions propagate NaNs and the SSE2 ones don't (ish), so I've been defining SSE2NEON_PRECISE_MINMAX 1.
With that defined, the _mm_max_ps intrinsic becomes
vbslq_f32(vcgtq_f32(a, b), a, b);
This looks perfectly correct to me, but clang is optimizing it to the fmaxnm instruction. The fmaxnm instruction only deals with quiet NaNs; signalling NaNs still propagate. :(
https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/FMAXNM--vector---Floating-point-Maximum-Number--vector--
Here is a small program illustrating this happening
https://godbolt.org/z/eE1G3Gcov
I'm currently working around this by using inline assembly.