All SIMD types may now be used simultaneously, instead of selecting one at compile time. For example, you may define all of LIBDIVIDE_SSE2, LIBDIVIDE_AVX2, and LIBDIVIDE_AVX512 and use them together in the same translation unit.
ARM NEON types are now supported. New functions take uint32x4_t, int32x4_t, uint64x2_t, and int64x2_t. Note: while libdivide is tested on both ARM32 and AArch64, NEON intrinsics have only been tested on AArch64.
Breaking: To support multiple vector types, vector functions have been renamed according to their width (#52). Instead of libdivide_u32_do_vector, now use libdivide_u32_do_vec128 for SSE2 or NEON, libdivide_u32_do_vec256 for AVX2, and libdivide_u32_do_vec512 for AVX512.
On non-x86 CPUs, generating 64-bit dividers is now faster than before. Previously libdivide used __uint128_t when available; however, libdivide's fallback code proved to be several times faster, so the __uint128_t path has been removed. x86 and x86-64 CPUs are unaffected.
Certain code sourced from StackOverflow, which had an ambiguous license, has been reimplemented. All code in libdivide is now covered by the zlib or Boost license (at your option).
libdivide.h no longer requires C++11 or later. The minimum language standards are C99 or C++98.
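
For the breaking rename of the vector functions, existing callers may find a small compatibility shim useful while migrating. The fragment below is only a sketch and is not part of libdivide. It assumes the old code was built for SSE2 (128-bit vectors), and it assumes that the s32, u64, and s64 functions follow the same naming pattern as the u32 function named in these notes.

```c
/* Hypothetical migration shim (not shipped with libdivide).
 * Place it after #include "libdivide.h" in code that still calls the
 * pre-rename names and was built for SSE2, i.e. 128-bit vectors.
 * For an AVX2 or AVX512 build, point the macros at the _vec256 or
 * _vec512 functions instead.
 */
#define libdivide_u32_do_vector libdivide_u32_do_vec128

/* Assumption: the remaining integer types were renamed the same way. */
#define libdivide_s32_do_vector libdivide_s32_do_vec128
#define libdivide_u64_do_vector libdivide_u64_do_vec128
#define libdivide_s64_do_vector libdivide_s64_do_vec128
```

Because these are plain object-like macros, they only rename identifiers at preprocessing time; updating call sites to the width-specific names directly is the cleaner long-term fix.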