
v4.0.0

ridiculousfish released this 09 Mar 07:20
  • All SIMD types may now be used simultaneously, instead of selecting one at compile time. For example, you may define LIBDIVIDE_SSE2, LIBDIVIDE_AVX2, and LIBDIVIDE_AVX512 together and use all three in the same program.
  • ARM NEON is now supported: new functions accept uint32x4_t, int32x4_t, uint64x2_t, and int64x2_t. Note: although libdivide is tested on both ARM32 and AArch64, the NEON intrinsics have only been tested on AArch64.
  • Breaking: to support multiple vector types, the vector functions have been renamed according to their width (#52). Instead of libdivide_u32_do_vector, use libdivide_u32_do_vec128 for SSE2 or NEON, libdivide_u32_do_vec256 for AVX2, and libdivide_u32_do_vec512 for AVX512.
  • On non-x86 CPUs, generating 64-bit dividers is now faster. Previously libdivide used __uint128_t when available; however, libdivide's fallback code was shown to be several times faster, so the __uint128_t path has been removed. x86 and x86-64 CPUs are unaffected.
  • Some code originally taken from Stack Overflow, which had an ambiguous license, has been reimplemented. All code in libdivide is now covered by the zlib or Boost license (at your option).
  • libdivide.h no longer requires C++11 or later. The minimum language standards are C99 or C++98.
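Generating a 64-bit divider requires dividing a 128-bit value by a 64-bit divisor, which is why the __uint128_t change matters. As an illustration only, here is a minimal sketch of how that division can be done portably using nothing wider than 64-bit integers, following Knuth's Algorithm D with 32-bit "digits" as presented in Hacker's Delight (divlu). This is not libdivide's actual code: the names clz64 and div128_64 are hypothetical, and the sketch assumes the caller guarantees u1 < v.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count leading zero bits of a nonzero 64-bit value
   without compiler builtins. */
static int clz64(uint64_t x) {
    int n = 0;
    while ((x & ((uint64_t)1 << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical sketch (not libdivide's implementation): divide the 128-bit
   value u1 * 2^64 + u0 by v using only 64-bit arithmetic.
   Precondition: u1 < v, so v != 0 and the quotient fits in 64 bits.
   Returns the quotient; stores the remainder in *r if r is non-NULL. */
static uint64_t div128_64(uint64_t u1, uint64_t u0, uint64_t v, uint64_t *r) {
    const uint64_t b = (uint64_t)1 << 32; /* base of the 32-bit digits */
    uint64_t un1, un0, vn1, vn0, q1, q0, un32, un21, un10, rhat;
    int s = clz64(v);

    /* Normalize so the divisor's top bit is set; this keeps each estimated
       quotient digit at most 2 above the true digit. */
    v <<= s;
    vn1 = v >> 32;
    vn0 = v & 0xFFFFFFFF;
    un32 = (s == 0) ? u1 : (u1 << s) | (u0 >> (64 - s)); /* avoid a shift by 64 */
    un10 = u0 << s;
    un1 = un10 >> 32;
    un0 = un10 & 0xFFFFFFFF;

    /* First quotient digit: estimate from the top digits, then correct down. */
    q1 = un32 / vn1;
    rhat = un32 - q1 * vn1;
    while (q1 >= b || q1 * vn0 > b * rhat + un1) {
        q1--;
        rhat += vn1;
        if (rhat >= b) break; /* estimate is now exact; b * rhat would overflow */
    }

    /* Multiply and subtract. The true result is < v, so wraparound
       modulo 2^64 still yields the exact partial remainder. */
    un21 = un32 * b + un1 - q1 * v;

    /* Second quotient digit, same procedure. */
    q0 = un21 / vn1;
    rhat = un21 - q0 * vn1;
    while (q0 >= b || q0 * vn0 > b * rhat + un0) {
        q0--;
        rhat += vn1;
        if (rhat >= b) break;
    }

    /* Undo the normalization shift to recover the remainder. */
    if (r != NULL) *r = (un21 * b + un0 - q0 * v) >> s;
    return q1 * b + q0;
}
```

For example, div128_64(1, 0, 3, &r) computes 2^64 / 3, returning 0x5555555555555555 with remainder 1. Because each estimated digit is off by at most two after normalization, each correction loop runs at most twice.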