Do you have interest in supporting other SIMD types? I'm particularly interested in AVX, for modern x86 processors, and NEON, for ARM processors.
I would be willing to implement this, if it's something you want. It would be useful to me, and the amount of code for each one looks small.
Another possibility is to create an implementation using the portable vector extensions in clang and gcc (though not Visual Studio). That way you can write a single implementation that works on every architecture and uses whatever vector instructions are available.
I wrote an implementation with clang/gcc portable vectors. It gives a nice speedup on my ARM Mac.
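For illustration, here is a minimal sketch of what a dot product written with the clang/gcc vector extensions can look like. It is not the implementation mentioned above: the name `vecdot_portable`, the four-lane width, and the signature are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Four doubles per vector. GCC and Clang lower operations on this type
   to whatever SIMD the target offers (SSE2, AVX, NEON, ...), or to
   scalar code when none is available. */
typedef double vec4d __attribute__((vector_size(4 * sizeof(double))));

/* Hypothetical helper: returns the sum of x[i] * y[i] for i in [0, n). */
static double vecdot_portable(const double *x, const double *y, int n)
{
    vec4d acc = {0.0, 0.0, 0.0, 0.0};
    double sum;
    int i = 0;

    assert(n >= 0);

    /* Main loop: four products per iteration, accumulated lane by lane. */
    for (; i + 4 <= n; i += 4) {
        vec4d vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load without assuming alignment */
        memcpy(&vy, y + i, sizeof vy);
        acc += vx * vy;
    }

    /* Reduce the four lanes, then handle the remaining 0-3 elements. */
    sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because the lanes are summed in a different order than a plain scalar loop would use, the result can differ from the scalar one in the last bits for general inputs.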
After benchmarking on a couple of computers, I concluded that the only routine that should be explicitly vectorized is vecdot(). All the others are simple enough that the compiler can vectorize them automatically. For those routines, the handwritten versions never help performance, and sometimes hurt. On modern x86 processors, the compiler can generate AVX code that is faster than the handwritten SSE code.
To see this, try compiling with -DUSE_SSE -mavx and measure the speed. Then replace all the SSE routines except the dot product ones with the ANSI versions and measure again. It gets slightly faster.
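As an example of the kind of routine that can be left to the compiler, here is a sketch of an element-wise update written as ordinary C. The name `vecadd_plain` and its signature are hypothetical, chosen only for this example. Each iteration is independent of the others, so with optimization enabled (for example -O3, plus -mavx on x86) gcc and clang can usually vectorize the loop without any intrinsics.

```c
#include <assert.h>

/* Hypothetical helper: y[i] += c * x[i] for i in [0, n).
   restrict tells the compiler that x and y do not overlap, which
   removes the need for a runtime aliasing check before vectorizing. */
static void vecadd_plain(double *restrict y, const double *restrict x,
                         double c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        y[i] += c * x[i];
}
```

A dot product is different because it is a floating-point reduction: the compiler may not reorder the additions unless it is allowed to relax floating-point semantics (for example with -ffast-math), which is one plausible reason that routine still benefits from being vectorized by hand.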
I can create a PR with these changes, if you want.