Other SIMD types #39

peastman · 2023-11-10T00:03:24Z

Would you be interested in supporting other SIMD instruction sets? I'm particularly interested in AVX, for modern x86 processors, and NEON, for ARM processors.

I would be willing to implement this if it's something you want. It would be useful to me, and each one looks like only a small amount of code.

Another possibility is to write an implementation using the portable vector extensions in Clang and GCC (not supported by Visual Studio, though). You can write a single implementation that works on all architectures and automatically uses whatever vector instructions are available.
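As a rough illustration of that idea, here is a minimal sketch of a dot product written with the GCC/Clang vector extension, assuming double-precision data. The names v4d and vecdot_portable are hypothetical and not taken from any existing code. The vector_size attribute declares a 4-wide vector type, and the compiler lowers element-wise + and * on it to whatever SIMD instructions the target supports (for example AVX when built with -mavx, or NEON on AArch64, where the 256-bit type is split into two 128-bit operations), falling back to scalar code otherwise.

```c
#include <string.h>

/* Hypothetical 4 x double vector type (GCC/Clang extension, not MSVC). */
typedef double v4d __attribute__((vector_size(4 * sizeof(double))));

/* Dot product of x and y (length n) using portable vector arithmetic. */
static double vecdot_portable(const double *x, const double *y, int n)
{
    v4d acc = {0.0, 0.0, 0.0, 0.0};  /* four independent partial sums */
    int i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        v4d a, b;
        memcpy(&a, x + i, sizeof a);  /* memcpy avoids alignment assumptions */
        memcpy(&b, y + i, sizeof b);
        acc += a * b;                 /* element-wise multiply and add */
    }

    /* Combine the partial sums, then handle the leftover elements. */
    double s = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

For example, with x = {1, 2, ..., 7} and y all ones, vecdot_portable(x, y, 7) returns 28: the first four products go through the vector loop and the last three through the scalar tail.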

peastman · 2023-11-13T21:47:16Z

I wrote an implementation using the Clang/GCC portable vector extensions. It gives a nice speedup on my ARM Mac.

After benchmarking on a couple of computers, I concluded that the only routine that should be explicitly vectorized is vecdot(). All the others are simple enough that the compiler can vectorize them automatically. The handwritten versions of those routines never help performance, and sometimes hurt it. On modern x86 processors, the compiler can generate AVX code that is faster than the handwritten SSE code.
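To make the distinction concrete, here is a sketch of plain-loop ("ANSI"-style) routines, assuming double-precision vectors. The names vecadd_plain, vecscale_plain and vecdot_plain are hypothetical and only loosely modeled on the kind of routines discussed here. Element-wise loops have no dependence between iterations, so an optimizing compiler (for example at -O3, with -mavx to allow AVX) can vectorize them on its own. The dot product is different: it is a floating-point reduction, and without -ffast-math (or -fassociative-math) the compiler must preserve the sequential order of the additions, so it will not split the sum across SIMD lanes. That is one plausible reason vecdot() is the routine that still benefits from explicit vectorization.

```c
/* y[i] += c * x[i]. Iterations are independent; restrict tells the
   compiler x and y do not overlap, so it can vectorize freely. */
static void vecadd_plain(double *restrict y, const double *restrict x,
                         double c, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += c * x[i];
}

/* y[i] *= c. Also trivially vectorizable by the compiler. */
static void vecscale_plain(double *y, double c, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] *= c;
}

/* Sum of x[i] * y[i]. Each addition depends on the previous one, and
   reordering floating-point additions can change the result, so the
   compiler keeps this loop sequential unless fast-math is enabled. */
static double vecdot_plain(const double *x, const double *y, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

With GCC, adding -fopt-info-vec to the compile line reports which loops were vectorized, which is a quick way to check this on a given compiler and target.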

Try compiling with -DUSE_SSE -mavx and measure the speed. Then replace all the SSE routines except the dot product ones with the ANSI versions. It gets slightly faster.
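For measuring the speed, a minimal timing sketch in standard C could look like the following. It uses only clock() from time.h; the names dot_ref, dot_fn and time_dot are hypothetical, and dot_ref is just a scalar stand-in for whichever dot-product routine you actually want to time. Build the same driver against each variant of the routines (for example, once with -DUSE_SSE -mavx and once with the ANSI versions) and compare the per-call times.

```c
#include <time.h>

/* Scalar reference dot product; a stand-in for the routine under test. */
static double dot_ref(const double *x, const double *y, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

typedef double (*dot_fn)(const double *, const double *, int);

/* Calls f `reps` times on the same data and returns the average CPU time
   per call in seconds. The accumulated result is stored in *sink so the
   compiler cannot discard the calls as dead code. */
static double time_dot(dot_fn f, const double *x, const double *y,
                       int n, int reps, double *sink)
{
    double acc = 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        acc += f(x, y, n);
    clock_t stop = clock();
    *sink = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

clock() has coarse resolution, so use vectors of realistic size and enough repetitions that each measurement takes at least a fraction of a second.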

I can create a PR with these changes, if you want.

peastman mentioned this issue Nov 28, 2023

Enable SSE code for LBFGS openmm/openmm#4327

Merged
