
Other SIMD types #39

Open
peastman opened this issue Nov 10, 2023 · 1 comment

@peastman

Do you have interest in supporting other SIMD types? I'm particularly interested in AVX, for modern x86 processors, and NEON, for ARM processors.

I would be willing to implement this, if it's something you want. It would be useful to me, and the amount of code for each one looks small.

Another possibility is to create an implementation using the portable vector extensions in clang and gcc (not Visual Studio, though). You can write a single implementation that works on all architectures and automatically uses whatever vector instructions are available.
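
As a rough illustration of the portable vector extension (a sketch, not code from this project): GCC and Clang let you declare a fixed-width vector type with `__attribute__((vector_size(N)))` and then use ordinary arithmetic operators on it, and the compiler chooses SSE, AVX, NEON, or scalar instructions for the target. The type name `v2d` and the function `axpy2` are hypothetical.

```c
/* Two doubles in 16 bytes: one SSE2 register on x86-64, one NEON register
 * on AArch64. This is a GCC/Clang extension; MSVC does not accept it. */
typedef double v2d __attribute__((vector_size(16)));

/* Element-wise a*x + y. Ordinary operators act lane by lane, and the
 * compiler picks the instructions for whatever target it is building for. */
static v2d axpy2(double a, v2d x, v2d y)
{
    v2d av = {a, a};  /* broadcast the scalar into both lanes */
    return av * x + y;
}
```

Lanes are read with ordinary subscripts (`r[0]`, `r[1]`). Changing the typedef to `vector_size(32)` would give four-wide vectors; GCC and Clang then emit AVX instructions when built with `-mavx` and split each operation into narrower instructions otherwise.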

@peastman
Author

I wrote an implementation with clang/gcc portable vectors. It gives a nice speedup on my ARM Mac.

After benchmarking on a couple of computers, I concluded that vecdot() is the only routine that should be explicitly vectorized. All the others are simple enough that the compiler vectorizes them automatically; for those, the handwritten routines never help performance and sometimes hurt it. On modern x86 processors, the compiler can generate AVX code that is faster than the handwritten SSE code.
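
A minimal sketch of that split, assuming double-precision arrays: the dot product is vectorized by hand with a portable vector type, while an element-wise update is left as a plain loop for the auto-vectorizer. One plausible reason the dot product is the exception (our reading, not stated above): without `-ffast-math` or `-fassociative-math`, GCC and Clang will not reorder a floating-point sum, so they usually leave a reduction loop scalar, whereas element-wise loops vectorize freely. The names `vecdot_portable` and `vecadd_plain` and their signatures are hypothetical, not the project's API.

```c
#include <stddef.h>
#include <string.h>

/* Four doubles (32 bytes). GCC/Clang extension. */
typedef double v4d __attribute__((vector_size(32)));

/* Dot product with four independent partial sums.
 * memcpy performs unaligned loads, so x and y need no special alignment. */
static double vecdot_portable(const double *x, const double *y, size_t n)
{
    v4d acc = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4d vx, vy;
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        acc += vx * vy;
    }
    /* Horizontal sum of the four partial sums. */
    double s = acc[0] + acc[1] + acc[2] + acc[3];
    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* y += a*x as a plain loop: no reduction, so the compiler can widen it
 * (e.g. to AVX with -mavx) at typical optimization levels such as -O3. */
static void vecadd_plain(double *restrict y, const double *restrict x,
                         double a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Because the hand-vectorized version sums in a different order, its result can differ from a strictly sequential loop in the last few bits.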

Try compiling with -DUSE_SSE -mavx and measuring the speed. Then replace all the SSE routines except the dot product with the ANSI versions and measure again. It gets slightly faster.
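
A sketch of the mixed configuration described above, assuming a C code base where a `USE_SSE` macro selects the SSE routines at compile time: only the dot product keeps hand-written SSE2 intrinsics, and every other routine is a plain loop that the compiler can widen to AVX under `-mavx`. The helper `vecscale` and both signatures are hypothetical; the real routines in this project may differ.

```c
#include <stddef.h>
#if defined(USE_SSE) && defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Dot product: the one routine that keeps hand-written SSE2 when USE_SSE is set. */
static double vecdot(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    size_t i = 0;
#if defined(USE_SSE) && defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(x + i), _mm_loadu_pd(y + i)));
    double tmp[2];
    _mm_storeu_pd(tmp, acc);
    s = tmp[0] + tmp[1];
#endif
    /* Scalar path: the whole array without USE_SSE, otherwise just the odd tail. */
    for (; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Plain ("ANSI") loop used in every configuration; left to the auto-vectorizer. */
static void vecscale(double *y, double c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] *= c;
}
```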

I can create a PR with these changes, if you want.
