Question about Thread pool and GEMV #221
I've done some AVX2 GEMV kernels in this PR: #209. They show good performance on next-token inference.
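For context, the core of an AVX2 GEMV is a fused multiply-accumulate over each matrix row. The sketch below is only a minimal illustration of that pattern (it assumes fp32 data, a row-major matrix, and FMA support), not the actual kernel from #209:

```cpp
#include <immintrin.h>
#include <cstddef>

// Illustrative AVX2 GEMV sketch: y = A * x, with A stored row-major (m x n).
// Not the kernel from PR #209; layout, blocking, and FMA use are assumptions.
void gemv_f32_avx2(const float* A, const float* x, float* y,
                   std::size_t m, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t j = 0;
    for (; j + 8 <= n; j += 8) {
      __m256 a = _mm256_loadu_ps(A + i * n + j);
      __m256 v = _mm256_loadu_ps(x + j);
      acc = _mm256_fmadd_ps(a, v, acc);  // acc += a * v
    }
    // Horizontal sum of the 8 accumulator lanes.
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float sum = _mm_cvtss_f32(s);
    for (; j < n; ++j) sum += A[i * n + j] * x[j];  // scalar tail
    y[i] = sum;
  }
}
```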
We've provided a public threading interface class, which can be implemented with std::thread, OpenMP, or other thread pools such as ONNXRuntime's. We strongly recommend using our thread pool on Intel client CPUs, which have a hybrid architecture; it is much faster there than other thread pools.
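As a rough picture of what such a pluggable interface looks like (the class and method names below are hypothetical, not the actual BesTLA API): the library codes against a single parallel_for-style entry point, and any backend — std::thread, OpenMP, or an external pool like ONNXRuntime's — can implement it.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch of a pluggable threading interface; the real BesTLA
// class has different names and signatures.
class IThreading {
 public:
  virtual ~IThreading() = default;
  // Run fn(begin, end) on disjoint index ranges across workers.
  virtual void parallel_for(std::size_t total,
                            const std::function<void(std::size_t, std::size_t)>& fn) = 0;
};

// One possible backend: plain std::thread. An OpenMP backend would instead
// wrap a "#pragma omp parallel for" loop behind the same interface.
class StdThreading : public IThreading {
 public:
  explicit StdThreading(std::size_t nthreads) : nthreads_(nthreads) {}
  void parallel_for(std::size_t total,
                    const std::function<void(std::size_t, std::size_t)>& fn) override {
    std::size_t chunk = (total + nthreads_ - 1) / nthreads_;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads_; ++t) {
      std::size_t begin = t * chunk;
      std::size_t end = std::min(total, begin + chunk);
      if (begin >= end) break;
      workers.emplace_back([=] { fn(begin, end); });
    }
    for (auto& w : workers) w.join();
  }

 private:
  std::size_t nthreads_;
};
```

A backend that is aware of hybrid P-core/E-core topologies can partition work unevenly behind the same interface, which is the kind of scheduling a generic pool typically cannot do.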
We've provided all kinds of GEMMs: sgemm, igemm, hgemm, and bf16gemm, as well as their Weight-Only-Quantization versions: int3, int4, fp4, and other data types. You can use it as a BLAS library. You can refer to ONNXRuntime's code to see how to use BesTLA on its own in your project: https://github.com/microsoft/onnxruntime/blob/main/cmake/external/neural_speed.cmake
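To illustrate what a weight-only-quantization kernel does conceptually (this is not the BesTLA API): the low-bit weights stay packed in memory with a per-group scale and are dequantized on the fly inside the dot product, while the activations remain fp32. A minimal int4 sketch, with the packing layout and zero-point chosen here only for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative int4 weight-only-quantization dot product (not the BesTLA API).
// Two 4-bit weights are packed per byte; each group of `group_size` weights
// shares one fp32 scale. Activations stay in fp32.
float dot_woq_int4(const uint8_t* packed_w, const float* scales,
                   const float* x, std::size_t n, std::size_t group_size) {
  float sum = 0.0f;
  for (std::size_t j = 0; j < n; ++j) {
    uint8_t byte = packed_w[j / 2];
    int w4 = (j % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    float w = static_cast<float>(w4 - 8) * scales[j / group_size];  // zero-point 8 assumed
    sum += w * x[j];
  }
  return sum;
}
```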
Thank you for your prompt reply!
The igemv code is here: https://github.com/intel/neural-speed/pull/209/files#diff-3f2e40e478bc4fdc338616cf4c43969cdd035cc17df448ba42bc2277f628a52dR1329. sgemv code is not planned yet; it is slower than igemv.
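The reason an integer GEMV wins is throughput per instruction: with AVX2, a u8×s8 multiply-accumulate touches 32 weights per instruction pair, versus 8 fp32 lanes in the sgemv sketch above. A minimal illustration of that inner loop (not the PR #209 kernel; u8 activations and s8 weights are assumptions here):

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Illustrative AVX2 int8 dot product in the igemv style (not the PR #209 code).
// Processes 32 u8*s8 products per iteration, which is why an integer GEMV
// outpaces an fp32 sgemv on the same hardware.
int32_t dot_u8s8_avx2(const uint8_t* x, const int8_t* w, std::size_t n) {
  __m256i acc = _mm256_setzero_si256();
  const __m256i ones = _mm256_set1_epi16(1);
  std::size_t j = 0;
  for (; j + 32 <= n; j += 32) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(x + j));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(w + j));
    // u8*s8 -> pairwise sums in 16-bit lanes, then widen to 32-bit lanes.
    __m256i p16 = _mm256_maddubs_epi16(va, vb);
    __m256i p32 = _mm256_madd_epi16(p16, ones);
    acc = _mm256_add_epi32(acc, p32);
  }
  // Horizontal reduction of the eight 32-bit lanes.
  __m128i lo = _mm256_castsi256_si128(acc);
  __m128i hi = _mm256_extracti128_si256(acc, 1);
  __m128i s  = _mm_add_epi32(lo, hi);
  s = _mm_hadd_epi32(s, s);
  s = _mm_hadd_epi32(s, s);
  int32_t sum = _mm_cvtsi128_si32(s);
  for (; j < n; ++j) sum += static_cast<int32_t>(x[j]) * w[j];  // scalar tail
  return sum;
}
```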
Thank you again! I am going to study this code.
I've been working on efficient GEMV multiplication on CPUs lately, and I've found that I can only get a limited amount of improvement after adopting SIMD. Referring to your BesTLA library might inspire me, so I'm really looking forward to BesTLA's GEMV kernels.
Also, I have a question about BesTLA's thread pool: is it based on a custom thread pool, or on OpenMP?
By the way, I'm looking forward to seeing BesTLA become more widely used, or callable as a standalone library, just like OpenBLAS.