This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Question about Thread pool and GEMV #221

Closed · chenhongyu2048 opened this issue Apr 16, 2024 · 4 comments
@chenhongyu2048

I've been working on efficient GEMV multiplication on CPUs lately, and I've found that SIMD alone only gets me a limited improvement. Your BesTLA library might offer some inspiration, so I'm really looking forward to BesTLA's GEMV kernels.
Also, I have a question about BesTLA's thread pool: is it a custom thread pool, or is it based on OpenMP?

By the way, I'm looking forward to seeing BesTLA become more widely used, or callable as a standalone library, just like OpenBLAS.
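
For context, the kind of SIMD-only GEMV described above might look like the following sketch (the function name and data layout are illustrative, not BesTLA code). GEMV is typically memory-bandwidth-bound, which is one reason vectorization alone quickly hits a ceiling:

```cpp
// Illustrative only (not BesTLA code): a plain row-major fp32 sgemv,
// y = A * x, vectorized with AVX2/FMA intrinsics.
#include <immintrin.h>
#include <cstddef>

void sgemv_avx2(const float* A, const float* x, float* y,
                size_t m, size_t n) {
  for (size_t i = 0; i < m; ++i) {
    const float* row = A + i * n;
    __m256 acc = _mm256_setzero_ps();
    size_t j = 0;
    for (; j + 8 <= n; j += 8) {
      // 8 fp32 multiply-adds per iteration
      acc = _mm256_fmadd_ps(_mm256_loadu_ps(row + j),
                            _mm256_loadu_ps(x + j), acc);
    }
    // horizontal sum of the 8 accumulator lanes
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float sum = _mm_cvtss_f32(s);
    for (; j < n; ++j) sum += row[j] * x[j];  // scalar tail
    y[i] = sum;
  }
}
```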

@luoyu-intel
Contributor

> Your BesTLA library might offer some inspiration, so I'm really looking forward to BesTLA's GEMV kernels.

I've implemented some AVX2 GEMV kernels in this PR: #209. They show good performance on next-token inference.

> Also, I have a question about BesTLA's thread pool: is it a custom thread pool, or is it based on OpenMP?

We provide a public threading interface class, which can be implemented with std::thread, OpenMP, or other thread pools such as ONNXRuntime's. We strongly recommend using our thread pool on Intel client CPUs, which are hybrid (P-cores plus E-cores); it is much faster there than other thread pools.
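
A minimal sketch of what such a pluggable threading interface can look like, assuming hypothetical names (IThreading, OMPThreading, StdThreading) rather than BesTLA's actual classes:

```cpp
// Hypothetical sketch of a pluggable threading interface: kernels call
// parallel_for() through the abstract base, and the backend (OpenMP,
// std::thread, or a host runtime's pool) is swapped in behind it.
#include <functional>
#include <thread>
#include <vector>
#include <omp.h>  // for the OpenMP backend; compile with -fopenmp

class IThreading {
 public:
  virtual ~IThreading() = default;
  // Run fn(tid) on nthreads threads; fn partitions the work by tid.
  virtual void parallel_for(int nthreads,
                            const std::function<void(int)>& fn) = 0;
};

// Backend 1: OpenMP.
class OMPThreading : public IThreading {
 public:
  void parallel_for(int nthreads,
                    const std::function<void(int)>& fn) override {
#pragma omp parallel num_threads(nthreads)
    fn(omp_get_thread_num());
  }
};

// Backend 2: plain std::thread (a real pool would reuse threads).
class StdThreading : public IThreading {
 public:
  void parallel_for(int nthreads,
                    const std::function<void(int)>& fn) override {
    std::vector<std::thread> workers;
    for (int tid = 1; tid < nthreads; ++tid) workers.emplace_back(fn, tid);
    fn(0);  // the calling thread takes tid 0
    for (auto& t : workers) t.join();
  }
};
```

On hybrid client CPUs, a custom backend behind such an interface can additionally pin threads and partition work unevenly between P-cores and E-cores, which is presumably what makes a tuned pool faster there than a generic one.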

> By the way, I'm looking forward to seeing BesTLA become more widely used, or callable as a standalone library, just like OpenBLAS.

We provide all kinds of GEMMs: sgemm, igemm, hgemm, and bf16 gemm, as well as their weight-only-quantization versions: int3, int4, fp4, and other data types. You can use it as a BLAS library. You can refer to ONNXRuntime's code for how to use BesTLA on its own in your project: https://github.com/microsoft/onnxruntime/blob/main/cmake/external/neural_speed.cmake
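
As an aside on what "weight-only quantization" means here: weights are stored as low-bit integers plus per-group scales and dequantized on the fly inside the kernel, while activations stay in higher precision. A scalar int4 sketch of the idea follows; the packing, group size, and names are hypothetical, not BesTLA's actual format:

```cpp
// Hypothetical scalar sketch of int4 weight-only quantization: two 4-bit
// weights per byte, signed range [-8, 7], with one fp32 scale shared by
// each group of `group_size` consecutive weights.
#include <cstdint>
#include <cstddef>

float dequant_w4(const uint8_t* packed, const float* scales,
                 size_t idx, size_t group_size) {
  uint8_t byte = packed[idx / 2];
  int q = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
  return (q - 8) * scales[idx / group_size];  // unbias nibble, apply scale
}

// GEMV with on-the-fly dequantization: y = W * x, W is m x n, packed row-major.
void gemv_w4(const uint8_t* packed, const float* scales, const float* x,
             float* y, size_t m, size_t n, size_t group_size) {
  for (size_t i = 0; i < m; ++i) {
    float sum = 0.0f;
    for (size_t j = 0; j < n; ++j)
      sum += dequant_w4(packed, scales, i * n + j, group_size) * x[j];
    y[i] = sum;
  }
}
```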

@chenhongyu2048
Author

Thank you for your prompt reply!
In PR #209, I see that the commit adding the AVX2 sgemv and igemv kernels has been merged, but I can't find the exact location of those functions; I only find the 3/4-bit versions in bestla/bestla/kernel_avx2.h. Can you tell me where sgemv is implemented?

@luoyu-intel
Contributor

The igemv code is here: https://github.com/intel/neural-speed/pull/209/files#diff-3f2e40e478bc4fdc338616cf4c43969cdd035cc17df448ba42bc2277f628a52dR1329

sgemv is not planned yet; it's slower than igemv.
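
For readers following the link: the PR is the source of truth, but the reason igemv can outrun sgemv is that 32 int8 values fit in one 256-bit register versus 8 floats. An illustrative AVX2 u8×s8 dot product (not the PR's code) looks roughly like this:

```cpp
// Illustrative AVX2 int8 dot product (not the PR's code): u8 activations
// times s8 weights, 32 elements per 256-bit register per iteration.
// _mm256_maddubs_epi16 saturates in s16; real kernels guard against that.
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

int32_t dot_u8s8_avx2(const uint8_t* x, const int8_t* w, size_t n) {
  __m256i acc = _mm256_setzero_si256();
  const __m256i ones = _mm256_set1_epi16(1);
  size_t j = 0;
  for (; j + 32 <= n; j += 32) {
    __m256i vx = _mm256_loadu_si256((const __m256i*)(x + j));
    __m256i vw = _mm256_loadu_si256((const __m256i*)(w + j));
    __m256i p16 = _mm256_maddubs_epi16(vx, vw);  // u8*s8 -> pair-summed s16
    __m256i p32 = _mm256_madd_epi16(p16, ones);  // widen s16 pairs -> s32
    acc = _mm256_add_epi32(acc, p32);
  }
  // horizontal sum of the 8 s32 lanes
  __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                            _mm256_extracti128_si256(acc, 1));
  s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
  s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
  int32_t sum = _mm_cvtsi128_si32(s);
  for (; j < n; ++j) sum += int32_t(x[j]) * w[j];  // scalar tail
  return sum;
}
```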

@chenhongyu2048
Author

Thank you again! I'm going to study this code.
