[BesTLA] Refactor quantization-related kernels (#209)
* add intrinsic for s4_clip (see the quantization sketch after this list)
* remove unstable time data
* add q4fp32 kernel
* add avx2 version of u8s8
* add hybrid support
* use hybrid scheduler for ggml and fp32 kernels
* fix typo
* fix err
* debug overflow of non-vnni instruction
* add ref for gemv (see the GEMV reference sketch after this list)
* pass UT
* add S8S8S32 and S8S8Fp32 for AVX_VNNI
* add benchmark; S8S8 is 50% of U8S8
* add benchmark and model test
* model check
* disable dynamic PE ratio
* add s8s8 code and benchmark case
* add s3 weight ref
* add avx2 s3 gemv
* add s3 benchmark
* use JIT kernel
* compile on gcc
* add blend filter for dynamic PE
* for more stable result
* use optimized threading as default
* speed up int3 gemv
* clang-format
* remove alignas
* use multi-threading for ROPE
* add int2 gemv ref; add avx2 unpack 2bit
* add avx2 gemv for int2
* add blocksize=16 case
* use vec register instead of general register
* protect asym weight
* fix s8s8 avx_vnni code
* add zero point support for MatB
* add manual unroll for sgemv
* n-bit quantization from high bits to low bits; support asym quant of s2
* add gemv UT for all 4bit and 2bit functions
* complete avx2: s4->s8 packrow=1,2,4; s4->fp(f32,bf16) packrow=1,2,4
* add bf16 UTs
* test all AVX2 int4: sym&asym, comp_fp32&comp_int8, packrow=1,2,4
* split s4_fp code
* add AVX2 sgemv
* test all 4bit AVX2 combinations
* support MTILE for int4 gemv
* support MTILE for int2 gemv
* fix perf of comp_fp32
* add ref of gemv_2bit_fp32_fp32; add new 3bit kernel
* complete int2 decompress kernels
* sync 2bit and 4bit unpack functions
* finish all int3 gemv kernels
* add s8s8 3bit gemv UT
* fully test int3 weight: group=32,128; comp=fp32,int8; wrapper: gemm and gemv
* add DecompressKBlockS3Fp avx2
* test int2 and int3 with comp_fp32 and comp_int8
* speed up int3 and int2 with comp_fp32
* fix debug code
* prevent compilation of unsupported templates
* enable asym quantization; test llama2 model for group=32, weight_dtype=int2, int3, int4, alg=asym, compute_dtype=int8
* remove LauncherKBlock
* remove all LauncherKBlock
* sync s8 weight's compression and decompression functions
* remove gemv_s* functions
* add AVX2_VNNI
* add kblock avx2vnni
* test all gemm cases with AVX2_VNNI
* use the correct ISA
* add AVX2_VNNI to gemv dispatcher
* remove code
* benchmark int4 with AVX2_VNNI
* support AVX2_VNNI for model quantization and inference
* add avx512 s4s8
* add s8fp
* pass avx512+int4 UTs
* fix compile errors with GCC
* remove deprecated functions and wrappers
* remove unused UT case
* enable gemv for amx_int8
* remove static_assert
* disable blocksize=32 for amx_int8
* add avx512: s2s8, s2fp
* full test of all avx512 int2
* add avx512 s2 benchmark
* support avx512 int2
* fix compile error
* add 3bit sgemv avx512
* enable all UTs
* fix UT error
* pop vnni flags
* use correct avx2 epi32_epi16
* use omp and std at the same time
* clang-format
* check compiler before enabling gemv
* correct assert condition
* help compiler a little
* set VS2022
* fix condition
* clang-format
* compile with dpcpp
* optimize on 1185g7
* add avx2_vnni for int3 & int2
* clang-format
* fix code errors
* compile with gcc9
* revert rope parallel
* refactor quantization data processing in python
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add mtile for epilogue in gemv
* clang-format
* Revert "revert rope parallel"

This reverts commit 7dc4dd8.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
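For context on the k-block quantization scheme the s4_clip and asym bullets refer to, the following is a minimal scalar sketch of group-wise 4-bit quantization and dequantization. It is an illustration under assumed conventions only, not BesTLA's actual API; the names (`QuantGroup4`, `quantize_group_s4`) are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Minimal scalar sketch of group-wise (k-block) 4-bit quantization.
// Symmetric ("clip"-style): q = clamp(round(w / scale), -8, 7).
// Asymmetric: a per-group zero point shifts the code range.
// All names here are illustrative, not BesTLA's real interfaces.
struct QuantGroup4 {
  std::vector<int8_t> q;  // one int8 code per weight (packing two per byte comes later)
  float scale;
  int8_t zero_point;      // 0 for symmetric quantization
};

QuantGroup4 quantize_group_s4(const float* w, int group_size, bool asym) {
  QuantGroup4 g;
  g.q.resize(group_size);
  const float wmin = *std::min_element(w, w + group_size);
  const float wmax = *std::max_element(w, w + group_size);
  if (asym) {
    // Map [wmin, wmax] onto the 16 signed 4-bit codes [-8, 7].
    g.scale = (wmax - wmin) / 15.f;
    if (g.scale == 0.f) g.scale = 1.f;  // guard against a constant block
    g.zero_point = static_cast<int8_t>(-std::lround(wmin / g.scale) - 8);
  } else {
    // Symmetric "clip" variant: the largest magnitude maps to +/-7.
    const float amax = std::max(std::fabs(wmin), std::fabs(wmax));
    g.scale = amax > 0.f ? amax / 7.f : 1.f;
    g.zero_point = 0;
  }
  for (int i = 0; i < group_size; ++i) {
    const int v = static_cast<int>(std::lround(w[i] / g.scale)) + g.zero_point;
    g.q[i] = static_cast<int8_t>(std::min(7, std::max(-8, v)));
  }
  return g;
}

// Dequantize back to fp32: w ~= (q - zero_point) * scale.
void dequantize_group_s4(const QuantGroup4& g, float* out) {
  for (size_t i = 0; i < g.q.size(); ++i)
    out[i] = static_cast<float>(g.q[i] - g.zero_point) * g.scale;
}
```

The int3 and int2 variants mentioned in the list follow the same block layout with narrower code ranges and denser bit packing.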
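The gemv reference entries decompress the low-bit weights on the fly and accumulate against the fp32 activation row (the comp_fp32 path). A scalar sketch of that reference loop, assuming a plain row-major layout, unpacked int8 codes, and a hypothetical `gemv_q4_ref` name, might look like this:

```cpp
#include <cstdint>

// Scalar reference GEMV for comp_fp32: y[n] = sum_k x[k] * dequant(wq[k][n]).
// Weights carry one scale (and optional zero point) per k-block of `group`
// rows; K is assumed to be a multiple of `group`. The vectorized kernels in
// this commit (AVX2/AVX512/AMX) implement this same loop; names and layout
// here are illustrative only.
void gemv_q4_ref(const float* x,            // activation row, length K
                 const int8_t* wq,          // unpacked 4-bit weight codes, K x N, row-major
                 const float* scales,       // (K/group) x N
                 const int8_t* zero_points, // (K/group) x N, or nullptr for symmetric
                 float* y,                  // output, length N
                 int K, int N, int group) {
  for (int n = 0; n < N; ++n) {
    float acc = 0.f;
    for (int kb = 0; kb < K; kb += group) {
      const float s = scales[(kb / group) * N + n];
      const int zp = zero_points ? zero_points[(kb / group) * N + n] : 0;
      for (int k = kb; k < kb + group; ++k) {
        acc += x[k] * static_cast<float>(wq[k * N + n] - zp) * s;
      }
    }
    y[n] = acc;
  }
}
```

For the comp_int8 path, the activation block would itself be quantized to int8 (u8 or s8 depending on the ISA's VNNI support, matching the U8S8/S8S8 entries above) and accumulated in int32 before being rescaled by the product of the activation and weight scales.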