Activation quantization is a different technique from weight quantization: activations are quantized dynamically during inference, whereas weights are quantized statically after training. I don't think it would help address the major performance bottleneck of LLM inference, so we didn't add it. But we encourage users to copy-paste, fork, and play around with the repo with new ideas, so feel free to try it if you are interested.
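For context, here is a minimal sketch of what dynamic activation quantization (W8A8 with per-token activation scales) looks like in PyTorch. The function names and shapes are illustrative, not part of gpt-fast, and the matmul is emulated in float for portability; a real implementation would dispatch to an int8 GEMM kernel to get the actual speedup.

```python
import torch

def dynamic_quantize_activations(x: torch.Tensor):
    # Per-token scale: one scale per row, computed from the runtime values of x.
    # This is what makes activation quantization "dynamic" -- the scale cannot
    # be baked in ahead of time the way weight scales can.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_int8, scale

def w8a8_linear(x: torch.Tensor, w_int8: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    # w_int8: (out_features, in_features) int8, w_scale: (out_features,) float,
    # both produced by ordinary static weight quantization.
    x_int8, x_scale = dynamic_quantize_activations(x)
    # Emulated in float here; a real kernel would accumulate in int32
    # (e.g. an int8 GEMM such as torch._int_mm on CUDA).
    acc = x_int8.float() @ w_int8.float().t()
    return (acc * x_scale * w_scale).to(x.dtype)

# Usage: quantize a weight statically, then run with dynamic activation scales.
w = torch.randn(256, 128)
w_scale = w.abs().amax(dim=-1) / 127.0
w_int8 = torch.clamp(torch.round(w / w_scale[:, None]), -128, 127).to(torch.int8)
x = torch.randn(4, 128)
y = w8a8_linear(x, w_int8, w_scale)  # approximates x @ w.t()
```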
Many recent papers have addressed the challenges of quantizing activations for LLMs.
Examples:
https://github.com/ziplab/QLLM?tab=readme-ov-file#%F0%9F%9B%A0-install
https://github.com/mit-han-lab/lmquant?tab=readme-ov-file#efficiency-benchmarks
https://github.com/spcl/QuaRot
Is it possible to add activation quantization support to gpt-fast for even more speedup?
Any insight on the limitations and possibilities is appreciated.