Activation quantization support #194

Open
ayyoobimani opened this issue Aug 12, 2024 · 1 comment

@ayyoobimani
Many recent papers have addressed the challenges of quantizing activations for LLMs.

Examples:
https://github.com/ziplab/QLLM?tab=readme-ov-file#%F0%9F%9B%A0-install
https://github.com/mit-han-lab/lmquant?tab=readme-ov-file#efficiency-benchmarks
https://github.com/spcl/QuaRot

Is it possible to add activation quantization support to gpt-fast for even more speedup?

Any insight into the limitations and possibilities would be appreciated.

@yanboliang
Contributor

Activation quantization is a different technique from weight quantization: it is applied dynamically during inference, whereas weight quantization is static and done after training. I don't think it would help address the major performance bottleneck of LLM inference, so we didn't add it. But we encourage users to copy-paste, fork, and experiment with the repo with new ideas, so feel free to try it if you are interested.
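
For illustration, here is a minimal PyTorch sketch of the distinction described above. This is not code from gpt-fast; the function names (`quantize_weight_static`, `quantize_activation_dynamic`, `int8_linear`) are hypothetical. The point is that weight scales can be computed once offline, while activation scales must be recomputed for every input at inference time.

```python
import torch

def quantize_weight_static(w: torch.Tensor):
    # Static per-tensor int8 quantization: the scale is computed once, offline,
    # because the weights do not change at inference time.
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def quantize_activation_dynamic(x: torch.Tensor):
    # Dynamic per-tensor int8 quantization: the scale depends on the runtime
    # activation values, so it must be recomputed on every forward pass.
    scale = x.abs().max() / 127.0
    q = torch.clamp((x / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def int8_linear(x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    # Quantize the activation on the fly, multiply, then rescale back to float.
    # A real kernel would do the matmul in int8 with int32 accumulation; the
    # float matmul here just keeps the sketch runnable everywhere.
    x_q, x_scale = quantize_activation_dynamic(x)
    acc = x_q.float() @ w_q.float().t()
    return acc * (x_scale * w_scale)

# Weights are quantized once; activations are re-quantized on every call.
w = torch.randn(16, 32)
w_q, w_scale = quantize_weight_static(w)
x = torch.randn(4, 32)
y = int8_linear(x, w_q, w_scale)
print(y.shape)  # torch.Size([4, 16])
```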
