[Feature request] Add quantization methods #17

Open
namtranase opened this issue Feb 23, 2024 · 4 comments
Assignees
Labels
Feature New feature or request stat:awaiting response Status - Awaiting response from author

Comments

@namtranase

It would be awesome if the repo supported quantization methods.
Reference: k-quants

@austinvhuang austinvhuang added the Feature New feature or request label Feb 26, 2024
@chenxiaoyu3

+1, also waiting for quantized models.

@austinvhuang
Collaborator

Understood. The -sfp models already use 8-bit weights, but I understand people are interested in more aggressive quantization.

BTW, if the goal is just to decrease the memory footprint, commit 129e66a makes the KV cache preallocation smaller and configurable. I get that the benefits of aggressive quantization go beyond that, though.
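
For a rough sense of why the preallocated sequence length matters, here is a back-of-the-envelope sketch. The function name, the model shape, and the float32 assumption are all hypothetical and are not taken from the repo or from commit 129e66a.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=4):
    """Bytes needed to preallocate keys and values for seq_len positions."""
    # Factor 2: one key vector and one value vector per position, layer, and head.
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value

# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=18, num_kv_heads=1, head_dim=256)

full = kv_cache_bytes(seq_len=8192, **shape)
small = kv_cache_bytes(seq_len=2048, **shape)
print(f"{full / 2**20:.0f} MiB vs {small / 2**20:.0f} MiB")
```

The footprint scales linearly with the preallocated sequence length, so a smaller configurable limit reduces memory without touching the weights.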

I'm working on a list of priorities and a call for contributions, and will post more soon.

@jan-wassenberg
Member

FYI, we do support an experimental 4.5-bit quantization method (NUQ), but weights in that format are not available on Kaggle.
Supporting this will be easier once we are able to ingest other weight formats (#11).
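
To illustrate what a non-uniform format can look like, here is a minimal codebook sketch in plain Python. It is not the repo's NUQ implementation: the function names are hypothetical, the clustering is a simple 1-D k-means, and the group size of 256 with sixteen 8-bit centers is an assumption chosen to show one way a 4.5 bits-per-weight figure can arise.

```python
import random

def codebook_encode(group, num_clusters=16, iters=10):
    """Quantize one group of floats to a small codebook plus per-weight indices."""
    srt = sorted(group)
    # Start centers at evenly spaced quantiles so dense regions get more centers.
    centers = [srt[(2 * i + 1) * len(srt) // (2 * num_clusters)]
               for i in range(num_clusters)]

    def nearest(w):
        return min(range(num_clusters), key=lambda c: abs(w - centers[c]))

    for _ in range(iters):
        buckets = [[] for _ in range(num_clusters)]
        for w in group:
            buckets[nearest(w)].append(w)
        # Move each center to the mean of its weights; leave empty clusters in place.
        centers = [sum(b) / len(b) if b else c for b, c in zip(buckets, centers)]
    return centers, [nearest(w) for w in group]

def codebook_decode(centers, indices):
    return [centers[i] for i in indices]

def bits_per_weight(group_size, num_clusters=16, bits_per_center=8):
    index_bits = (num_clusters - 1).bit_length()  # 4 bits address 16 centers
    # The codebook cost is shared by every weight in the group.
    return index_bits + num_clusters * bits_per_center / group_size

random.seed(0)
group = [random.gauss(0.0, 1.0) for _ in range(256)]
centers, indices = codebook_encode(group)
restored = codebook_decode(centers, indices)
mse = sum((a - b) ** 2 for a, b in zip(group, restored)) / len(group)
print(f"mse={mse:.4f}, bits/weight={bits_per_weight(len(group))}")
```

Under these assumptions, each weight costs 4 index bits plus 16 × 8 / 256 = 0.5 bits of amortized codebook storage.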

@KumarGitesh2024 KumarGitesh2024 self-assigned this Jun 10, 2024
@jan-wassenberg
Member

An update on this: we do have the ability to import PyTorch weights. Work is still ongoing on evaluating the nonuniform 4.5-bit format.

I'm increasingly concerned about uniform integer quantization in the style of k-quants. Recent work such as https://arxiv.org/pdf/2407.03211 points out that human raters detect much more harm than automated metrics do, especially in non-English languages, even for int8. Another paper also reports concerns after human evals, apparently also with int8.
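
For readers unfamiliar with the term, uniform integer quantization maps each block of weights onto an evenly spaced grid defined by one scale. The sketch below is a simplified symmetric int8 variant with hypothetical function names. The actual k-quants layouts are more elaborate, and this is not their implementation.

```python
def quantize_block(block, bits=8):
    """Symmetric uniform quantization: one float scale plus integer codes."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    amax = max(abs(w) for w in block)
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * q for q in codes]

# Made-up weights for illustration only.
block = [0.02, -0.5, 0.13, 1.9, -0.07, 0.0, 0.31, -1.2]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(codes, f"max_err={max_err:.5f}")
```

Because the grid spacing is set by the largest magnitude in the block, small weights carry the same absolute rounding error as large ones, which is one motivation for non-uniform formats.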

@KumarGitesh2024 KumarGitesh2024 added the stat:awaiting response Status - Awaiting response from author label Jul 15, 2024
5 participants