Understood: the -sfp models use 8-bit weights, but I understand people are interested in more aggressive quantization.
BTW, if the goal is just a smaller memory footprint, commit 129e66a makes the KV cache preallocation smaller and configurable. But I understand the benefits of aggressive quantization go beyond that.
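For anyone sizing this: the KV cache footprint is just 2 tensors (K and V) × layers × kv_heads × head_dim × seq_len × bytes per element, so preallocating for the maximum sequence length dominates memory at long contexts. A minimal back-of-envelope sketch (the dimensions below are illustrative only, not read from any real model config):

```cpp
// Rough KV cache size: 2 tensors (K and V) per layer,
// each [seq_len, kv_heads, head_dim], at bytes_per_elem per value.
#include <cstddef>
#include <cstdio>
#include <initializer_list>

size_t KVCacheBytes(size_t layers, size_t kv_heads, size_t head_dim,
                    size_t seq_len, size_t bytes_per_elem) {
  return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem;
}

int main() {
  // Illustrative dimensions only (roughly 7B-class); check the real
  // model config for actual values.
  const size_t layers = 28, kv_heads = 16, head_dim = 256;
  for (size_t seq_len : {1024, 4096, 8192}) {
    std::printf("seq_len=%zu -> %.1f MiB (fp32)\n", seq_len,
                KVCacheBytes(layers, kv_heads, head_dim, seq_len, 4) /
                    (1024.0 * 1024.0));
  }
}
```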
Working on a list of priorities + a call for contributions; will post more soon.
FYI, we do support an experimental 4.5-bit quantization method (NUQ), but those weights are not available on Kaggle.
We can more easily support this once we are able to ingest other weight formats (#11).
An update on this: we now have the ability to import PyTorch weights. Evaluation of the nonuniform 4.5-bit format is still ongoing.
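For context on where a figure like 4.5 bits/weight can come from: a nonuniform (codebook) format stores a 4-bit index per weight plus a small shared table per block. The sketch below is only a toy illustration of that idea, not gemma.cpp's actual NUQ code; with 1024-weight blocks and 16 fp32 centers it works out to (1024·4 + 16·32)/1024 = 4.5 bits/weight. The centers here are crude quantiles; real methods fit them (e.g. via k-means) to minimize error.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct NuqBlock {
  std::array<float, 16> centers;  // shared codebook for the block
  std::vector<uint8_t> idx;       // one 4-bit index per weight (1/byte here)
};

// Toy nonuniform quantizer; assumes a block of ~1024 weights.
NuqBlock QuantizeBlock(const std::vector<float>& w) {
  NuqBlock b;
  std::vector<float> sorted = w;
  std::sort(sorted.begin(), sorted.end());
  for (int c = 0; c < 16; ++c) {  // centers at the 16 quantile midpoints
    b.centers[c] = sorted[(2 * c + 1) * sorted.size() / 32];
  }
  b.idx.reserve(w.size());
  for (float x : w) {             // nearest-center assignment
    int best = 0;
    for (int c = 1; c < 16; ++c) {
      if (std::fabs(x - b.centers[c]) < std::fabs(x - b.centers[best]))
        best = c;
    }
    b.idx.push_back(static_cast<uint8_t>(best));
  }
  return b;
}

float Dequantize(const NuqBlock& b, size_t i) { return b.centers[b.idx[i]]; }
```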
I'm increasingly concerned about uniform integer quantization in the style of k-quants. Recent work such as https://arxiv.org/pdf/2407.03211 finds that human raters detect much more quality degradation than automated metrics suggest, especially in non-English languages, even at int8. Another paper also reports concerns after human evals, apparently also with int8.
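For comparison with the codebook sketch above, here is the general scheme behind uniform integer formats like k-quants: evenly spaced steps with a per-block scale. (llama.cpp's actual k-quants additionally use super-blocks with quantized scales/mins, which this sketch omits.) The concern above is that even this int8 variant's rounding error appears detectable by human raters:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Q8Block {
  float scale;            // per-block scale: max|w| / 127
  std::vector<int8_t> q;  // uniform steps: w ~= scale * q
};

// Symmetric block-wise int8 quantization: all representable values are
// evenly spaced, unlike the nonuniform codebook approach.
Q8Block QuantizeQ8(const std::vector<float>& w) {
  float amax = 0.f;
  for (float x : w) amax = std::max(amax, std::fabs(x));
  Q8Block b;
  b.scale = amax > 0.f ? amax / 127.f : 1.f;
  b.q.reserve(w.size());
  for (float x : w) {
    b.q.push_back(static_cast<int8_t>(
        std::clamp<long>(std::lround(x / b.scale), -127, 127)));
  }
  return b;
}
```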
It would be awesome if the repo supported additional quantization methods.
Reference: k-quants