[quant] supports act_order inputs in Matmulnbits and new quantization algorithm "hqq" #19106

Merged
6 commits merged into main from jicwen/matmulnbits_gptq on Mar 5, 2024

Conversation

@wejoncy (Contributor) commented on Jan 12, 2024

Description

  1. Support quantized GPTQ weights from Hugging Face, such as [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
  2. Support act_order for GPTQ (sketched below)
  3. Support the [HQQ](https://mobiusml.github.io/hqq_blog/) algorithm to quantize MatMul weights, and add a quantization script (usage sketched right after this list)
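
A minimal usage sketch for item 3, assuming the `MatMul4BitsQuantizer` and `HQQWeightOnlyQuantConfig` API in `onnxruntime.quantization.matmul_4bits_quantizer` that this PR extends; exact class and parameter names may differ between releases:

```python
# Hedged sketch: quantize MatMul weights with the new "hqq" algorithm.
# The model path is hypothetical; the API names are assumptions based on
# the matmul_4bits_quantizer module, not a definitive reference.
import onnx
from onnxruntime.quantization import matmul_4bits_quantizer

model = onnx.load("llama2-7b-chat.onnx")  # hypothetical input model

# HQQ config: 4-bit weights with per-group scales over blocks of 128 values.
algo_config = matmul_4bits_quantizer.HQQWeightOnlyQuantConfig(block_size=128, bits=4)

quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(
    model, block_size=128, is_symmetric=False, algo_config=algo_config
)
quant.process()  # rewrites eligible MatMul nodes to MatMulNBits
quant.model.save_model_to_file("llama2-7b-chat-hqq.onnx", use_external_data_format=True)
```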

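For item 2, act_order is why MatMulNBits gains a per-channel group-index input (called `g_idx` in GPTQ tooling): activation-ordered GPTQ permutes the input channels, so a channel's quantization group can no longer be derived as `k // group_size`. A rough numpy illustration of dequantization with such an index; all shapes and names are hypothetical:

```python
import numpy as np

# Hypothetical sizes: K input channels, N output channels, 4-bit weights
# already unpacked to one integer per element.
K, N, group_size = 8, 4, 4
n_groups = K // group_size
rng = np.random.default_rng(0)

q = rng.integers(0, 16, size=(K, N))                     # unpacked 4-bit weights
scales = rng.random(size=(n_groups, N), dtype=np.float32)
zero_points = np.full((n_groups, N), 8, dtype=np.int32)  # midpoint zero-point

# Without act_order, channel k belongs to group k // group_size. With
# act_order the mapping is an explicit, generally non-contiguous index.
g_idx = rng.permutation(np.repeat(np.arange(n_groups), group_size))

# Dequantize: look up each channel's group for its scale and zero-point.
w = (q - zero_points[g_idx]) * scales[g_idx]             # shape (K, N)
```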
@wejoncy requested a review from yufenglee on January 12, 2024 07:26
@wejoncy changed the title from “Jicwen/matmulnbits gptq” to “[quant] matmulnbits support 2-8 bits, act_order, gptq/hqq” on Jan 12, 2024
@wejoncy changed the title to “[quant] matmulnbits supports 2-8 bits with act_order and new quantization "hqq"” on Jan 12, 2024
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from 01bfa24 to 555e34a on January 12, 2024 08:25
@wejoncy changed the title to “[quant] supports 2-8 bits kernel with act_order inputs in Op Matmulnbits and new quantization "hqq"” on Jan 12, 2024
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch 2 times, most recently from ed1cd8c to 1594d7b, on January 20, 2024 11:45
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from 6117551 to 2e58ea2 on February 7, 2024 05:59
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from de14fbc to 1a2328e on February 23, 2024 05:53
@wejoncy changed the title to “[quant] supports act_order inputs in Matmulnbits and new quantization algorithm "hqq"” on Feb 23, 2024
@wejoncy marked this pull request as ready for review on February 26, 2024 03:23
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch 2 times, most recently from 8084a75 to 1167ad7, on February 26, 2024 08:59
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from e011c2b to 012227c on February 27, 2024 03:09
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from 012227c to fd51543 on February 27, 2024 03:21
temp_models/test_llama.py: code scanning alerts fixed (16 entries)
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from 0deef36 to 9449f21 on February 27, 2024 10:13
@wejoncy force-pushed the jicwen/matmulnbits_gptq branch from 9449f21 to 6a3caa6 on February 28, 2024 03:06
@yufenglee (Member) left a comment

:shipit:

@wejoncy merged commit 7e613ee into main on Mar 5, 2024
95 checks passed
@wejoncy deleted the jicwen/matmulnbits_gptq branch on March 5, 2024 03:45
zz002 pushed a commit to zz002/onnxruntime that referenced this pull request on Mar 7, 2024
… algorithm "hqq" (microsoft#19106)

adrianlizarraga added a commit that referenced this pull request on Mar 29, 2024
…20146)

### Description
Fixes code that extracts the accuracy level when creating a MatMulNBits
node in the `DefaultWeightOnlyQuantizer` class.


### Motivation and Context
Error from line 443: `AttributeError: 'DefaultWeightOnlyQuantizer'
object has no attribute 'accuracy_level'`. The solution is to access
`self.config.accuracy_level` instead of `self.accuracy_level`.
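
A minimal sketch of the pattern being fixed; only `DefaultWeightOnlyQuantizer`, `config`, and `accuracy_level` come from the report above, the surrounding scaffolding is illustrative:

```python
class WeightOnlyQuantConfig:
    """Illustrative stand-in for the real quantization config."""
    def __init__(self, accuracy_level=None):
        self.accuracy_level = accuracy_level

class DefaultWeightOnlyQuantizer:
    def __init__(self, config: WeightOnlyQuantConfig):
        self.config = config  # accuracy_level lives on the config object

    def matmul_nbits_attrs(self) -> dict:
        # Buggy:  self.accuracy_level        -> AttributeError
        # Fixed:  self.config.accuracy_level
        return {"accuracy_level": self.config.accuracy_level}
```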

Relevant commit: #19106
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request on May 7, 2024
…icrosoft#20146)
