
Fix errors when using smoothquant to quantize Qwen2 model #2370

Open · wants to merge 2 commits into base: main

Conversation

Missmiaom

When quantizing the Qwen2 model with SmoothQuant, the mlp.proj weight is not split correctly; it should be split along dimension 0.

Command to reproduce:
```
python3 convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_sq \
             --output_dir ./engine_outputs \
             --gemm_plugin float16
```
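For reference, here is a minimal sketch of the kind of per-rank tensor-parallel split the fix describes. The `split_matrix` helper and its argument names are illustrative, not the actual code in `convert_checkpoint.py`; only the point that `mlp.proj` must be sharded along dimension 0 comes from the description above.

```python
# Sketch of a tensor-parallel weight split, assuming weights are handled
# as NumPy arrays as in the HF -> TRT-LLM conversion scripts.
import numpy as np

def split_matrix(weight: np.ndarray, tp_size: int, rank: int, dim: int) -> np.ndarray:
    """Return rank `rank`'s shard of `weight`, split evenly along `dim`."""
    if tp_size == 1:
        return weight
    return np.ascontiguousarray(np.split(weight, tp_size, axis=dim)[rank])

# The bug: mlp.proj was sharded along the wrong axis. Per this fix, its
# stored weight must be split along dimension 0 (hypothetical usage):
# proj_shard = split_matrix(proj_weight, tp_size, rank, dim=0)
```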
@hello-11 added the Low Precision label (issues about lower-bit quantization, including int8, int4, fp8) on Oct 25, 2024
@hello-11 added the triaged label (issue has been triaged by maintainers) on Oct 30, 2024
@jershi425 (Collaborator)

Hi @Missmiaom, thank you for the contribution. We will include this fix in our internal repo and it will be reflected in the next release.
