
Fix errors when using smoothquant to quantize Qwen2 model #2370

Open · wants to merge 2 commits into base: main

Conversation

Missmiaom

When quantizing the Qwen2 model with SmoothQuant, the mlp.proj weight is not split correctly; it should be split along dimension 0.

Command to reproduce:
```
python3 convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_sq \
             --output_dir ./engine_outputs \
             --gemm_plugin float16
```
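For reference, here is a minimal sketch of the kind of per-rank tensor-parallel split the fix describes. The `split_matrix` helper and its argument names are illustrative, not the actual code in `convert_checkpoint.py`; only the point that `mlp.proj` must be sharded along dimension 0 comes from the description above.

```python
# Sketch of a tensor-parallel weight split, assuming weights are handled
# as NumPy arrays as in the HF -> TRT-LLM conversion scripts.
import numpy as np

def split_matrix(weight: np.ndarray, tp_size: int, rank: int, dim: int) -> np.ndarray:
    """Return rank `rank`'s shard of `weight`, split evenly along `dim`."""
    if tp_size == 1:
        return weight
    return np.ascontiguousarray(np.split(weight, tp_size, axis=dim)[rank])

# The bug: mlp.proj was sharded along the wrong axis. Per this fix, its
# stored weight must be split along dimension 0 (hypothetical usage):
# proj_shard = split_matrix(proj_weight, tp_size, rank, dim=0)
```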
@hello-11 added the Low Precision label (issues about lower-bit quantization, including int8, int4, fp8) on Oct 25, 2024
@hello-11 added the triaged label (issue has been triaged by maintainers) on Oct 30, 2024
@jershi425 (Collaborator)

Hi @Missmiaom, thank you for the contribution. We will include this fix in our internal repo and it will be reflected in the next release.
