[Build] How can I quantize the llama3 model activations to int4? #21334
Labels: build, model:transformer, quantization, stale
Describe the issue
I’m trying to quantize a model to int4, but this file only provides weight-only quantization. Is there a way to quantize both the weights and the activations to int4?
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py
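For context, here is roughly how I am using that file today — a minimal sketch assuming the `MatMul4BitsQuantizer` class from the linked module; the model path and `block_size` value are placeholders, and the exact constructor arguments may differ between onnxruntime versions:

```python
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

# Hypothetical input path; any exported float ONNX model.
model = onnx.load("llama3-8b.onnx")

# Blockwise weight-only quantization: MatMul weights become int4,
# but activations stay in their original floating-point precision.
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()
quantizer.model.save_model_to_file("llama3-8b-int4.onnx", use_external_data_format=True)
```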
Thanks for your help!
Urgency
No response
Target platform
onnx
Build script
python -m onnxruntime.transformers.models.llama.convert_to_onnx -m /publicdata/huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/ --output llama3-8b-int4-gpu --precision int4 --execution_provider cuda --quantization_method blockwise --use_gqa
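For comparison, the closest path I found that quantizes activations as well is the static quantization API, but as far as I can tell it targets int8 activations rather than int4 — a minimal sketch, where the model paths and the calibration reader are placeholder assumptions (a real run should calibrate on representative prompts):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Hypothetical reader that feeds random token ids for calibration."""
    def __init__(self, num_samples=8):
        self.data = iter(
            {"input_ids": np.random.randint(0, 32000, (1, 128), dtype=np.int64)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self.data, None)

quantize_static(
    "llama3-8b.onnx",                 # hypothetical float model
    "llama3-8b-int8.onnx",            # quantized output
    RandomCalibrationReader(),
    activation_type=QuantType.QInt8,  # int8 activations; int4 not exposed here
    weight_type=QuantType.QInt8,
)
```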
Error / output
No error output; I expected to be able to quantize both the weights and the activations.
Visual Studio Version
No response
GCC / Compiler Version
No response