
[Build] How can I quantize the llama3 model activation to int4 ? #21334

Open
zhangyu68 opened this issue Jul 12, 2024 · 1 comment
Labels
build: build issues; typically submitted using template
model:transformer: issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.
quantization: issues related to quantization
stale: issues that have not been addressed in a while; categorized by a bot

Comments

@zhangyu68

Describe the issue

I'm trying to quantize a model to int4, but this file only provides weight-only quantization. Is it possible to quantize both the weights and the activations to int4?
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py
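For context, the block-wise weight-only scheme that file implements can be sketched roughly as follows. This is a minimal NumPy illustration of symmetric block-wise int4 quantization, not the actual ONNX Runtime code; the function names and the block size default are hypothetical:

```python
import numpy as np

def quantize_blockwise_int4(weights, block_size=32):
    """Sketch of symmetric block-wise int4 weight quantization.

    Each row of `weights` is split into blocks of `block_size`;
    every block gets its own scale so that the largest magnitude
    in the block maps to the int4 extreme, and values are rounded
    into the signed 4-bit range [-8, 7].
    """
    rows, cols = weights.shape
    assert cols % block_size == 0, "pad columns to a multiple of block_size"
    blocks = weights.reshape(rows, cols // block_size, block_size)
    # Per-block scale: the block's max absolute value maps to 7.
    scales = np.abs(blocks).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0.0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blockwise_int4(q, scales, shape):
    """Reconstruct an approximate float weight matrix from int4 blocks."""
    return (q.astype(np.float32) * scales).reshape(shape)
```

Note that this only touches the stored weights; the MatMul inputs (activations) stay in floating point, which is exactly the limitation the question is about. Quantizing activations additionally requires runtime scale computation (static calibration or dynamic per-batch scales), which this file does not provide.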

Thanks for your help!

Urgency

No response

Target platform

onnx

Build script

python -m onnxruntime.transformers.models.llama.convert_to_onnx -m /publicdata/huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/ --output llama3-8b-int4-gpu --precision int4 --execution_provider cuda --quantization_method blockwise --use_gqa

Error / output

No error; I expected to be able to quantize both the weights and the activations.

Visual Studio Version

No response

GCC / Compiler Version

No response

@zhangyu68 zhangyu68 added the build build issues; typically submitted using template label Jul 12, 2024
@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. quantization issues related to quantization labels Jul 12, 2024
@sophies927 sophies927 removed the ep:CUDA issues related to the CUDA execution provider label Jul 18, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Aug 18, 2024