
compile error"Vulkan backend does not support quantization at the moment" #7132

Open
l2002924700 opened this issue Dec 1, 2024 · 1 comment

@l2002924700

🐛 Describe the bug

When I compile a .pte file from llama-7b-chat as described at "https://pytorch.org/executorch/stable/build-run-vulkan.html", the generated .pte file is too big to run on edge devices, so I tried to apply quantization with the Vulkan backend. The command is as follows:
python -m examples.models.llama2.export_llama --disable_dynamic_shape --vulkan -kv --use_sdpa_with_kv_cache --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth --params ~/Llama-2-7b-chat/params.json -d fp32 -X -qmode 8da4w --group_size 128 --max_seq_length 1024
It then failed with the following error:
Vulkan backend does not support quantization at the moment
I was trying to generate the .pte file with ExecuTorch v0.4.0.
Could you help me fix this issue?
Thank you in advance.
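
For reference, the error message suggests the exporter rejects the quantization options when the Vulkan delegate is selected. Below is an untested sketch of the same export with the quantization flags (`-qmode 8da4w --group_size 128`) dropped; assuming those flags are what trigger the check, this should export successfully, at the cost of an unquantized fp32 model that stays large:

```bash
# Same export command as above, minus the quantization flags
# (-qmode 8da4w --group_size 128) that appear to trigger the
# "Vulkan backend does not support quantization" check.
# Untested sketch: the output is unquantized, so the .pte stays big.
python -m examples.models.llama2.export_llama \
  --disable_dynamic_shape \
  --vulkan \
  -kv \
  --use_sdpa_with_kv_cache \
  --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth \
  --params ~/Llama-2-7b-chat/params.json \
  -d fp32 \
  -X \
  --max_seq_length 1024
```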

Versions

ExecuTorch v0.4.0

@JacobSzwejbka
Contributor

@jorgep31415 @SS-JIA
