
compile error"Vulkan backend does not support quantization at the moment" #7132

Open
l2002924700 opened this issue Dec 1, 2024 · 1 comment

@l2002924700

🐛 Describe the bug

When I compile a .pte file from llama-7b-chat as described at "https://pytorch.org/executorch/stable/build-run-vulkan.html", the generated .pte file is too big to run on edge devices, so I tried to apply quantization with the Vulkan backend. The command is as follows:
python -m examples.models.llama2.export_llama --disable_dynamic_shape --vulkan -kv --use_sdpa_with_kv_cache --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth --params ~/Llama-2-7b-chat/params.json -d fp32 -X -qmode 8da4w --group_size 128 --max_seq_length 1024
It then failed with the following error:
Vulkan backend does not support quantization at the moment
I was trying to generate the .pte file with ExecuTorch v0.4.0.
Could you help me fix this issue?
Thank you in advance.
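
For reference, the error message suggests the exporter rejects the quantization options when the Vulkan delegate is selected. Below is an untested sketch of the same export with the quantization flags (`-qmode 8da4w --group_size 128`) dropped; assuming those flags are what trigger the check, this should export successfully, at the cost of an unquantized fp32 model that stays large:

```bash
# Same export command as above, minus the quantization flags
# (-qmode 8da4w --group_size 128) that appear to trigger the
# "Vulkan backend does not support quantization" check.
# Untested sketch: the output is unquantized, so the .pte stays big.
python -m examples.models.llama2.export_llama \
  --disable_dynamic_shape \
  --vulkan \
  -kv \
  --use_sdpa_with_kv_cache \
  --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth \
  --params ~/Llama-2-7b-chat/params.json \
  -d fp32 \
  -X \
  --max_seq_length 1024
```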

Versions

ExecuTorch v0.4.0

@JacobSzwejbka
Contributor

@jorgep31415 @SS-JIA
