🐛 Describe the bug
When I compile the .pte file from Llama-2-7b-chat as described in https://pytorch.org/executorch/stable/build-run-vulkan.html, the generated .pte file is too large to run on edge devices. So I tried to apply quantization with the Vulkan backend. The command is as follows:
python -m examples.models.llama2.export_llama --disable_dynamic_shape --vulkan -kv --use_sdpa_with_kv_cache --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth --params ~/Llama-2-7b-chat/params.json -d fp32 -X -qmode 8da4w --group_size 128 --max_seq_length 1024
This produced the following error:
Vulkan backend does not support quantization at the moment
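For reference, the same export with the quantization flags removed (which I assume is equivalent to the unquantized path from the tutorial) completes without this error, but it yields the oversized .pte described above:
python -m examples.models.llama2.export_llama --disable_dynamic_shape --vulkan -kv --use_sdpa_with_kv_cache --checkpoint ~/Llama-2-7b-chat/consolidated.00.pth --params ~/Llama-2-7b-chat/params.json -d fp32 -X --max_seq_length 1024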
I tried to generate the .pte file with ExecuTorch v0.4.0.
Could you help me fix this issue?
Thanks in advance.
Versions
ExecuTorch v0.4.0