Add documentation about exporting pte on Qualcomm backend without quantization #7133
Labels
actionable
Items in the backlog waiting for an appropriate impl/fix
enhancement
Not as big of a feature, but technically not a bug. Should be easy to fix
module: qnn
Related to Qualcomm's QNN delegate
partner: qualcomm
For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
The documentation for the Qualcomm backend (found here) only describes exporting to a .pte file with quantization; it does not cover exporting without quantization.
Is there a way to use the prequantized models provided in the Llama3 repository with the Qualcomm backend? For example, the XNNPACK backend supports using prequantized checkpoints (SpinQuant) from Hugging Face.