Does torchao support VLM quantization and dequantization? #1464

Open
yangd85 opened this issue Dec 30, 2024 · 5 comments

Comments

@yangd85

yangd85 commented Dec 30, 2024

I've been trying to quantize VLMs such as the Qwen2-VL or InternVL series using torchao, but I haven't succeeded yet. Does torchao support VLM quantization and dequantization, and how do I reason about (run inference with) a quantized VLM model?

@pawarmanasi07

Hi, I'm interested in the VLM quantization support.

As a beginner, I'd love to contribute to improving VLM support - perhaps starting with documenting current challenges and limitations? I can share my experiences trying to quantize these models, which might help identify common issues other users might face.
Would this be helpful?

Happy to start with smaller tasks to learn and gradually work up to more complex contributions.

@supriyar
Contributor

@jerryzh168 @mobicham can you share your experiences here on quantizing VLM model with torchao? What are some known issues/gaps?

@mobicham
Collaborator

@jerryzh168 @mobicham can you share your experiences here on quantizing VLM model with torchao? What are some known issues/gaps?

Quantizing a VLM simply requires quantizing its language model, and it works the same way as for a regular language model. If it's a Hugging Face model, you can access it via model.language_model or something similar. There's no need to quantize the vision model, since it only runs once during prefill and is relatively small. Qwen2-VL should work out-of-the-box with this approach.

Some more advanced models like Aria are a bit more complicated because they have MoEs; we had to write some custom layers for them.
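
A minimal sketch of this approach, assuming a recent transformers and torchao; the attribute that holds the text decoder (model.model below) varies across VLM architectures and versions, so check model.language_model or inspect the module tree first:

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from torchao.quantization import quantize_, int8_weight_only

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Quantize only the language model's weights (int8 weight-only here);
# the small vision tower stays in bf16 since it only runs once at prefill.
quantize_(model.model, int8_weight_only())

# Inference then works as usual via processor(...) + model.generate(...).
```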

@yangd85
Author

yangd85 commented Jan 2, 2025

Hi, I'm interested in the VLM quantization support.

As a beginner, I'd love to contribute to improving VLM support - perhaps starting with documenting current challenges and limitations? I can share my experiences trying to quantize these models, which might help identify common issues other users might face. Would this be helpful?

Happy to start with smaller tasks to learn and gradually work up to more complex contributions.

Thank you for your kindness. I'd like to ask a question: after quantizing a VLM with torchao, how do I reason about (run inference with) the quantized model?

@pawarmanasi07

To reason about (evaluate) a quantized VLM model, we could check it in a few ways (a quick size/latency sketch follows the list):

Compare performance metrics before and after quantization:
Test accuracy on standard vision-language tasks, measure inference speed and check model size reduction.

Run practical tests:
Try various types of input images, test different text prompts, and verify performance on your specific use cases.

Check hardware utilization.
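
A rough, hypothetical sketch of the "model size reduction and inference speed" part of that checklist; model and inputs are placeholders for whatever quantized VLM and processed image-plus-prompt batch you already have:

```python
import time
import torch

def param_bytes(m: torch.nn.Module) -> int:
    # Total bytes taken by parameters and buffers (a proxy for model size).
    return (sum(p.numel() * p.element_size() for p in m.parameters())
            + sum(b.numel() * b.element_size() for b in m.buffers()))

def avg_generate_seconds(m, inputs, max_new_tokens=64, warmup=1, iters=3):
    # Average wall-clock time for a short generation run.
    for _ in range(warmup):
        m.generate(**inputs, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        m.generate(**inputs, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Run once on the bf16 model and once on the quantized model, then compare.
print(f"size: {param_bytes(model) / 1e9:.2f} GB")
print(f"latency: {avg_generate_seconds(model, inputs):.2f} s")
```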
