Does torchao support VLM quantization and dequantization? #1464

Open
yangd85 opened this issue Dec 30, 2024 · 5 comments

Comments

@yangd85

yangd85 commented Dec 30, 2024

I've been trying to quantize VLMs such as the Qwen2-VL or InternVL series using torchao, but I haven't succeeded yet. Does torchao support VLM quantization and dequantization, and how do I reason about (run inference with) a quantized VLM model?

@pawarmanasi07

Hi, I'm interested in the VLM quantization support.

As a beginner, I'd love to contribute to improving VLM support - perhaps starting with documenting current challenges and limitations? I can share my experiences trying to quantize these models, which might help identify common issues other users might face.
Would this be helpful?

Happy to start with smaller tasks to learn and gradually work up to more complex contributions.

@supriyar
Contributor

@jerryzh168 @mobicham can you share your experiences here on quantizing VLM model with torchao? What are some known issues/gaps?

@mobicham
Collaborator

@jerryzh168 @mobicham can you share your experiences here on quantizing VLM model with torchao? What are some known issues/gaps?

Quantizing a VLM simply requires quantizing its language model, and it works the same way as for a regular language model. If it's a Hugging Face model, you can access it via model.language_model or something similar. There's no need to quantize the vision model, since it only runs once during prefill and is relatively small. Qwen2-VL should work out-of-the-box with this approach.

Some more advanced models like Aria are a bit more complicated because they have MoEs; we had to write some custom layers for them.
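
A minimal sketch of this approach, assuming a recent transformers and torchao; the attribute that holds the text decoder (model.model below) varies across VLM architectures and versions, so check model.language_model or inspect the module tree first:

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from torchao.quantization import quantize_, int8_weight_only

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Quantize only the language model's weights (int8 weight-only here);
# the small vision tower stays in bf16 since it only runs once at prefill.
quantize_(model.model, int8_weight_only())

# Inference then works as usual via processor(...) + model.generate(...).
```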

@yangd85
Author

yangd85 commented Jan 2, 2025

Hi, I'm interested in the VLM quantization support.

As a beginner, I'd love to contribute to improving VLM support - perhaps starting with documenting current challenges and limitations? I can share my experiences trying to quantize these models, which might help identify common issues other users might face. Would this be helpful?

Happy to start with smaller tasks to learn and gradually work up to more complex contributions.

Thank you for your kindness. I'd like to ask a question: after quantizing a VLM with torchao, how do I reason about (run inference with) the quantized model?

@pawarmanasi07

To reason about (evaluate) a quantized VLM model, we could check it in a few ways (a quick size/latency sketch follows the list):

Compare performance metrics before and after quantization:
Test accuracy on standard vision-language tasks, measure inference speed and check model size reduction.

Run practical tests:
Try various types of input images, test different text prompts, and verify performance on your specific use cases.

Check hardware utilization.
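
A rough, hypothetical sketch of the "model size reduction and inference speed" part of that checklist; model and inputs are placeholders for whatever quantized VLM and processed image-plus-prompt batch you already have:

```python
import time
import torch

def param_bytes(m: torch.nn.Module) -> int:
    # Total bytes taken by parameters and buffers (a proxy for model size).
    return (sum(p.numel() * p.element_size() for p in m.parameters())
            + sum(b.numel() * b.element_size() for b in m.buffers()))

def avg_generate_seconds(m, inputs, max_new_tokens=64, warmup=1, iters=3):
    # Average wall-clock time for a short generation run.
    for _ in range(warmup):
        m.generate(**inputs, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        m.generate(**inputs, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Run once on the bf16 model and once on the quantized model, then compare.
print(f"size: {param_bytes(model) / 1e9:.2f} GB")
print(f"latency: {avg_generate_seconds(model, inputs):.2f} s")
```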
