Does torchao support VLM quantization and dequantization? #1464
Comments
Hi, I'm interested in VLM quantization support. As a beginner, I'd love to contribute to improving VLM support - perhaps starting with documenting current challenges and limitations? I can share my experiences trying to quantize these models, which might help identify common issues other users face. Happy to start with smaller tasks to learn and gradually work up to more complex contributions.
@jerryzh168 @mobicham can you share your experiences quantizing VLMs with torchao? What are some known issues/gaps?
Quantizing VLMs simply requires quantizing the language model, and it works the same way as for a regular language model. If it's a Hugging Face model, you can usually access the language model directly (for many HF VLMs it is exposed as a `language_model` attribute). Some more advanced models like Aria are a bit more complicated because they have MoEs; we had to write some custom layers for them.
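A minimal sketch of this approach, assuming a LLaVA-style Hugging Face checkpoint where the language model is exposed as a `language_model` attribute (the checkpoint name and attribute here are illustrative; check the architecture of the model you're using):

```python
import torch
from transformers import AutoModelForVision2Seq
from torchao.quantization import quantize_, int8_weight_only

# Illustrative checkpoint; any HF VLM with a separately-addressable
# language model should follow the same pattern.
model = AutoModelForVision2Seq.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Quantize only the language-model weights; the vision tower and
# projector stay in bf16.
quantize_(model.language_model, int8_weight_only())
```

Since `quantize_` swaps the weights in place, `model.generate` afterwards works exactly as it did before quantization.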
Thank you for your kindness. I'd like to ask a question: after quantizing a VLM with torchao, how can we reason about the quantized model?
To reason about a quantized VLM model, we could evaluate it in a few ways:
- Compare performance metrics before and after quantization
- Run practical tests
- Check hardware utilization
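A hedged sketch of the before/after comparison, assuming `model` and processor-prepared `inputs` are set up as in the earlier snippet; run it once on the bf16 model and once on the quantized one and compare the numbers:

```python
import time
import torch

def benchmark(model, inputs, n_tokens=64):
    """Measure generation latency and peak GPU memory for one call."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=n_tokens)
    torch.cuda.synchronize()
    latency_s = time.perf_counter() - start
    peak_mem_gb = torch.cuda.max_memory_allocated() / 1e9
    return latency_s, peak_mem_gb
```

For output quality, the same idea applies: generate on a fixed set of image/prompt pairs with both models and compare the responses, or run a standard VLM eval harness if one covers your model.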
I've been trying to quantize VLMs such as the Qwen2-VL or InternVL series using torchao, but I haven't succeeded yet. So does torchao support VLM quantization and dequantization, and how can I reason about a quantized VLM model?