I tried to use post-training quantization to convert my float32 model to int8, following the tutorial on quantizing GNMT. I adapted the model code to the distiller style and obtained a quantized model. Here is some information about the quantized model:
My model is a Transformer model from OpenNMT. It seems I do get a correctly quantized model, but the model file is larger than the original one (roughly 200 MB -> 280 MB). Is there any way to reduce the model size? I think that is the most important benefit of quantization.
The script looks like this:
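(A minimal sketch of the kind of script I mean, not my exact code: `load_my_transformer` and `make_dummy_batch` are placeholder helpers standing in for my real loading code, and it assumes the `PostTrainLinearQuantizer` API as used in the distiller GNMT tutorial.)

```python
import torch
from distiller.quantization import PostTrainLinearQuantizer, LinearQuantMode

# Load the float32 OpenNMT Transformer (placeholder helper for my real loading code)
model = load_my_transformer()
model.eval()

# Post-training quantization: 8-bit weights and activations
quantizer = PostTrainLinearQuantizer(
    model,
    bits_activations=8,
    bits_parameters=8,
    mode=LinearQuantMode.SYMMETRIC,
)

# A dummy batch shaped like real input, used when preparing the quantized model
dummy_input = make_dummy_batch()  # placeholder helper
quantizer.prepare_model(dummy_input)

# Save the quantized checkpoint (this is the file that came out at ~280 MB)
torch.save(model.state_dict(), "transformer_int8.pt")
```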