[Bug] The quantized weight type will be overridden by the activation type if per_channel is True #20881
Comments
@fs-eire, could you please help to take a look?
I also met this issue when performing A16W8 quantization with per-channel weight quantization.

```python
# Imports for the snippet below; resnet50_data_reader and get_args come
# from the attached example files, not from onnxruntime.
from onnxruntime.quantization import QuantType, quantize_static

import resnet50_data_reader


def main():
    args = get_args()
    input_model_path = args.input_model
    output_model_path = args.output_model
    calibration_dataset_path = args.calibrate_dataset
    dr = resnet50_data_reader.ResNet50DataReader(
        calibration_dataset_path, input_model_path
    )
    extra_option = {
        "ActivationSymmetric": True,
        "WeightSymmetric": True,
        "ForceQuantizeNoInputCheck": True,
        "UseQDQContribOps": True,
    }
    quantize_static(
        input_model_path,
        output_model_path,
        dr,
        quant_format=args.quant_format,
        per_channel=args.per_channel,
        activation_type=QuantType.QInt16,
        weight_type=QuantType.QInt8,
        extra_options=extra_option,
        reduce_range=True,
    )
```

I ran the script with the `--per_channel` option, but the output model is still A16W16. A screenshot of the quantized model showed that the weight of the conv layer still has the 16-bit type. Is the problem that the parameters are not set correctly when calling `quantize_static`?
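For reference, the weight types can also be checked programmatically with the `onnx` package instead of a screenshot; a minimal sketch, where the model path and the `"weight" in name` filter are assumptions based on the example above:

```python
# Print the element type of every weight initializer in the quantized
# model, as a text substitute for the screenshot.
import onnx

model = onnx.load("output_model.onnx")  # path is an assumption
for init in model.graph.initializer:
    if "weight" in init.name:  # heuristic name filter
        print(init.name, onnx.TensorProto.DataType.Name(init.data_type))
# In the failing case, conv weights print INT16 instead of INT8.
```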
Hi, I don't think so. Your issue is the same as mine, including the example we used. I think this issue is caused by L483.
### Description

Fix issue #20881: the weight quantization type was being set to the activation type.
Describe the issue
I am using `quantize_static` for quantization, and I found that if `per_channel=True`, the type of the weights is always the same as the activation type. This means that when the user specifies different types for weights and activations, the quantized model produced is not as expected. Therefore, I believe this may be a bug, and I have located the relevant line of code:

```python
qtype = self.activation_qType
```

Maybe we can fix this issue by replacing `self.activation_qType` with `self.weight_qType`, like this:

```python
qtype = self.weight_qType
```

then delete L484 and L485 (not sure if these two lines are useful).
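To make the proposed one-line change concrete, here is a tiny self-contained sketch; the `TypeSelection` class is hypothetical and only models the type-selection behavior described above, not the real onnxruntime quantizer:

```python
# Hypothetical toy model of the quantizer's weight-type selection; the
# real logic lives in onnxruntime's quantizer around the cited lines.
from onnxruntime.quantization import QuantType


class TypeSelection:
    def __init__(self, activation_qType, weight_qType):
        self.activation_qType = activation_qType
        self.weight_qType = weight_qType

    def per_channel_weight_type_buggy(self):
        # Behavior reported in this issue: weights inherit the
        # activation type when per_channel=True.
        return self.activation_qType

    def per_channel_weight_type_fixed(self):
        # Proposed fix: honor the type the caller requested for weights.
        return self.weight_qType


sel = TypeSelection(QuantType.QInt16, QuantType.QInt8)
print(sel.per_channel_weight_type_buggy())  # QuantType.QInt16 -> A16W16
print(sel.per_channel_weight_type_fixed())  # QuantType.QInt8  -> A16W8
```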
To reproduce
The issue can be reproduced with the files in demo.zip. The reproduction commands are as follows:

```
python run.py --per_channel --weight_type int8 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/
```

which produces a quantized model with 16-bit weights ❌, and

```
python run.py --per_channel --weight_type int16 --activation_type int8 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/
```

which produces a quantized model with 8-bit weights ❌. In both cases the weight type follows the activation type rather than the requested weight type.
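For a reproduction that does not depend on the demo.zip calibration images, a data reader that feeds random inputs is enough to drive calibration. A minimal sketch; the input name `"input"`, its shape, and the `"weight" in name` filter are assumptions for the mobilenetv2-7 model:

```python
# Self-contained repro sketch: quantize with weight_type != activation_type
# and check what type the weights came out as.
import numpy as np
import onnx
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches; enough to drive calibration."""

    def __init__(self, input_name, shape, batches=4):
        self._data = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(batches)]
        )

    def get_next(self):
        return next(self._data, None)


quantize_static(
    "mobilenetv2-7-infer.onnx",
    "mobilenetv2-7.quant.onnx",
    RandomDataReader("input", (1, 3, 224, 224)),
    per_channel=True,
    activation_type=QuantType.QInt16,
    weight_type=QuantType.QInt8,
    extra_options={"UseQDQContribOps": True},  # mirrors the repro above
)

# Same dtype check as the earlier snippet, collapsed to a set.
model = onnx.load("mobilenetv2-7.quant.onnx")
types = {
    onnx.TensorProto.DataType.Name(t.data_type)
    for t in model.graph.initializer
    if "weight" in t.name
}
print(types)  # INT8 is expected; INT16 demonstrates this bug
```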
Urgency: No response
Platform: Linux
OS Version: Ubuntu 22.04
ONNX Runtime Installation: Released Package
ONNX Runtime Version or Commit ID: 1.18.0
ONNX Runtime API: Python
Architecture: X64
Execution Provider: Default CPU
Execution Provider Library Version: No response