
[Bug] The quantized weight type will be overridden by the activation type if per_channel is True #20881

Closed
duanshengliu opened this issue May 31, 2024 · 3 comments
Labels: quantization (issues related to quantization)

Comments

duanshengliu (Contributor) commented May 31, 2024

Describe the issue

I am using quantize_static for quantization, and I found that when per_channel=True, the weight type always ends up the same as the activation type. This means that when the user specifies different types for weights and activations, the quantized model produced is not as expected. I therefore believe this is a bug, and I have located the relevant line of code as follows:
qtype = self.activation_qType

Maybe we can fix this issue by replacing self.activation_qType with self.weight_qType, i.e. qtype = self.weight_qType, and then deleting L484 and L485 (I am not sure whether those two lines are useful).
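
For reference, an API-level call that hits this path would look roughly like the following. This is a minimal sketch, not the code from the attached demo: the dummy data reader and the input name/shape are assumptions and should be replaced by a real CalibrationDataReader.

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class DummyReader(CalibrationDataReader):
    """Yields a single random sample; the input name/shape are assumptions."""

    def __init__(self):
        self._it = iter([{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}])

    def get_next(self):
        return next(self._it, None)


quantize_static(
    "mobilenetv2-7-infer.onnx",        # input model, as in the repro commands below
    "mobilenetv2-7.quant.onnx",        # output model
    DummyReader(),                     # stand-in for the real calibration data reader
    per_channel=True,                  # triggers the issue
    weight_type=QuantType.QInt8,       # 8-bit weights requested...
    activation_type=QuantType.QInt16,  # ...but the weights come out as 16-bit
)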

To reproduce

The issue can be reproduced using the relevant files in demo.zip. The reproduction commands are as follows:

python run.py --per_channel --weight_type int8 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

which will produce a quantized model with 16-bit weights. ❌

python run.py --per_channel --weight_type int16 --activation_type int8 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

which will produce a quantized model with 8-bit weights. ❌
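
A quick way to confirm which element type the weights actually received is to list the initializer types of the produced model; a minimal sketch, with the output path taken from the commands above:

import onnx

model = onnx.load("mobilenetv2-7.quant.onnx")
for init in model.graph.initializer:
    # For the first command the weights should be INT8 but show up as INT16;
    # for the second they should be INT16 but show up as INT8.
    print(init.name, onnx.TensorProto.DataType.Name(init.data_type))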

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the quantization (issues related to quantization) label on May 31, 2024
duanshengliu (Contributor, Author) commented:

@fs-eire, could you please help take a look?

JIAOJINYU commented:

I also hit this issue when performing A16W8 quantization with per-channel weight quantization. I modified the end-to-end example from onnxruntime. The following script snippet shows the A16W8 quantization configuration I used:

# Imports from the original end-to-end example (resnet50_data_reader and
# get_args are defined alongside this script):
import resnet50_data_reader
from onnxruntime.quantization import QuantType, quantize_static


def main():
    args = get_args()
    input_model_path = args.input_model
    output_model_path = args.output_model
    calibration_dataset_path = args.calibrate_dataset
    dr = resnet50_data_reader.ResNet50DataReader(
        calibration_dataset_path, input_model_path
    )

    extra_option = {
        "ActivationSymmetric": True,
        "WeightSymmetric": True,
        'ForceQuantizeNoInputCheck': True,
        "UseQDQContribOps": True 
    }

    quantize_static(
        input_model_path,
        output_model_path,
        dr,
        quant_format=args.quant_format,
        per_channel=args.per_channel,
        activation_type=QuantType.QInt16,
        weight_type=QuantType.QInt8,
        extra_options=extra_option,
        reduce_range=True
    )

I ran the script with the --per_channel option, but the output model is still A16W16. The following screenshot shows the quantized model (the weight of the Conv layer is still in 16-bit format):
[screenshot: A16W8_pro1]

Is the problem that the parameters are not set correctly when calling quantize_static?
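
(For reference, one way to check the Conv weight type directly is to trace each Conv node's weight input back to its DequantizeLinear producer and print the element type of the underlying initializer. A minimal sketch for a QDQ-format model, with the model path as a placeholder:)

import onnx

model = onnx.load("resnet50.quant.onnx")  # placeholder path for the quantized model
inits = {init.name: init for init in model.graph.initializer}
producers = {out: node for node in model.graph.node for out in node.output}

for node in model.graph.node:
    if node.op_type != "Conv":
        continue
    dq = producers.get(node.input[1])  # the Conv weight input
    if dq is None or dq.op_type != "DequantizeLinear":
        continue
    weight = inits.get(dq.input[0])
    if weight is not None:
        # INT8 means the requested 8-bit weights; INT16 shows the issue.
        print(node.name, onnx.TensorProto.DataType.Name(weight.data_type))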

duanshengliu (Contributor, Author) commented Jun 5, 2024

> I also met this issue when performing A16W8 quantization with quantize weights per channel. […] Is the problem that the parameters are not set correctly when calling quantize_static?

Hi, I don't think so. Your issue is the same as mine, including the example we used, and I think it is caused by L483.

yufenglee pushed a commit that referenced this issue on Jun 14, 2024

### Description
Fixes issue #20881. The weight quantization type was being set to the activation type.