
[Bug] The quantized weight type will be overridden by the activation type if per_channel is True #20881

Closed
duanshengliu opened this issue May 31, 2024 · 3 comments
Labels: quantization (issues related to quantization)

Comments

duanshengliu (Contributor) commented May 31, 2024

Describe the issue

I am using quantize_static for quantization, and I found that when per_channel=True, the weight type always ends up the same as the activation type. This means that when the user specifies different types for weights and activations, the quantized model produced is not as expected. I therefore believe this is a bug, and I have located the relevant line of code as follows:
qtype = self.activation_qType

Maybe we can fix this issue by replacing self.activation_qType with self.weight_qType, i.e. qtype = self.weight_qType, and then deleting L484 and L485 (I am not sure whether those two lines are useful).
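
For reference, an API-level call that hits this path would look roughly like the following. This is a minimal sketch, not the code from the attached demo: the dummy data reader and the input name/shape are assumptions and should be replaced by a real CalibrationDataReader.

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class DummyReader(CalibrationDataReader):
    """Yields a single random sample; the input name/shape are assumptions."""

    def __init__(self):
        self._it = iter([{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}])

    def get_next(self):
        return next(self._it, None)


quantize_static(
    "mobilenetv2-7-infer.onnx",        # input model, as in the repro commands below
    "mobilenetv2-7.quant.onnx",        # output model
    DummyReader(),                     # stand-in for the real calibration data reader
    per_channel=True,                  # triggers the issue
    weight_type=QuantType.QInt8,       # 8-bit weights requested...
    activation_type=QuantType.QInt16,  # ...but the weights come out as 16-bit
)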

To reproduce

The issue can be reproduced using the relevant files in demo.zip. The reproduction commands are as follows:

python run.py --per_channel --weight_type int8 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

which will produce a quantized model with 16-bit weights. ❌

python run.py --per_channel --weight_type int16 --activation_type int8 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

which will produce a quantized model with 8-bit weights. ❌
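
A quick way to confirm which element type the weights actually received is to list the initializer types of the produced model; a minimal sketch, with the output path taken from the commands above:

import onnx

model = onnx.load("mobilenetv2-7.quant.onnx")
for init in model.graph.initializer:
    # For the first command the weights should be INT8 but show up as INT16;
    # for the second they should be INT16 but show up as INT8.
    print(init.name, onnx.TensorProto.DataType.Name(init.data_type))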

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the quantization (issues related to quantization) label on May 31, 2024
duanshengliu (Contributor, Author) commented:

@fs-eire, could you please help take a look?

JIAOJINYU commented:

I also hit this issue when performing A16W8 quantization with per-channel weight quantization. I modified the end-to-end example from onnxruntime. The following script snippet shows the A16W8 quantization configuration I used:

# Imports from the original end-to-end example (resnet50_data_reader and
# get_args are defined alongside this script):
import resnet50_data_reader
from onnxruntime.quantization import QuantType, quantize_static


def main():
    args = get_args()
    input_model_path = args.input_model
    output_model_path = args.output_model
    calibration_dataset_path = args.calibrate_dataset
    dr = resnet50_data_reader.ResNet50DataReader(
        calibration_dataset_path, input_model_path
    )

    extra_option = {
        "ActivationSymmetric": True,
        "WeightSymmetric": True,
        'ForceQuantizeNoInputCheck': True,
        "UseQDQContribOps": True 
    }

    quantize_static(
        input_model_path,
        output_model_path,
        dr,
        quant_format=args.quant_format,
        per_channel=args.per_channel,
        activation_type=QuantType.QInt16,
        weight_type=QuantType.QInt8,
        extra_options=extra_option,
        reduce_range=True
    )

I ran the script with the --per_channel option, but the output model is still A16W16. The following screenshot shows the quantized model (the weight of the Conv layer is still in 16-bit format):
[screenshot: A16W8_pro1]

Is the problem that the parameters are not set correctly when calling quantize_static?
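
(For reference, one way to check the Conv weight type directly is to trace each Conv node's weight input back to its DequantizeLinear producer and print the element type of the underlying initializer. A minimal sketch for a QDQ-format model, with the model path as a placeholder:)

import onnx

model = onnx.load("resnet50.quant.onnx")  # placeholder path for the quantized model
inits = {init.name: init for init in model.graph.initializer}
producers = {out: node for node in model.graph.node for out in node.output}

for node in model.graph.node:
    if node.op_type != "Conv":
        continue
    dq = producers.get(node.input[1])  # the Conv weight input
    if dq is None or dq.op_type != "DequantizeLinear":
        continue
    weight = inits.get(dq.input[0])
    if weight is not None:
        # INT8 means the requested 8-bit weights; INT16 shows the issue.
        print(node.name, onnx.TensorProto.DataType.Name(weight.data_type))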

duanshengliu (Contributor, Author) commented Jun 5, 2024

> I also met this issue when performing A16W8 quantization with quantize weights per channel. […] Is the problem that the parameters are not set correctly when calling quantize_static?

Hi, I don't think so. Your issue is the same as mine, including the example we used, and I think it is caused by L483.

yufenglee pushed a commit that referenced this issue on Jun 14, 2024

### Description
Fixes issue #20881. The weight quantization type was being set to the activation type.