onnxruntime shape mismatch during quantization of yolov8 models #21048

Open
Jamil opened this issue Jun 14, 2024 · 7 comments
Labels
quantization (issues related to quantization)

Comments

@Jamil

Jamil commented Jun 14, 2024

Describe the issue

When I try to quantize a YOLOv8 model (exported with yolo export model=yolov8x.pt format=onnx) with onnxruntime, the quantized model fails at inference with the following error:

$ python quantize.py yolov8x.onnx
Model changed? False
Model to quantize: ./yolov8x.onnx
Exclude nodes:
[]
WARNING:root:Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md

WARNING:root:Please consider pre-processing before quantization. See https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md
Finished quantization. Validating...
(1, 3, 640, 640)
2024-06-14 10:10:32.328524183 [W:onnxruntime:, execution_frame.cc:660 AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,40,40,640} != {1,39,39,642}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.
2024-06-14 10:10:32.328606668 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running QLinearConcat node. Name:'/model.11/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39
Traceback (most recent call last):
  File "quantize.py", line 124, in <module>
    quant_outputs = test_model(output_model_path, input_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "quantize.py", line 32, in test_model
    outputs = session.run(None, input_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/…/anaconda3/envs/yolo/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running QLinearConcat node. Name:'/model.11/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39
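
For triage, the failing node and the two conflicting sizes can be pulled out of the message above and looked up in the saved model. The following is a minimal sketch and not part of the original report: `parse_concat_failure` and `node_input_shapes` are hypothetical helper names, the regular expression only covers the concat wording shown in this log, and the second helper assumes the `onnx` package is installed and that the file stores shape information for the tensors of interest.

```python
import re

# Matches only the concat-failure wording shown in the log above (assumption:
# other ONNX Runtime errors are phrased differently and will not match).
_CONCAT_FAILURE = re.compile(
    r"while running (?P<op>\w+) node\. Name:'(?P<name>[^']+)'.*?"
    r"Axis (?P<axis>\d+) has mismatched dimensions of (?P<a>\d+) and (?P<b>\d+)",
    re.DOTALL,
)


def parse_concat_failure(message):
    """Return op type, node name, axis and the two sizes, or None if no match."""
    match = _CONCAT_FAILURE.search(message)
    if match is None:
        return None
    return {
        "op_type": match["op"],
        "node_name": match["name"],
        "axis": int(match["axis"]),
        "dims": (int(match["a"]), int(match["b"])),
    }


def node_input_shapes(model_path, node_name):
    """List (input_name, stored_shape_or_None) for one node of an ONNX file."""
    import onnx  # third-party, imported lazily so the parser works without it

    graph = onnx.load(model_path).graph
    shapes = {}
    for info in list(graph.input) + list(graph.value_info) + list(graph.output):
        dims = info.type.tensor_type.shape.dim
        # A dimension is either a concrete dim_value or a symbolic dim_param.
        shapes[info.name] = [
            d.dim_value if d.HasField("dim_value") else d.dim_param for d in dims
        ]
    for node in graph.node:
        if node.name == node_name:
            return [(name, shapes.get(name)) for name in node.input]
    return None  # no node with that name in the file
```

Calling `node_input_shapes('model.qdq.onnx', info['node_name'])` with the parsed result would then show which inputs of the concat carry the 40 and the 39, provided the shapes were recorded in the file.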

To reproduce

  • Export a YOLO model using yolo export model=yolov8x.pt format=onnx
  • Quantize the ONNX file as follows:
    qnn_config = get_qnn_qdq_config(model_to_quantize,
                                    data_reader,
                                    activation_type=QuantType.QUInt8,
                                    weight_type=QuantType.QUInt8,
                                    per_channel=True,
                                    activation_symmetric=True,
                                    weight_symmetric=True)

    output_model_path = os.path.join(models_directory, 'model.qdq.onnx')
    quantize(model_to_quantize, output_model_path, qnn_config)

Urgency

This is blocking the project I'm working on and appears to be a regression in onnxruntime functionality.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

QNN

@github-actions (bot) added the quantization label (issues related to quantization) on Jun 14, 2024
@yihonglyu
Contributor

Can you share the full script that quantizes the model so I can reproduce the issue?

@Jamil
Author

Jamil commented Jun 26, 2024

Sure @yihonglyu, here's a minimal example that reproduces it:

Data reader:

import numpy as np
import onnxruntime

from onnxruntime.quantization import CalibrationDataReader

class RandomCalibrationDataReader(CalibrationDataReader):
    """Serves `limit` random float32 tensors shaped like the model's first input."""

    def __init__(self, model_path, none1, limit=10):
        # `none1` is unused; it is kept so existing positional calls still work.
        self.model_path = model_path
        self.limit = limit
        self.index = 0

        # Initialize ONNX runtime session to get the input name and shape.
        self.session = onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])
        self.input_name = self.session.get_inputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape
        self.target_size = (640, 640)  # Assuming the target size

        self.datasize = limit

    def get_next(self):
        # Return one random sample, or None once `datasize` samples have been served.
        if self.index < self.datasize:
            self.index += 1
            return {self.input_name: np.random.random(self.input_shape).astype(np.float32)}
        return None

    def rewind(self):
        self.index = 0
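
The reader passes the shape reported by the session straight to `np.random.random`, which only works when every dimension is a concrete integer. A model exported with dynamic axes typically reports a string or `None` for those dimensions. The following is a minimal sketch of a guard, assuming a hypothetical helper named `resolve_shape` and a fallback of `(1, 3, 640, 640)` taken from the shape printed in the log above; it is not part of the original reproducer.

```python
def resolve_shape(shape, fallback=(1, 3, 640, 640)):
    """Replace symbolic or unknown dimensions with the values from `fallback`.

    `shape` is what `session.get_inputs()[0].shape` returns; `fallback` is an
    assumed default and must have the same rank.
    """
    if len(shape) != len(fallback):
        raise ValueError(f"rank mismatch: {list(shape)} vs {list(fallback)}")
    resolved = []
    for dim, default in zip(shape, fallback):
        # Keep concrete positive sizes; substitute anything else (str, None, 0).
        if isinstance(dim, int) and dim > 0:
            resolved.append(dim)
        else:
            resolved.append(default)
    return tuple(resolved)
```

In the reader, `self.input_shape = resolve_shape(self.session.get_inputs()[0].shape)` would be one way to apply it.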

Quantization:

import numpy as np  # needed for the error computation below
import onnxruntime as ort
from ultralytics import YOLO

from onnxruntime.quantization import QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model
from onnxruntime.quantization.shape_inference import quant_pre_process

from utils.data_reader import CalibrationDataReader, RandomCalibrationDataReader

model_name = 'yolov8x.pt'
model = YOLO(model_name)
model.export(format='onnx')

input_model_path = model_name.replace('.pt', '.onnx')

# Quantization
data_reader = RandomCalibrationDataReader(input_model_path, '.', limit=200)

preproc_model_path = 'model.preproc.onnx'
quant_pre_process(input_model_path, preproc_model_path, skip_optimization=False)
model_changed = qnn_preprocess_model(preproc_model_path, preproc_model_path)

print(f'Model changed? {model_changed}')
model_to_quantize = preproc_model_path if model_changed else input_model_path
print(f'Model to quantize: {model_to_quantize}')

qnn_config = get_qnn_qdq_config(model_to_quantize,
                                data_reader,
                                activation_type=QuantType.QUInt8,
                                weight_type=QuantType.QUInt8,
                                per_channel=False,
                                activation_symmetric=True,
                                weight_symmetric=True)

output_model_path = 'model.qdq.onnx'
quantize(model_to_quantize, output_model_path, qnn_config)

def test_model(model_path, input_data):
    print(input_data['images'].shape)
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    outputs = session.run(None, input_data)
    return outputs[0]

# Initialize the data reader for the validation dataset.
# CalibrationDataReader is an abstract base class, so use the concrete reader.
validation_data_reader = RandomCalibrationDataReader(input_model_path, '.', limit=10)

# Accumulate errors
errors = []

# Loop through all data provided by the data reader
while True:
    input_data = validation_data_reader.get_next()
    if input_data is None:
        break  # End of data

    orig_outputs = test_model(input_model_path, input_data)
    quant_outputs = test_model(output_model_path, input_data)

    # Compute absolute error for the current batch and store it
    batch_error = np.abs(orig_outputs - quant_outputs)
    errors.append(batch_error)

# Compute the mean of all errors
if errors:
    avg_abs_error = np.mean(np.concatenate(errors))  # Concatenate to handle multiple batches
    print(f'Average absolute error per output: {avg_abs_error}')
else:
    print("No data available to compute error.")
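
The first log in this thread prints `Model changed? False` followed by `Model to quantize: ./yolov8x.onnx`, so with the selection logic in the script the output of `quant_pre_process` is not the file that gets quantized. Whether that contributes to the failure is not established in this thread. The following is a minimal sketch of an alternative selection, assuming a hypothetical helper named `choose_model_to_quantize`, that prefers the preprocessed file whenever it exists on disk.

```python
from pathlib import Path


def choose_model_to_quantize(input_path, preproc_path, qnn_changed):
    """Pick the preprocessed model if it was written, else the original export.

    `qnn_changed` is the boolean returned by `qnn_preprocess_model` in the
    script above; the file check covers the output of `quant_pre_process`.
    """
    if qnn_changed or Path(preproc_path).is_file():
        return str(preproc_path)
    return str(input_path)
```

In the script this would replace the conditional expression that assigns `model_to_quantize`.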

The error happens during inference after quantization:

2024-06-26 15:03:03.367426450 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running QLinearConcat node. Name:'/model.12/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39
---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
Cell In[43], line 14
     11     break  # End of data
     13 orig_outputs = test_model(input_model_path, input_data)
---> 14 quant_outputs = test_model(output_model_path, input_data)
     16 # Compute absolute error for the current batch and store it
     17 batch_error = np.abs(orig_outputs - quant_outputs)

Cell In[42], line 4, in test_model(model_path, input_data)
      2 print(input_data['images'].shape)
      3 session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
----> 4 outputs = session.run(None, input_data)
      5 return outputs[0]

File ~/anaconda3/envs/yolo/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
    218     output_names = [output.name for output in self._outputs_meta]
    219 try:
--> 220     return self._sess.run(output_names, input_feed, run_options)
    221 except C.EPFail as err:
    222     if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running QLinearConcat node. Name:'/model.12/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39

@Jamil
Author

Jamil commented Jun 26, 2024

Let me know if you are able to reproduce or have issues running this!

@yihonglyu
Contributor

yihonglyu commented Jun 26, 2024

Could you share the model for the reproducer, too? Thanks

@HectorSVC
Contributor

When you say "regression in onnxruntime functionality", do you mean it used to work before?

@Jamil
Author

Jamil commented Jun 26, 2024

Yes, I have a model that I previously quantized successfully with ORT, but I don't remember which versions of ultralytics/ort/onnx I used. I'm trying to reproduce it now.
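
Recording the installed package versions next to each quantized model makes this kind of regression easier to pin down later. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the default distribution names are assumptions (ONNX Runtime may be installed under a different distribution name, for example a GPU or QNN build).

```python
from importlib import metadata


def installed_versions(packages=("onnxruntime", "onnx", "ultralytics")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output alongside `model.qdq.onnx` gives a record to compare against when a newer release behaves differently.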

@Jamil
Author

Jamil commented Jul 12, 2024

@HectorSVC @yihonglyu Ok, I've been able to reproduce. This is the issue I get with the latest versions of ORT:

Traceback (most recent call last):
  File "onnx_session.py", line 13, in <module>
    session = onnxruntime.InferenceSession(sys.argv[1], sess_options=options, providers=["QNNExecutionProvider"], provider_options=[{"backend_path": "/root/qnn/lib/libQnnHtp.so"}])
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node 'Conv_token_423' OpType:Conv with domain:com.ms.internal.nhwc was inserted using the NHWC format as requested by QNNExecutionProvider, but was not selected by that EP. This means the graph is now invalid as there will not be an EP able to run the node. This could be a bug in layout transformer, or in the GetCapability implementation of the EP.

With ORT 1.17, it runs fine. When running with more recent versions, I get the error. Potentially related (but different op?): #16462

Note that this is when I exclude the last conv layer from quantization.
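
The thread does not show how the last conv layer was selected for exclusion. The following is a minimal sketch of one way to find a candidate name, assuming hypothetical helpers `last_node_name` and `last_node_name_in_model`, the `onnx` package, and that "last" means the final `Conv` in the graph's stored node order; the author's actual selection may differ.

```python
def last_node_name(nodes, op_type="Conv"):
    """Return the name of the last (name, op_type) pair with the given op type."""
    found = None
    for name, node_op in nodes:
        if node_op == op_type:
            found = name
    return found


def last_node_name_in_model(model_path, op_type="Conv"):
    """Same lookup on an ONNX file, using the order in which nodes are stored."""
    import onnx  # third-party, imported lazily so the pure helper works without it

    graph = onnx.load(model_path).graph
    return last_node_name(((n.name, n.op_type) for n in graph.node), op_type)
```

The resulting name could then be added to the quantization config's list of excluded nodes (to my understanding the config object carries a `nodes_to_exclude` attribute), which presumably corresponds to the `Exclude nodes:` line printed in the first log.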


3 participants