
MLAS failing with "Could not find an implementation for QLinearMatMul" #21531

Open
saurabhtangri opened this issue Jul 28, 2024 · 2 comments
Labels: stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@saurabhtangri (Contributor)

Describe the issue

Model execution fails with the following error:

```
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for QLinearMatMul(21) node with name ''
```

To reproduce

```python
import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
import onnxruntime as ort

# Define the input dimensions and types
input_dim = [2, 2]
input_type = TensorProto.INT8

# Create the inputs
input_A = helper.make_tensor_value_info('input_A', input_type, input_dim)
input_B = helper.make_tensor_value_info('input_B', input_type, input_dim)
input_A_scale = helper.make_tensor_value_info('input_A_scale', TensorProto.FLOAT, [])
input_A_zero_point = helper.make_tensor_value_info('input_A_zero_point', input_type, [])
input_B_scale = helper.make_tensor_value_info('input_B_scale', TensorProto.FLOAT, [])
input_B_zero_point = helper.make_tensor_value_info('input_B_zero_point', input_type, [])
output_scale = helper.make_tensor_value_info('output_scale', TensorProto.FLOAT, [])
output_zero_point = helper.make_tensor_value_info('output_zero_point', input_type, [])

# Create the output
output = helper.make_tensor_value_info('output', input_type, input_dim)

# Create the QLinearMatMul node
qlinearmatmul_node = helper.make_node(
    'QLinearMatMul',
    inputs=[
        'input_A', 'input_A_scale', 'input_A_zero_point',
        'input_B', 'input_B_scale', 'input_B_zero_point',
        'output_scale', 'output_zero_point'
    ],
    outputs=['output']
)

# Create the graph
graph_def = helper.make_graph(
    [qlinearmatmul_node],
    'qlinearmatmul-graph',
    [input_A, input_A_scale, input_A_zero_point,
     input_B, input_B_scale, input_B_zero_point,
     output_scale, output_zero_point],
    [output]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='qlinearmatmul-model')
model_def.ir_version = 5
onnx.checker.check_model(model_def)
onnx.save(model_def, 'qlinearmatmul_model.onnx')

# Test the model using ONNX Runtime

# Prepare the input data (shapes must match input_dim, i.e. 2x2)
input_A_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_B_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_A_scale_data = np.array(0.1, dtype=np.float32)
input_A_zero_point_data = np.array(0, dtype=np.int8)
input_B_scale_data = np.array(0.1, dtype=np.float32)
input_B_zero_point_data = np.array(0, dtype=np.int8)
output_scale_data = np.array(0.2, dtype=np.float32)
output_zero_point_data = np.array(0, dtype=np.int8)

# Prepare the inputs for ONNX Runtime
inputs = {
    'input_A': input_A_data,
    'input_A_scale': input_A_scale_data,
    'input_A_zero_point': input_A_zero_point_data,
    'input_B': input_B_data,
    'input_B_scale': input_B_scale_data,
    'input_B_zero_point': input_B_zero_point_data,
    'output_scale': output_scale_data,
    'output_zero_point': output_zero_point_data,
}

# Verbose logging for the ONNX Runtime session (uncomment to enable)
sess_options = ort.SessionOptions()
# sess_options.log_severity_level = 0  # 0 is the most verbose logging level

# Run the model on the input data
ort_session = ort.InferenceSession('qlinearmatmul_model.onnx', sess_options)
ort_outputs = ort_session.run(None, inputs)

# Print the output
print("Output:", ort_outputs[0])
```

Urgency

Not urgent; I'm not sure whether the script itself is doing something wrong.

Platform

Windows

OS Version

Windows 11+WSL

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions bot added the platform:windows (issues related to the Windows platform) label Jul 28, 2024
@sophies927 removed the platform:windows (issues related to the Windows platform) label Aug 1, 2024

github-actions bot commented Sep 1, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions bot added the stale (issues that have not been addressed in a while; categorized by a bot) label Sep 1, 2024
@Jubengo commented Oct 22, 2024

I'm also hitting this error.
Is QLinearMatMul not handled by ORT at all, or does it only work for specific input/output types?
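
For anyone checking their own model: the opset version the model imports for the default ONNX domain determines which QLinearMatMul version ORT tries to find a kernel for, so it's worth inspecting that first. A quick check, using the model file name from the repro above:

```python
import onnx

# Print the opset each domain of the saved model imports; the default ('')
# domain's version is what ORT uses to look up the QLinearMatMul kernel.
model = onnx.load('qlinearmatmul_model.onnx')
for opset in model.opset_import:
    print(opset.domain or '<default onnx domain>', opset.version)
```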
