
MLAS failing with "Could not find an implementation for QLinearMatMul" #21531

Open
saurabhtangri opened this issue Jul 28, 2024 · 2 comments
Labels: stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@saurabhtangri (Contributor)

Describe the issue

Model execution fails with the following error:

```
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for QLinearMatMul(21) node with name ''
```

To reproduce

```python
import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
import onnxruntime as ort

# Define the input dimensions and types
input_dim = [2, 2]
input_type = TensorProto.INT8

# Create the inputs
input_A = helper.make_tensor_value_info('input_A', input_type, input_dim)
input_B = helper.make_tensor_value_info('input_B', input_type, input_dim)
input_A_scale = helper.make_tensor_value_info('input_A_scale', TensorProto.FLOAT, [])
input_A_zero_point = helper.make_tensor_value_info('input_A_zero_point', input_type, [])
input_B_scale = helper.make_tensor_value_info('input_B_scale', TensorProto.FLOAT, [])
input_B_zero_point = helper.make_tensor_value_info('input_B_zero_point', input_type, [])
output_scale = helper.make_tensor_value_info('output_scale', TensorProto.FLOAT, [])
output_zero_point = helper.make_tensor_value_info('output_zero_point', input_type, [])

# Create the output
output = helper.make_tensor_value_info('output', input_type, input_dim)

# Create the QLinearMatMul node
qlinearmatmul_node = helper.make_node(
    'QLinearMatMul',
    inputs=[
        'input_A', 'input_A_scale', 'input_A_zero_point',
        'input_B', 'input_B_scale', 'input_B_zero_point',
        'output_scale', 'output_zero_point'
    ],
    outputs=['output']
)

# Create the graph
graph_def = helper.make_graph(
    [qlinearmatmul_node],
    'qlinearmatmul-graph',
    [input_A, input_A_scale, input_A_zero_point,
     input_B, input_B_scale, input_B_zero_point,
     output_scale, output_zero_point],
    [output]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='qlinearmatmul-model')
model_def.ir_version = 5
onnx.checker.check_model(model_def)
onnx.save(model_def, 'qlinearmatmul_model.onnx')

# Test the model using ONNX Runtime

# Prepare the input data (shapes must match input_dim, i.e. 2x2)
input_A_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_B_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_A_scale_data = np.array(0.1, dtype=np.float32)
input_A_zero_point_data = np.array(0, dtype=np.int8)
input_B_scale_data = np.array(0.1, dtype=np.float32)
input_B_zero_point_data = np.array(0, dtype=np.int8)
output_scale_data = np.array(0.2, dtype=np.float32)
output_zero_point_data = np.array(0, dtype=np.int8)

# Prepare the inputs for ONNX Runtime
inputs = {
    'input_A': input_A_data,
    'input_A_scale': input_A_scale_data,
    'input_A_zero_point': input_A_zero_point_data,
    'input_B': input_B_data,
    'input_B_scale': input_B_scale_data,
    'input_B_zero_point': input_B_zero_point_data,
    'output_scale': output_scale_data,
    'output_zero_point': output_zero_point_data,
}

# Verbose logging for the ONNX Runtime session (uncomment to enable)
sess_options = ort.SessionOptions()
# sess_options.log_severity_level = 0  # 0 is the most verbose logging level

# Run the model on the input data
ort_session = ort.InferenceSession('qlinearmatmul_model.onnx', sess_options)
ort_outputs = ort_session.run(None, inputs)

# Print the output
print("Output:", ort_outputs[0])
```

Urgency

Not urgent; I'm not sure whether the script itself is doing something wrong.

Platform

Windows

OS Version

Windows 11+WSL

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions bot added the platform:windows (issues related to the Windows platform) label Jul 28, 2024
@sophies927 removed the platform:windows (issues related to the Windows platform) label Aug 1, 2024

github-actions bot commented Sep 1, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions bot added the stale (issues that have not been addressed in a while; categorized by a bot) label Sep 1, 2024
@Jubengo commented Oct 22, 2024

I'm also hitting this error.
Is QLinearMatMul not handled by ORT at all, or does it only work for specific input/output types?
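
For anyone checking their own model: the opset version the model imports for the default ONNX domain determines which QLinearMatMul version ORT tries to find a kernel for, so it's worth inspecting that first. A quick check, using the model file name from the repro above:

```python
import onnx

# Print the opset each domain of the saved model imports; the default ('')
# domain's version is what ORT uses to look up the QLinearMatMul kernel.
model = onnx.load('qlinearmatmul_model.onnx')
for opset in model.opset_import:
    print(opset.domain or '<default onnx domain>', opset.version)
```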
