[QNN] MatMulAddFusion and Reshape Related Fusion #22494

centwang · 2024-10-18T06:18:21Z

The QNN EP maps the Gemm op to QNN's FullyConnected op, which runs much faster than a separate MatMul followed by Add. This PR fuses MatMul+Add whenever MatMul's second input is a 2D initializer, regardless of the rank of the first input. If the first input is not a 2D tensor, Reshape nodes are added.
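The equivalence behind this fusion can be sketched in plain C++. This is a minimal illustration, not the code in matmul_add_fusion.cc: the function names (`Gemm`, `MatMulAdd3D`, `ReshapeGemmReshape`) and the rank-3 `[batch, seq, k]` input layout are assumptions made for the example, and tensors are modeled as contiguous row-major `std::vector<float>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major Gemm: Y[M,N] = A[M,K] * B[K,N] + bias[N], bias broadcast over rows.
std::vector<float> Gemm(const std::vector<float>& a, const std::vector<float>& b,
                        const std::vector<float>& bias,
                        std::size_t m, std::size_t k, std::size_t n) {
  std::vector<float> y(m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = bias[j];
      for (std::size_t p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      y[i * n + j] = acc;
    }
  }
  return y;
}

// Unfused reference: MatMul of X[batch,seq,K] with a 2D weight W[K,N],
// followed by Add of bias[N].
std::vector<float> MatMulAdd3D(const std::vector<float>& x, const std::vector<float>& w,
                               const std::vector<float>& bias, std::size_t batch,
                               std::size_t seq, std::size_t k, std::size_t n) {
  std::vector<float> y(batch * seq * n);
  for (std::size_t b = 0; b < batch; ++b) {
    for (std::size_t s = 0; s < seq; ++s) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) acc += x[(b * seq + s) * k + p] * w[p * n + j];
        y[(b * seq + s) * n + j] = acc + bias[j];
      }
    }
  }
  return y;
}

// Fused form: Reshape [batch,seq,K] -> [batch*seq,K], Gemm, then
// Reshape [batch*seq,N] -> [batch,seq,N]. In this toy model the data is
// contiguous, so both Reshapes only reinterpret the shape.
std::vector<float> ReshapeGemmReshape(const std::vector<float>& x, const std::vector<float>& w,
                                      const std::vector<float>& bias, std::size_t batch,
                                      std::size_t seq, std::size_t k, std::size_t n) {
  return Gemm(x, w, bias, batch * seq, k, n);
}
```

Calling both functions on the same `[2, 2, 3]` input with a `[3, 2]` weight should give identical buffers, which is why only the shape bookkeeping (the added Reshape nodes) changes when the first input is not 2D.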

On the QNN EP, memory is allocated for each activation tensor, so Reshape/Squeeze/Unsqueeze are not no-ops. This PR also adds some fusions that try to remove redundant Reshape nodes. For some QNN AI Hub models on specific devices, the graph cannot be finalized at execution time unless these Reshape nodes are removed; with them removed, the models work well.
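As a generic illustration of why a Reshape can be redundant (not the specific patterns this PR matches), consider a chain of Reshape nodes with fully known target shapes. The helper below, `SimplifyReshapeChain`, is hypothetical; it assumes every shape is concrete (no `-1` or `0` placeholders) and that no intermediate output has another consumer.

```cpp
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Given the shape entering a chain of Reshape nodes and each node's target
// shape, return the target shapes of the Reshape nodes that are still needed.
// Reshape never reorders data, so only the final shape matters:
//   - if the chain ends at the shape it started from, no Reshape is needed;
//   - otherwise a single Reshape to the final shape replaces the whole chain.
std::vector<Shape> SimplifyReshapeChain(const Shape& input_shape,
                                        const std::vector<Shape>& targets) {
  if (targets.empty()) return {};
  const Shape& final_shape = targets.back();
  if (final_shape == input_shape) return {};
  return {final_shape};
}
```

Under these assumptions, each Reshape dropped this way is one fewer activation tensor that the QNN EP has to allocate.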

Average inference time for the models below, with and without the change:

| Model | With the change | Without the change |
| --- | --- | --- |
| swin_tiny | 12.8077 ms | 23.956 ms |
| swin_base | 27.0639 ms | 57.6608 ms |
| convnext_tiny | 3.42956 ms | 16.1848 ms |
| openai_clip_CLIPTextEncoder | 5.96104 ms | 220.406 ms |
| openai_clip_CLIPImageEncoder | 41.8206 ms | 919.712 ms |

NOTE: the current change skips the Attention pattern, because fusing it would stop AttentionFusion from working. Ideally AttentionFusion would be adjusted to support the Gemm pattern, but that requires big changes. Maybe we can do this in the future, for example when we want to run transformer models on QNN: since QNN has no Attention op, we would still want to fuse MatMul+Add inside the Attention pattern so that FullyConnected is used on the QNN side.

onnxruntime/core/optimizer/matmul_add_fusion.cc

adrianlizarraga · 2024-11-06T17:40:25Z

@centwang Thank you for the PR. It looks like many unit tests and pipelines are still not passing. Could you please address those issues first?

onnxruntime/core/optimizer/reshape_fusion.cc

onnxruntime/test/optimizer/graph_transform_test.cc

onnxruntime/test/providers/qnn/gemm_op_test.cc

skottmckay · 2024-11-28T00:42:21Z

onnxruntime/core/providers/qnn/builder/qnn_node_group/reshape_gemm_fusion.cc

+  Qnn_TensorType_t tensor_type = qnn_model_wrapper.GetTensorType(weight_tensor_name);
+  Qnn_DataType_t data_type = QNN_DATATYPE_FLOAT_32;
+  ORT_RETURN_IF_ERROR(utils::GetQnnDataType(false, weight_def.node_arg.TypeAsProto(), data_type));
+  const auto& weight_tensor_proto = qnn_model_wrapper.GetInitializerTensors().at(weight_tensor_name);


If the initializer is not constant you have to use the value at runtime as it could be overridden in each Run call.

So I would suggest fixing the test (if the initializers are not intended to be mutable) or updating the logic to not take the node if the initializer is mutable.
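The second option, declining the node when the initializer is mutable, could look roughly like the toy model below. `ToyGraph`, `IsConstantInitializer`, and `CanFuseReshapeGemm` are illustrative stand-ins written for this sketch, not the onnxruntime or QNN EP API. It relies on the ONNX convention that an initializer whose name also appears among the graph inputs is only a default value that a caller may override at Run time.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for a graph: initializer data plus the names a caller may
// feed on each Run call.
struct ToyGraph {
  std::unordered_map<std::string, std::vector<float>> initializers;
  std::unordered_set<std::string> graph_inputs;
};

// An initializer is constant only if it exists and cannot be overridden,
// i.e. its name is not also a graph input.
bool IsConstantInitializer(const ToyGraph& graph, const std::string& name) {
  return graph.initializers.count(name) != 0 && graph.graph_inputs.count(name) == 0;
}

// Hypothetical guard for the fusion: reading the weight data at build time
// is only safe when the weight is a constant initializer. Otherwise the
// fusion should not take the node, and the value must come from runtime.
bool CanFuseReshapeGemm(const ToyGraph& graph, const std::string& weight_name) {
  return IsConstantInitializer(graph, weight_name);
}
```

With a guard of this kind, a lookup such as `GetInitializerTensors().at(weight_tensor_name)` would only be reached for weights whose value is fixed for the lifetime of the session.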

onnxruntime/test/providers/qnn/qnn_basic_test.cc

centwang force-pushed the weicwang/matmul_add_fusion branch from 7d3d515 to 0a05430 Compare October 21, 2024 03:34

snnn previously approved these changes Oct 21, 2024

centwang requested review from adrianlizarraga and skottmckay October 22, 2024 02:08

skottmckay reviewed Oct 22, 2024

onnxruntime/core/optimizer/matmul_add_fusion.cc (two outdated review threads)

centwang dismissed snnn’s stale review via ca59611 October 29, 2024 11:51

centwang force-pushed the weicwang/matmul_add_fusion branch from a8388b7 to ca59611 Compare October 29, 2024 11:51

centwang changed the title ~~Add More Cases to MatMulAddFusion~~ [QNN] MatMulAddFusion and Reshape Related Fusion Oct 29, 2024

centwang requested review from jywu-msft and cloudhan October 29, 2024 11:52

skottmckay reviewed Nov 8, 2024

centwang added 6 commits November 22, 2024 11:45

matmul add fusion

9405087

fix ut failure

c685f39

fix compile error

e5146ce

fix attn pattern

787b4fb

reshape related fusion

8606668

resolve comments

47d4755

centwang force-pushed the weicwang/matmul_add_fusion branch from ca59611 to 47d4755 Compare November 25, 2024 05:49

centwang added 2 commits November 25, 2024 14:16

fix build error

d063df5

fix test failure

8d75e0a

skottmckay reviewed Nov 28, 2024

centwang added 4 commits December 5, 2024 10:37

Merge branch 'main' into weicwang/matmul_add_fusion

dff068b

resolve comments

81d3fe9

Merge branch 'main' into weicwang/matmul_add_fusion

e1c77da

fix merge error

9b09618
