
[Performance] optimizers fail to detect optimization patterns #19423

Closed
xadupre opened this issue Feb 5, 2024 · 0 comments
Labels: converter:dynamo (issues related to supporting the PyTorch Dynamo exporter), performance (issues related to performance regressions)

xadupre commented Feb 5, 2024

Describe the issue

dynamo exports models with opset 18. After ONNX Runtime's graph optimization, some fused ops are missing (FusedMatMul) and consecutive Reshape nodes are still present. The following patterns were detected while exporting a llama attention model. They were found by doing a side-by-side comparison between the results of the two exporters (comparing shape, type and content).
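For context, a minimal sketch (not the exact script used here) of how the graph rewritten by ONNX Runtime's optimizer can be dumped for inspection; the file names and execution provider are placeholders:

```python
import onnxruntime as ort

# Save the graph as rewritten by ORT's optimizer so that fused ops
# (e.g. FusedMatMul) and leftover Reshape chains can be inspected.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "llama_attention.optimized.onnx"  # placeholder path
ort.InferenceSession("llama_attention.onnx", so, providers=["CUDAExecutionProvider"])
```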

Every result is described as follows, where the first column indicates whether the two sides match (=), a result exists on only one side (+), or the results differ (~):

=?|        dtype    shape           content  output of...   output name
~ | RESULT float32  1x1024x64       GSEC     Gather         /attention/Gather_1_output_0

Left side: TorchScript exporter
Right side: dynamo exporter
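A rough, self-contained sketch of how the two graphs being compared can be produced (the module below is a toy stand-in for the llama attention block, not the model used in this report):

```python
import torch

class TinyAttention(torch.nn.Module):  # toy stand-in for the llama attention block
    def __init__(self):
        super().__init__()
        self.q_proj = torch.nn.Linear(512, 512, bias=False)

    def forward(self, x):
        # (2, 1024, 512) -> (2, 1024, 8, 64) -> (2, 8, 1024, 64), like the traces below
        return self.q_proj(x).view(2, 1024, 8, 64).transpose(1, 2)

model, x = TinyAttention(), torch.randn(2, 1024, 512)
torch.onnx.export(model, (x,), "attention_torchscript.onnx")      # left side
torch.onnx.dynamo_export(model, x).save("attention_dynamo.onnx")  # right side
```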

Reshape + Reshape -> Reshape

= | RESULT float32  2x8x64x1024     MWVO Transpos /attention/Transpose_3_output_0              | RESULT float32  2x8x64x1024     MWVO Transpos _attention_1_transpose_3                    
+ |                                                                                            | RESULT float32  16x64x1024      MWVO Reshape  _attention_1_view_10                         
+ |                                                                                            | RESULT float32  1x1x1024x64     GSEC Transpos _attention_1_unsqueeze_1                     
+ |                                                                                            | RESULT float32  2048x512        QBKS MatMul   _attention_q_proj_1_mm                       
~ | RESULT float32  2x1024x512      QBKS MatMul   /attention/q_proj/MatMul_output_0            | RESULT float32  2x1024x512      QBKS Reshape  _attention_1_attention_q_proj_1             
= | RESULT float32  2x1024x8x64     QBKS Reshape  /attention/Reshape_output_0                  | RESULT float32  2x1024x8x64     QBKS Reshape  _attention_1_view_6            
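The two back-to-back Reshapes on the right (2048x512 -> 2x1024x512 -> 2x1024x8x64) can be collapsed into a single Reshape, since only the final, fully specified target shape matters here. A numpy illustration of the identity, with shapes taken from the trace above:

```python
import numpy as np

x = np.random.rand(2048, 512).astype(np.float32)            # MatMul output in the dynamo graph
two_step = x.reshape(2, 1024, 512).reshape(2, 1024, 8, 64)  # Reshape + Reshape (dynamo graph)
one_step = x.reshape(2, 1024, 8, 64)                        # single Reshape after fusion
assert np.array_equal(two_step, one_step)
```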

Mul + Transpose + Mul -> Mul + Transpose

= | RESULT float32  2x8x1024x64     AZJH Mul      /attention/Mul_1_output_0                    | RESULT float32  2x8x1024x64     AZJH Mul      _attention_1_mul_1                          
+ |                                                                                            | RESULT float32  1x1x1024x64     CJYF Transpos _attention_1_unsqueeze                       
= | RESULT float32  2x8x1024x64     RWLF Mul      /attention/Mul_output_0                      | RESULT float32  2x8x1024x64     RWLF Mul      _attention_1_mul       
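Here the dynamo graph keeps an extra Transpose feeding one of the Muls. Assuming its input is a constant (the unsqueezed rotary cos/sin cache), the transpose can be precomputed offline so that, as on the left, only the two Muls remain at runtime. A numpy sketch with placeholder values and an assumed pre-transpose shape:

```python
import numpy as np

x = np.random.rand(2, 8, 1024, 64).astype(np.float32)  # activation, shape from the trace
c = np.random.rand(1, 1, 64, 1024).astype(np.float32)  # assumed constant input of the Transpose

with_node = x * np.transpose(c, (0, 1, 3, 2))   # Mul fed by a runtime Transpose (dynamo graph)
c_folded = np.transpose(c, (0, 1, 3, 2))        # transpose folded into the initializer
assert np.array_equal(with_node, x * c_folded)  # the graph then only needs the Mul
```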

Reshape + MatMul + Reshape + Div -> FusedMatMul

= | RESULT float32  2x8x1024x64     QVUM Add      /attention/Add_output_0                      | RESULT float32  2x8x1024x64     QVUM Add      _attention_1_add                            
+ |                                                                                            | RESULT float32  16x1024x64      QVUM Reshape  _attention_1_view_9                          
+ |                                                                                            | RESULT float32  16x1024x1024    QUCF MatMul   _attention_1_bmm                             
+ |                                                                                            | RESULT float32  2x8x1024x1024   QUCF Reshape  _attention_1_view_11                         
~ | RESULT float32  2x8x1024x1024   JSAX FusedMat /attention/Div_output_0                      | RESULT float32  2x8x1024x1024   JSAX Div      _attention_1_div                            
= | RESULT float32  2x8x1024x1024   JSAX Add      /attention/Add_2_output_0                    | RESULT float32  2x8x1024x1024   JSAX Add      _attention_1_add_2   
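ONNX Runtime's FusedMatMul contrib op carries an alpha attribute that scales the product, so the Reshape -> batched MatMul -> Reshape -> Div chain emitted by dynamo is equivalent to one FusedMatMul with alpha = 1/divisor applied directly to the 4D tensors. A numpy sketch using the shapes from the trace (the scaling value is assumed, e.g. sqrt(head_dim) = 8):

```python
import numpy as np

q = np.random.rand(2, 8, 1024, 64).astype(np.float32)  # query-side input, shape from the trace
k = np.random.rand(2, 8, 64, 1024).astype(np.float32)  # key-side input (already transposed)
scale = np.float32(8.0)                                 # assumed divisor, e.g. sqrt(head_dim)

# dynamo graph: Reshape -> MatMul (3D batched) -> Reshape -> Div
scores_chain = (q.reshape(16, 1024, 64) @ k.reshape(16, 64, 1024)).reshape(2, 8, 1024, 1024) / scale
# expected after optimization: a single scaled 4D MatMul (FusedMatMul with alpha = 1/scale)
scores_fused = (q @ k) * (np.float32(1) / scale)
np.testing.assert_allclose(scores_chain, scores_fused, rtol=1e-5)
```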

To reproduce

-- to be updated soon --

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

9f68a27

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

Model File

No response

Is this a quantized model?

Yes

fs-eire added the converter:dynamo label on Feb 7, 2024
sophies927 added the performance label on Feb 22, 2024
xadupre self-assigned this on Jun 10, 2024
xadupre closed this as completed on Jun 21, 2024