
[Performance] optimizers fail to detect optimization patterns #19423

Closed
xadupre opened this issue Feb 5, 2024 · 0 comments
Labels: converter:dynamo (issues related to supporting the PyTorch Dynamo exporter), performance (issues related to performance regressions)

xadupre commented Feb 5, 2024

Describe the issue

dynamo exports models with opset 18. After ONNX Runtime's graph optimization, some fused ops are missing (FusedMatMul) and consecutive Reshape nodes are still present. The following patterns were detected while exporting a llama attention model. They were found by doing a side-by-side comparison between the results of the two exporters (comparing shape, type and content).
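For context, a minimal sketch (not the exact script used here) of how the graph rewritten by ONNX Runtime's optimizer can be dumped for inspection; the file names and execution provider are placeholders:

```python
import onnxruntime as ort

# Save the graph as rewritten by ORT's optimizer so that fused ops
# (e.g. FusedMatMul) and leftover Reshape chains can be inspected.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "llama_attention.optimized.onnx"  # placeholder path
ort.InferenceSession("llama_attention.onnx", so, providers=["CUDAExecutionProvider"])
```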

Every result is described as follows, where the first column indicates whether the two sides match (=), a result exists on only one side (+), or the results differ (~):

=?|        dtype    shape           content  output of...   output name
~ | RESULT float32  1x1024x64       GSEC     Gather         /attention/Gather_1_output_0

Left side: TorchScript exporter
Right side: dynamo exporter
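A rough, self-contained sketch of how the two graphs being compared can be produced (the module below is a toy stand-in for the llama attention block, not the model used in this report):

```python
import torch

class TinyAttention(torch.nn.Module):  # toy stand-in for the llama attention block
    def __init__(self):
        super().__init__()
        self.q_proj = torch.nn.Linear(512, 512, bias=False)

    def forward(self, x):
        # (2, 1024, 512) -> (2, 1024, 8, 64) -> (2, 8, 1024, 64), like the traces below
        return self.q_proj(x).view(2, 1024, 8, 64).transpose(1, 2)

model, x = TinyAttention(), torch.randn(2, 1024, 512)
torch.onnx.export(model, (x,), "attention_torchscript.onnx")      # left side
torch.onnx.dynamo_export(model, x).save("attention_dynamo.onnx")  # right side
```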

Reshape + Reshape -> Reshape

= | RESULT float32  2x8x64x1024     MWVO Transpos /attention/Transpose_3_output_0              | RESULT float32  2x8x64x1024     MWVO Transpos _attention_1_transpose_3                    
+ |                                                                                            | RESULT float32  16x64x1024      MWVO Reshape  _attention_1_view_10                         
+ |                                                                                            | RESULT float32  1x1x1024x64     GSEC Transpos _attention_1_unsqueeze_1                     
+ |                                                                                            | RESULT float32  2048x512        QBKS MatMul   _attention_q_proj_1_mm                       
~ | RESULT float32  2x1024x512      QBKS MatMul   /attention/q_proj/MatMul_output_0            | RESULT float32  2x1024x512      QBKS Reshape  _attention_1_attention_q_proj_1             
= | RESULT float32  2x1024x8x64     QBKS Reshape  /attention/Reshape_output_0                  | RESULT float32  2x1024x8x64     QBKS Reshape  _attention_1_view_6            
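The two back-to-back Reshapes on the right (2048x512 -> 2x1024x512 -> 2x1024x8x64) can be collapsed into a single Reshape, since only the final, fully specified target shape matters here. A numpy illustration of the identity, with shapes taken from the trace above:

```python
import numpy as np

x = np.random.rand(2048, 512).astype(np.float32)            # MatMul output in the dynamo graph
two_step = x.reshape(2, 1024, 512).reshape(2, 1024, 8, 64)  # Reshape + Reshape (dynamo graph)
one_step = x.reshape(2, 1024, 8, 64)                        # single Reshape after fusion
assert np.array_equal(two_step, one_step)
```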

Mul + Transpose + Mul -> Mul + Transpose

= | RESULT float32  2x8x1024x64     AZJH Mul      /attention/Mul_1_output_0                    | RESULT float32  2x8x1024x64     AZJH Mul      _attention_1_mul_1                          
+ |                                                                                            | RESULT float32  1x1x1024x64     CJYF Transpos _attention_1_unsqueeze                       
= | RESULT float32  2x8x1024x64     RWLF Mul      /attention/Mul_output_0                      | RESULT float32  2x8x1024x64     RWLF Mul      _attention_1_mul       
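Here the dynamo graph keeps an extra Transpose feeding one of the Muls. Assuming its input is a constant (the unsqueezed rotary cos/sin cache), the transpose can be precomputed offline so that, as on the left, only the two Muls remain at runtime. A numpy sketch with placeholder values and an assumed pre-transpose shape:

```python
import numpy as np

x = np.random.rand(2, 8, 1024, 64).astype(np.float32)  # activation, shape from the trace
c = np.random.rand(1, 1, 64, 1024).astype(np.float32)  # assumed constant input of the Transpose

with_node = x * np.transpose(c, (0, 1, 3, 2))   # Mul fed by a runtime Transpose (dynamo graph)
c_folded = np.transpose(c, (0, 1, 3, 2))        # transpose folded into the initializer
assert np.array_equal(with_node, x * c_folded)  # the graph then only needs the Mul
```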

Reshape + MatMul + Reshape + Div -> FusedMatMul

= | RESULT float32  2x8x1024x64     QVUM Add      /attention/Add_output_0                      | RESULT float32  2x8x1024x64     QVUM Add      _attention_1_add                            
+ |                                                                                            | RESULT float32  16x1024x64      QVUM Reshape  _attention_1_view_9                          
+ |                                                                                            | RESULT float32  16x1024x1024    QUCF MatMul   _attention_1_bmm                             
+ |                                                                                            | RESULT float32  2x8x1024x1024   QUCF Reshape  _attention_1_view_11                         
~ | RESULT float32  2x8x1024x1024   JSAX FusedMat /attention/Div_output_0                      | RESULT float32  2x8x1024x1024   JSAX Div      _attention_1_div                            
= | RESULT float32  2x8x1024x1024   JSAX Add      /attention/Add_2_output_0                    | RESULT float32  2x8x1024x1024   JSAX Add      _attention_1_add_2   
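ONNX Runtime's FusedMatMul contrib op carries an alpha attribute that scales the product, so the Reshape -> batched MatMul -> Reshape -> Div chain emitted by dynamo is equivalent to one FusedMatMul with alpha = 1/divisor applied directly to the 4D tensors. A numpy sketch using the shapes from the trace (the scaling value is assumed, e.g. sqrt(head_dim) = 8):

```python
import numpy as np

q = np.random.rand(2, 8, 1024, 64).astype(np.float32)  # query-side input, shape from the trace
k = np.random.rand(2, 8, 64, 1024).astype(np.float32)  # key-side input (already transposed)
scale = np.float32(8.0)                                 # assumed divisor, e.g. sqrt(head_dim)

# dynamo graph: Reshape -> MatMul (3D batched) -> Reshape -> Div
scores_chain = (q.reshape(16, 1024, 64) @ k.reshape(16, 64, 1024)).reshape(2, 8, 1024, 1024) / scale
# expected after optimization: a single scaled 4D MatMul (FusedMatMul with alpha = 1/scale)
scores_fused = (q @ k) * (np.float32(1) / scale)
np.testing.assert_allclose(scores_chain, scores_fused, rtol=1e-5)
```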

To reproduce

-- to be updated soon --

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

9f68a27

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

Model File

No response

Is this a quantized model?

Yes

fs-eire added the converter:dynamo label on Feb 7, 2024
sophies927 added the performance label on Feb 22, 2024
xadupre self-assigned this on Jun 10, 2024
xadupre closed this as completed on Jun 21, 2024