
Move Gelu and LayerNorm fusion to L1 optimization #21332

Merged (16 commits on Sep 9, 2024)

Conversation

peishenyan
Contributor

According to #20915, we move the Gelu and LayerNorm fusion to L1, conditioned on the ONNX opset the model imports (LayerNorm requires opset 16+ and Gelu requires opset 20+). If the opset version doesn't meet the requirement, the fusion is deferred to L2 optimization, since the internal contrib op has no requirement on any specific ONNX opset.
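
For context, a minimal sketch of the gating logic being described (the function and parameter names are illustrative, not the actual ONNX Runtime code):

// Illustrative sketch only. L1 may only emit standard ONNX ops, so the
// fusion is gated on the opset the model imports; L2 may emit the
// com.microsoft contrib op, which carries no ONNX opset requirement.
bool CanFuseAtLevel1(bool is_layer_norm, int onnx_opset) {
  const int required_opset = is_layer_norm ? 16 : 20;  // LayerNorm: 16+, Gelu: 20+
  return onnx_opset >= required_opset;  // otherwise defer the fusion to L2
}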

@peishenyan peishenyan changed the title Move Gelu and LayerNorm fusion in L1 optimization Move Gelu and LayerNorm fusion to L1 optimization Jul 12, 2024
@peishenyan
Contributor Author

It seems that I should also change the test cases, since the fusion now has two chances to run (both L1 and L2 optimization), but the current test cases only register the level 2 optimization: graph_transformation_mgr.Register(std::make_unique<GeluFusion>(), TransformerLevel::Level2); (see the sketch below).

WIP.
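
A sketch of what the two-level registration could look like, following the usual pattern in graph_transform_test.cc (ASSERT_STATUS_OK and the manager setup are assumed from the existing tests):

// Register the fusion at both L1 and L2 so a test exercises both
// fusion opportunities (pattern assumed from existing tests).
onnxruntime::GraphTransformerManager graph_transformation_mgr{5};
ASSERT_STATUS_OK(graph_transformation_mgr.Register(
    std::make_unique<GeluFusion>(), TransformerLevel::Level1));
ASSERT_STATUS_OK(graph_transformation_mgr.Register(
    std::make_unique<GeluFusion>(), TransformerLevel::Level2));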

@peishenyan
Contributor Author

Hi @skottmckay @fdwr @guschmue, I've moved the Gelu and LayerNorm fusion to L1, conditioned on the ONNX opset the model imports, and kept the option of deferring the fusion to L2 optimization when the opset version doesn't meet the requirements. I also modified the corresponding GeluFusion and LayerNormFusion tests.

PTAL, thanks.

@peishenyan
Contributor Author

Hi @skottmckay, I hope you're doing well. When you have a moment, could you please take a look at the PR I submitted? I believe your feedback would be very valuable. Thank you for your time! ❤

Code review threads (resolved):
onnxruntime/core/optimizer/layer_norm_fusion.cc
onnxruntime/core/optimizer/gelu_fusion.h
onnxruntime/core/optimizer/gelu_fusion.cc
onnxruntime/test/optimizer/graph_transform_test.cc
skottmckay previously approved these changes Aug 12, 2024
@skottmckay
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@skottmckay
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@skottmckay
Contributor

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@peishenyan
Contributor Author

@skottmckay Sorry, I didn't notice the error in the code until I saw the failing checks. The latest commit fixes the bugs. Please rerun the checks. Thanks.

@skottmckay
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@skottmckay
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@skottmckay
Contributor

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@peishenyan
Contributor Author

Unfortunately, the checks still fail. I think the failures are related to GeluFusion and LayerNormFusion in orttraining/orttraining/core/optimizer/graph_transformer_utils.cc. Those transformers originally fused the ops at level 1 without any restriction, but after my modification they only fuse the ops when the opset requirement is satisfied.

I think the solution might be to change

case TransformerLevel::Level1: {
...
      transformers.emplace_back(std::make_unique<LayerNormFusion>(compatible_eps));
...
      transformers.emplace_back(std::make_unique<GeluFusion>(compatible_eps));
...
}

to

case TransformerLevel::Level1: {
...
      transformers.emplace_back(std::make_unique<LayerNormFusion>(compatible_eps));
      transformers.emplace_back(std::make_unique<LayerNormFusion>(compatible_eps, TransformerLevel::Level2));
...
      transformers.emplace_back(std::make_unique<GeluFusion>(compatible_eps));
      transformers.emplace_back(std::make_unique<GeluFusion>(compatible_eps, TransformerLevel::Level2));
...
}

to fix this bug.

@skottmckay Are there any suggestions you'd like to offer? Thanks.

@skottmckay
Contributor

Maybe add an additional optional argument to each optimizer's ctor to allow forcing it to only use the contrib op. Set to true when the optimizers are added in orttraining/orttraining/core/optimizer/graph_transformer_utils.cc.
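
That might look roughly like the following self-contained sketch of the idea (force_contrib_op and ChooseGeluOp are illustrative names, not the merged API):

#include <optional>
#include <string>

enum class TransformerLevel { Level1 = 1, Level2 = 2 };

class GeluFusionSketch {
 public:
  explicit GeluFusionSketch(TransformerLevel level = TransformerLevel::Level1,
                            bool force_contrib_op = false)
      : level_(level), force_contrib_op_(force_contrib_op) {}

  // Which op may this pass emit for a model importing onnx_opset?
  // Returns std::nullopt when the L1 pass should defer to L2.
  std::optional<std::string> ChooseGeluOp(int onnx_opset) const {
    if (force_contrib_op_ || level_ == TransformerLevel::Level2) {
      // orttraining would register with force_contrib_op = true to keep
      // its old behavior: always fuse into the contrib op.
      return "com.microsoft.Gelu";  // no ONNX opset requirement
    }
    if (onnx_opset >= 20) {
      return "Gelu";  // standard ONNX Gelu, available from opset 20
    }
    return std::nullopt;  // opset too old for L1; defer to L2
  }

 private:
  TransformerLevel level_;
  bool force_contrib_op_;
};

With this, GeluFusionSketch(TransformerLevel::Level1, /*force_contrib_op*/ true).ChooseGeluOp(17) would still yield the contrib op, matching the pre-PR orttraining behavior.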

@peishenyan
Contributor Author

Good idea!

@skottmckay
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@skottmckay
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline

@skottmckay
Contributor

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

skottmckay previously approved these changes Sep 6, 2024
edgchen1 previously approved these changes Sep 6, 2024
edgchen1 dismissed stale reviews from skottmckay and themself via b096a8e September 6, 2024 22:43
@skottmckay
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@skottmckay
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline

@skottmckay
Contributor

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

skottmckay merged commit 2cdc05f into microsoft:main Sep 9, 2024
82 of 84 checks passed