Add FlattenAndUnpad Op #17845

guyang3532 · 2023-10-09T11:16:17Z

Description

Add an op named FlattenAndUnpad.
This op does two things:

1. Flatten the first two dims of the input tensor.
2. Gather the valid values from the flattened tensor using an index tensor.
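
The two steps above can be sketched in plain Python. This is only an illustration of the semantics described here, not the CUDA kernel from this PR; the function name `flatten_and_unpad`, the nested-list representation, and the sample data are assumptions made for the example.

```python
def flatten_and_unpad(x, indices):
    """Illustrative reference only (hypothetical helper, not the ORT kernel).

    x:       nested list shaped [d0][d1][...]
    indices: positions of the valid rows in the flattened [d0 * d1] view
    """
    # Step 1: merge the first two dims into a single list of rows.
    flat = [row for batch in x for row in batch]
    # Step 2: keep only the rows at the valid (non-padding) positions.
    return [flat[i] for i in indices]


# Two sequences padded to length 3; the [0, 0] rows are padding.
x = [[[1, 1], [2, 2], [0, 0]],
     [[3, 3], [0, 0], [0, 0]]]
indices = [0, 1, 3]  # flattened positions of the valid rows
result = flatten_and_unpad(x, indices)
```

Here `result` holds only the three valid rows, in the order given by `indices`.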

Motivation and Context

The grad op of PadAndUnflatten was GatherGrad, which performs poorly.
I implemented FlattenAndUnpad to replace GatherGrad as the grad of PadAndUnflatten.
With this op, we can also simplify the "Reshape + ShrunkenGather" pattern to FlattenAndUnpad in the padding elimination optimizer, which also improves performance.
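
To see why a flatten-and-gather op can serve as the grad of PadAndUnflatten, here is a minimal plain-Python sketch. It assumes PadAndUnflatten copies each input row to one unique position of a zero-filled padded tensor; the function names and the nested-list representation are hypothetical and are not taken from the PR's code.

```python
def pad_and_unflatten(values, indices, d0, d1):
    """Hypothetical forward: scatter rows into a zero-filled [d0][d1][width] tensor."""
    width = len(values[0])
    flat = [[0.0] * width for _ in range(d0 * d1)]
    for row, i in zip(values, indices):
        flat[i] = list(row)  # each input row lands at exactly one position
    # Split the flat [d0 * d1] rows back into d0 groups of d1 rows.
    return [flat[b * d1:(b + 1) * d1] for b in range(d0)]


def pad_and_unflatten_grad(grad_output, indices):
    """Hypothetical backward: flatten the first two dims, then gather by index.

    Because the forward pass only copies rows (assuming unique indices), the
    gradient of each input row is the upstream gradient at its target position.
    """
    flat = [row for batch in grad_output for row in batch]
    return [list(flat[i]) for i in indices]


values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices = [0, 1, 3]
padded = pad_and_unflatten(values, indices, d0=2, d1=3)
recovered = pad_and_unflatten_grad(padded, indices)
```

Feeding the padded tensor back through the gather step returns the original rows, which is the same flatten-then-gather computation that FlattenAndUnpad performs.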

orttraining/orttraining/core/graph/gradient_builder.cc

orttraining/orttraining/test/training_ops/cuda/flatten_and_unpad_test.cc

orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py

orttraining/orttraining/core/graph/training_op_defs.cc

orttraining/orttraining/training_ops/rocm/rocm_training_kernels.cc

orttraining/orttraining/training_ops/cuda/tensor/flatten_and_unpad.cc

orttraining/orttraining/training_ops/cuda/tensor/flatten_and_unpad_impl.cu

pengwa · 2023-11-07T06:11:33Z

Do you have perf improvement numbers to share in the PR description?

orttraining/orttraining/training_ops/cuda/tensor/flatten_and_unpad_impl.cu

orttraining/orttraining/training_ops/cuda/tensor/pad_and_unflatten_impl.cu

orttraining/orttraining/training_ops/cuda/tensor/flatten_and_unpad_impl.cu

guyang3532 requested a review from pengwa October 9, 2023 11:16

pengwa reviewed Oct 9, 2023

orttraining/orttraining/core/graph/gradient_builder.cc (Outdated)

pengwa reviewed Oct 9, 2023

orttraining/orttraining/core/graph/gradient_builder.cc (Outdated)

guyang3532 force-pushed the yangu/flatten_and_unpad branch 2 times, most recently from 89c157e to cbf9348 on October 10, 2023 12:20

pengwa added the training label Oct 11, 2023

guyang3532 force-pushed the yangu/flatten_and_unpad branch from cbf9348 to 69db77a on October 11, 2023 06:26

github-advanced-security bot found potential problems Oct 11, 2023

orttraining/orttraining/test/training_ops/cuda/flatten_and_unpad_test.cc (Fixed)

guyang3532 force-pushed the yangu/flatten_and_unpad branch from 69db77a to ec58f22 on October 12, 2023 07:25

github-advanced-security bot found potential problems Oct 12, 2023

guyang3532 force-pushed the yangu/flatten_and_unpad branch from ec58f22 to 2003901 on October 12, 2023 07:41

Add FlattenAndUnpad Op

ba4fa1e

guyang3532 force-pushed the yangu/flatten_and_unpad branch from 2003901 to ba4fa1e on November 7, 2023 05:17