add split3inner #19886
Conversation
I think that the Split op introduces unnecessary data copies. It can always be done together with the BNSH transpose.
Regarding the use of pinned memory: the split sizes vary, so the number of outputs varies, and a vector is needed to hold all the split output pointers. That vector has to be visible to the kernel in CUDA memory, so the current implementation stages the output pointers in a pinned-memory vector and then copies it into a temporary CUDA vector. Here I only handle the case where the number of outputs is 3, so there is no need for pinned memory or a temporary CUDA vector; the kernel just takes 3 output parameters.
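A minimal sketch of the staging described above, assuming a float Split with N outputs; the function name `LaunchGenericSplit` and the commented-out `SplitGenericKernel` are hypothetical, not the actual ONNX Runtime code:

```cpp
// Sketch of the general N-output path: output pointers are staged in a pinned
// host buffer and then copied to a temporary device buffer that a generic
// split kernel would read.
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

void LaunchGenericSplit(const float* input, std::vector<float*>& outputs,
                        cudaStream_t stream) {
  const size_t bytes = outputs.size() * sizeof(float*);

  // Pinned host staging buffer holding one pointer per split output.
  float** host_ptrs = nullptr;
  cudaMallocHost(&host_ptrs, bytes);
  std::copy(outputs.begin(), outputs.end(), host_ptrs);

  // Temporary device copy of the pointer table for the kernel to consume.
  float** device_ptrs = nullptr;
  cudaMalloc(&device_ptrs, bytes);
  cudaMemcpyAsync(device_ptrs, host_ptrs, bytes,
                  cudaMemcpyHostToDevice, stream);

  // SplitGenericKernel<<<grid, block, 0, stream>>>(input, device_ptrs, ...);

  cudaStreamSynchronize(stream);  // wait before releasing the staging buffers
  cudaFreeHost(host_ptrs);
  cudaFree(device_ptrs);
}
```

With exactly three outputs this staging disappears: the three output pointers can be passed directly as kernel arguments, as in the sketch under Description below.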
@kailums, is it better to support the packed QKV format in the attention operators so that there is no need to Split? Currently Attention/MultiHeadAttention support the packed QKV format. We can also support it in other operators if needed.
Yes, most attention ops support packed QKV, but there are still scenarios where the exported ONNX model has a Split in it, and this prevents it from using CUDA graph. Our scenario is vLLM + ORT; the paged attention op doesn't support packed QKV yet, so we have a Split op. From a more general perspective, this Split op should avoid using pin_memory, but I haven't figured out how to implement a general version of Split that supports arbitrary split output sizes without using pin_memory.
Is there a test case that covers the new code? If not, please add one.
using ShapeAndDataT = ShapeAndData<uint8_t>;
std::vector<ShapeAndDataT> outputs;
int64_t num_outputs = -1;  // when split_sizes is provided, num_outputs should not be provided
const int batch = 16;
Check warning: Code scanning / PREfast
The const variable 'batch' can be computed at compile-time. Consider using constexpr (con.5).
const int data_len = 96;  // should be multiple of 3
Check warning: Code scanning / PREfast
The const variable 'data_len' can be computed at compile-time. Consider using constexpr (con.5).
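One possible way to address both warnings, assuming nothing else needs these to be mutable or address-taken at runtime:

```cpp
// Both values are known at compile time, so constexpr expresses that directly
// and satisfies the con.5 guideline flagged by PREfast.
constexpr int batch = 16;
constexpr int data_len = 96;  // should be a multiple of 3
```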
This PR causes the ROCm pipeline unit tests to start timing out on ...
Description
The Split op uses pin_memory when splitting into different sizes, but pin_memory is not compatible with CUDA graph.
This change adds a new implementation for the transformer scenario only: it splits qkv_proj into q, k, and v without using pin_memory.
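For illustration, a minimal sketch of a fixed three-output split along the inner dimension; this is not the PR's actual kernel, and the names (`Split3InnerKernel`, `q_size`/`k_size`/`v_size`) are assumptions:

```cpp
// Sketch only: with exactly three outputs, the per-output pointers and sizes
// are plain kernel arguments, so no pinned-memory pointer table is needed and
// the launch can be captured in a CUDA graph.
#include <cuda_runtime.h>

template <typename T>
__global__ void Split3InnerKernel(const T* input,    // [rows, q_size + k_size + v_size]
                                  T* q, T* k, T* v,  // the three fixed outputs
                                  int q_size, int k_size, int v_size,
                                  int rows) {
  const int inner = q_size + k_size + v_size;
  // Grid-stride loop keeps the launch configuration independent of tensor size.
  for (int idx = blockIdx.x * blockDim.x + threadIdx.x;
       idx < rows * inner;
       idx += gridDim.x * blockDim.x) {
    const int row = idx / inner;
    const int col = idx % inner;
    if (col < q_size) {
      q[row * q_size + col] = input[idx];
    } else if (col < q_size + k_size) {
      k[row * k_size + (col - q_size)] = input[idx];
    } else {
      v[row * v_size + (col - q_size - k_size)] = input[idx];
    }
  }
}
```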
Motivation and Context