Implement Fold and Unfold #3167
base: develop
Conversation
DuongQLee commented Jul 29, 2024 (edited)
- Added Fold and Unfold ops.
- Full benchmark results compared to ROCm: Here
- Average performance:
| Op | Dtype | Direction | Time |
|---|---|---|---|
| Unfold | fp32 | fwd | 16.43641979 |
| Unfold | fp32 | bwd | 1.458798342 |
| Unfold | fp16 | fwd | 15.83955361 |
| Unfold | fp16 | bwd | 1.459763543 |
| Unfold | bfp16 | fwd | 15.90593279 |
| Unfold | bfp16 | bwd | 1.455323877 |
| Fold | fp32 | fwd | 1.463731927 |
| Fold | fp32 | bwd | 28.09828887 |
| Fold | fp16 | fwd | 1.479315364 |
| Fold | fp16 | bwd | 28.08557933 |
| Fold | bfp16 | fwd | 1.47158127 |
| Fold | bfp16 | bwd | 26.84660993 |
@iq136boy Can I have the error log for Jenkins - Fp32 Hip All gfx90a? Thank you.
Partial review.
```cpp
int N,
int C,
int H,
int W,
int P,
int L,
int LH,
int LW,
int kernel_size_h,
int kernel_size_w,
int stride_h,
int stride_w,
int padding_h,
int padding_w,
int dilation_h,
int dilation_w,
```
Using `int` here limits the kernel's applicability to tensors smaller than 2 GB.
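For context, a minimal sketch (not code from this PR; the shape is hypothetical) of how a flattened index computed in `int` overflows once the element count passes 2^31 − 1:

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    // Hypothetical NCHW shape whose flattened size is exactly 2^31,
    // one past the largest value a 32-bit signed int can hold.
    int N = 2, C = 64, H = 4096, W = 4096;

    int flat32     = N * C * H * W;          // signed overflow: wraps (formally UB)
    int64_t flat64 = int64_t(N) * C * H * W; // correct: 2147483648

    std::cout << flat32 << " vs " << flat64 << '\n';
}
```

Widening the index types in the kernel, or rejecting oversized tensors in `IsApplicable`, would address this.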
```cpp
else
{
    GTEST_SKIP();
}
```
See #3152 (comment)
```cpp
size_t N;
size_t C;
size_t D;
size_t H;
size_t W;
std::vector<int32_t> kernelSize;
std::vector<int32_t> stride;
```
While the dimensions are `size_t`, the strides are 32-bit integers. Can they be made consistent?
```cpp
[[maybe_unused]] const ExecutionContext& /*context*/,
[[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
{
    return true;
```
The kernel does not support tensors > 2 GB.
src/solver/fold/fold_forward.cpp (Outdated)
```cpp
[[maybe_unused]] const ExecutionContext& /*context*/,
[[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
```
Either `const ExecutionContext& /*context*/,` or `[[maybe_unused]] const ExecutionContext& context,` — marking a parameter whose name is already commented out as `[[maybe_unused]]` is redundant.
Resolved in commit 918091d.
```cpp
{11, 13, 0, 17, 19, {3, 3}, {3, 2}, {0, 0}, {1, 1}, true},
{11, 13, 0, 17, 19, {3, 3}, {1, 1}, {3, 2}, {1, 1}, true},
{11, 13, 0, 17, 19, {3, 3}, {1, 1}, {0, 0}, {3, 2}, true},
{11, 13, 0, 33, 37, {4, 3}, {2, 3}, {5, 2}, {3, 5}, true},
```
It doesn't look like the algorithm relies on `isContiguous`, and there is no non-contiguous test case to check it.
I have added non-contiguous test cases in commit 85c1ee0.
```cpp
if(!isContiguous)
    std::swap(inputDim.front(), inputDim.back());
```
Could you explain why you swap the input dimensions for the non-contiguous case?
For example, if we have a tensor of size {2, 4, 8, 16}, its contiguous strides are {512, 128, 16, 1}.
If we swap the input strides and dims, the dims become {16, 4, 8, 2} and the strides {1, 16, 2, 64}.
This makes the memory access non-contiguous while the total size of the tensor stays the same.
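A minimal sketch of how I read that construction (an assumption about the test helper, not code from the PR): compute contiguous strides for the front/back-swapped dims, then swap the front/back strides as well, which reproduces the values quoted above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// For dims {2, 4, 8, 16}: swap front/back dims -> {16, 4, 8, 2};
// that shape's contiguous strides are {64, 16, 2, 1}; swapping the
// front/back strides then gives {1, 16, 2, 64}.
std::vector<size_t> SwappedStrides(std::vector<size_t> dims)
{
    std::swap(dims.front(), dims.back());
    std::vector<size_t> strides(dims.size(), 1);
    for(int i = static_cast<int>(dims.size()) - 2; i >= 0; --i)
        strides[i] = strides[i + 1] * dims[i + 1];
    std::swap(strides.front(), strides.back());
    return strides;
}

int main()
{
    for(auto s : SwappedStrides({2, 4, 8, 16}))
        std::cout << s << ' '; // prints: 1 16 2 64
}
```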
Why can't it just be calculated in the reverse direction, like {1, 2, 8, 64}? That is still the same total tensor size and still non-contiguous (i.e. transposed), but without any strange swapping.
The swapping looks suspicious.
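A minimal sketch of the suggested alternative (my illustration, not PR code): build the strides in the reverse, column-major direction, so the first dimension varies fastest.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Reverse-direction (column-major) strides: for dims {2, 4, 8, 16}
// this yields {1, 2, 8, 64} -- the same 1024 elements, laid out as a
// transposed, hence non-contiguous, view.
std::vector<size_t> ReverseStrides(const std::vector<size_t>& dims)
{
    std::vector<size_t> strides(dims.size(), 1);
    for(size_t i = 1; i < dims.size(); ++i)
        strides[i] = strides[i - 1] * dims[i - 1];
    return strides;
}

int main()
{
    for(auto s : ReverseStrides({2, 4, 8, 16}))
        std::cout << s << ' '; // prints: 1 2 8 64
}
```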
I have made a PyTorch implementation of this method: https://colab.research.google.com/drive/1TEPXclDXDQ5cLHsoPpQmrBmVUvuPKDLe?usp=drive_link
As you can see, by swapping the front and back dims (the tensor dims become {16, 4, 8, 2}), the resulting memory strides make the tensor non-contiguous.
@iq136boy Would you send us the build log of this PR?
Please check the error log:
Blocked by the following CI failure. Need to restart after the CI failure is resolved.
[2024-09-23T13:54:42.218Z] Exception occurred: org.kohsuke.github.HttpException: {"message":"API rate limit exceeded for user ID 49319081. If you reach out to GitHub Support for help, please include the request ID