Implement Fold and Unfold #3167
base: develop
Conversation
DuongQLee commented Jul 29, 2024 (edited)
- Added Fold and Unfold ops.
- Full benchmark results compared to ROCm: Here
- Average performance:
| Op | Dtype | Direction | Time |
|---|---|---|---|
| Unfold | fp32 | fwd | 16.43641979 |
| Unfold | fp32 | bwd | 1.458798342 |
| Unfold | fp16 | fwd | 15.83955361 |
| Unfold | fp16 | bwd | 1.459763543 |
| Unfold | bfp16 | fwd | 15.90593279 |
| Unfold | bfp16 | bwd | 1.455323877 |
| Fold | fp32 | fwd | 1.463731927 |
| Fold | fp32 | bwd | 28.09828887 |
| Fold | fp16 | fwd | 1.479315364 |
| Fold | fp16 | bwd | 28.08557933 |
| Fold | bfp16 | fwd | 1.47158127 |
| Fold | bfp16 | bwd | 26.84660993 |
@iq136boy Can I have the error log for Jenkins - Fp32 Hip All gfx90a? Thank you.
Partial review.
```cpp
int N,
int C,
int H,
int W,
int P,
int L,
int LH,
int LW,
int kernel_size_h,
int kernel_size_w,
int stride_h,
int stride_w,
int padding_h,
int padding_w,
int dilation_h,
int dilation_w,
```
Using `int` here limits the kernel's applicability to tensors smaller than 2 GB.
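For context, a minimal sketch (not code from this PR; the shape is hypothetical) of how a flattened index computed in `int` overflows once the element count passes 2^31 − 1:

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    // Hypothetical NCHW shape whose flattened size is exactly 2^31,
    // one past the largest value a 32-bit signed int can hold.
    int N = 2, C = 64, H = 4096, W = 4096;

    int flat32     = N * C * H * W;          // signed overflow: wraps (formally UB)
    int64_t flat64 = int64_t(N) * C * H * W; // correct: 2147483648

    std::cout << flat32 << " vs " << flat64 << '\n';
}
```

Widening the index types in the kernel, or rejecting oversized tensors in `IsApplicable`, would address this.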
```cpp
else
{
    GTEST_SKIP();
}
```
See #3152 (comment)
```cpp
size_t N;
size_t C;
size_t D;
size_t H;
size_t W;
std::vector<int32_t> kernelSize;
std::vector<int32_t> stride;
```
While the dimensions are `size_t`, the strides are 32-bit integers. Can they be made consistent?
```cpp
[[maybe_unused]] const ExecutionContext& /*context*/,
[[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
{
    return true;
```
The kernel does not support tensors > 2 GB.
src/solver/fold/fold_forward.cpp (Outdated)
```cpp
[[maybe_unused]] const ExecutionContext& /*context*/,
[[maybe_unused]] const miopen::fold::FoldFwdProblemDescription& problem) const
```
Either `const ExecutionContext& /*context*/,` or `[[maybe_unused]] const ExecutionContext& context,` — marking a parameter whose name is already commented out as `[[maybe_unused]]` is redundant.
Resolved in commit 918091d.
```cpp
{11, 13, 0, 17, 19, {3, 3}, {3, 2}, {0, 0}, {1, 1}, true},
{11, 13, 0, 17, 19, {3, 3}, {1, 1}, {3, 2}, {1, 1}, true},
{11, 13, 0, 17, 19, {3, 3}, {1, 1}, {0, 0}, {3, 2}, true},
{11, 13, 0, 33, 37, {4, 3}, {2, 3}, {5, 2}, {3, 5}, true},
```
It doesn't look like the algorithm relies on `isContiguous`, and there is no non-contiguous test case to check it.
I have added non-contiguous test cases in commit 85c1ee0.
```cpp
if(!isContiguous)
    std::swap(inputDim.front(), inputDim.back());
```
Could you explain why you swap the input dimensions for the non-contiguous case?
For example, if we have a tensor of size {2, 4, 8, 16}, its contiguous strides are {512, 128, 16, 1}.
If we swap the input strides and dims, the dims become {16, 4, 8, 2} and the strides {1, 16, 2, 64}.
This makes the memory access non-contiguous while the total size of the tensor stays the same.
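A minimal sketch of how I read that construction (an assumption about the test helper, not code from the PR): compute contiguous strides for the front/back-swapped dims, then swap the front/back strides as well, which reproduces the values quoted above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// For dims {2, 4, 8, 16}: swap front/back dims -> {16, 4, 8, 2};
// that shape's contiguous strides are {64, 16, 2, 1}; swapping the
// front/back strides then gives {1, 16, 2, 64}.
std::vector<size_t> SwappedStrides(std::vector<size_t> dims)
{
    std::swap(dims.front(), dims.back());
    std::vector<size_t> strides(dims.size(), 1);
    for(int i = static_cast<int>(dims.size()) - 2; i >= 0; --i)
        strides[i] = strides[i + 1] * dims[i + 1];
    std::swap(strides.front(), strides.back());
    return strides;
}

int main()
{
    for(auto s : SwappedStrides({2, 4, 8, 16}))
        std::cout << s << ' '; // prints: 1 16 2 64
}
```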
Why can't it just be calculated in the reverse direction, like {1, 2, 8, 64}? That is still the same total tensor size and still non-contiguous (i.e. transposed), but without any strange swapping.
The swapping looks suspicious.
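A minimal sketch of the suggested alternative (my illustration, not PR code): build the strides in the reverse, column-major direction, so the first dimension varies fastest.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Reverse-direction (column-major) strides: for dims {2, 4, 8, 16}
// this yields {1, 2, 8, 64} -- the same 1024 elements, laid out as a
// transposed, hence non-contiguous, view.
std::vector<size_t> ReverseStrides(const std::vector<size_t>& dims)
{
    std::vector<size_t> strides(dims.size(), 1);
    for(size_t i = 1; i < dims.size(); ++i)
        strides[i] = strides[i - 1] * dims[i - 1];
    return strides;
}

int main()
{
    for(auto s : ReverseStrides({2, 4, 8, 16}))
        std::cout << s << ' '; // prints: 1 2 8 64
}
```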
I have made a PyTorch implementation of this method: https://colab.research.google.com/drive/1TEPXclDXDQ5cLHsoPpQmrBmVUvuPKDLe?usp=drive_link
As you can see, by swapping the front and back dims (the tensor dims become {16, 4, 8, 2}), the resulting memory strides make the tensor non-contiguous.
@iq136boy Would you send us the build log of this PR?
Please check the error log:
Blocked by the following CI failure. Need to restart after the CI failure is resolved.
[2024-09-23T13:54:42.218Z] Exception occurred: org.kohsuke.github.HttpException: {"message":"API rate limit exceeded for user ID 49319081. If you reach out to GitHub Support for help, please include the request ID