Volta tensor op layout problem #213
-
Which line of code? Looks like shared memory swizzling code.
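For anyone else landing here, a minimal standalone sketch of the XOR-style swizzling idea being referred to (illustrative only, not the actual CUTLASS Volta iterator; the tile width is an arbitrary assumption):

```cpp
// Sketch: XOR-based shared-memory swizzling. The column index of each stored
// element is permuted by the row index so threads in a warp hit distinct banks.
#include <cstdio>

int swizzled_column(int row, int col, int columns_per_row) {
  // XOR the low bits of the row into the column; assumes columns_per_row is a
  // power of two, so each row gets a bijective permutation of its columns.
  return col ^ (row & (columns_per_row - 1));
}

int main() {
  const int kColumns = 8;   // arbitrary illustration width
  for (int row = 0; row < 4; ++row) {
    for (int col = 0; col < kColumns; ++col) {
      std::printf("%d ", swizzled_column(row, col, kColumns));
    }
    std::printf("\n");
  }
  return 0;
}
```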
-
Also, what is your tile size? The problem may be related to your tile size.
-
TN layout, right?
-
I verified that your change works with row x row, threadblock 128x128, warp 64x64. However, when I change to row x row, threadblock 64x128, warp 32x64, it fails. I guess this tile size needs the swap.
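For reference, a sketch of how those tile sizes map onto a CUTLASS 2.x device-level GEMM; the element types (half in, float accumulate) and the Volta mma.sync instruction shape are my assumptions, not taken from your setup:

```cpp
// Sketch only: the failing configuration described above
// (row-major x row-major, threadblock 64x128, warp 32x64) on Volta tensor ops.
#include <cstdio>
#include "cutlass/gemm/device/gemm.h"

using Gemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,   // A (row-major)
    cutlass::half_t, cutlass::layout::RowMajor,   // B (row-major)
    cutlass::half_t, cutlass::layout::RowMajor,   // C
    float,                                        // accumulator (assumed)
    cutlass::arch::OpClassTensorOp,               // use tensor cores
    cutlass::arch::Sm70,                          // Volta
    cutlass::gemm::GemmShape<64, 128, 32>,        // threadblock tile
    cutlass::gemm::GemmShape<32, 64, 32>,         // warp tile
    cutlass::gemm::GemmShape<8, 8, 4>>;           // Volta mma.sync shape

int main() {
  Gemm gemm_op;   // forces the kernel template to be instantiated
  (void)gemm_op;
  std::printf("64x128 threadblock / 32x64 warp Volta tensor-op GEMM instantiated\n");
  return 0;
}
```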
-
I made the change to the layout, too. The 32x64 warp size passes now. Then I ran the warp-level GEMM unit tests and this one failed.
-
There is a strange line in the Volta smem iterator:
It makes the pointers in the right part of smem swapped, and they are recovered in
Why should we do that? If I remove this logic, the result is still correct and the speed looks the same (tested on an RTX 2060).
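In case it helps other readers, here is a tiny standalone illustration (not the actual CUTLASS source; offsets and tile width are made up) of the pattern being discussed: iterators covering the right half of the shared-memory tile exchange their two staging pointers when constructed, and a mirrored swap on the read path is what recovers the original order, which is why removing both sides at once still gives correct results.

```cpp
// Illustration only: swap a pair of smem pointer offsets for lanes in the
// right half of the tile, then undo the swap on the load side.
#include <cstdio>
#include <utility>

struct IteratorSketch {
  int pointer_[2];   // byte offsets of two staging buffers in smem (made up)

  IteratorSketch(int lane_column, int half_tile_width) {
    pointer_[0] = 0;
    pointer_[1] = 128;
    // The "strange line": lanes in the right half swap their pointers ...
    if (lane_column >= half_tile_width) {
      std::swap(pointer_[0], pointer_[1]);
    }
  }

  // ... and the matching swap on the load side restores the original order.
  void recover(int lane_column, int half_tile_width) {
    if (lane_column >= half_tile_width) {
      std::swap(pointer_[0], pointer_[1]);
    }
  }
};

int main() {
  IteratorSketch left(3, 32), right(40, 32);
  std::printf("left:  %d %d\n", left.pointer_[0], left.pointer_[1]);
  std::printf("right: %d %d\n", right.pointer_[0], right.pointer_[1]);
  right.recover(40, 32);
  std::printf("right after recovery: %d %d\n", right.pointer_[0], right.pointer_[1]);
  return 0;
}
```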