What happens when we set the `K` dimension in `TiledMMA`? #1432

hyhieu · 2024-03-27T22:40:29Z

The question is just as in the title. For instance:

auto tiled_mma = make_tiled_mma(SM80_16x8x16_F32F16F16F32_TN{}, Layout<Shape<_1, _1, _2>>{});

Intuitively, the number of threads in each CTA will be doubled, but does each thread's register fragment then hold only the partial sum over half of the K dimension? Perhaps similar to a Split-K situation?

Thanks!

Answered by ccecka

Mar 28, 2024


ccecka · 2024-03-28T01:46:23Z

That's right! The kernel would probably want to perform some kind of reduction in the epilogue or atomically update the global memory tile.
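To make the split-K picture concrete, here is a minimal host-side sketch in plain C++. It is not CuTe code: `kGroups`, `partial_dot`, and `reduce_partials` are hypothetical names invented for illustration, and the contiguous split of K between groups is an assumption (the partitioning a real `TiledMMA` uses may differ). It only shows that each group ends up holding a partial sum, and that a final reduction is needed to recover the full result.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model: the K extent is shared by kGroups groups, mirroring
// the `_2` in Layout<Shape<_1, _1, _2>> from the question.
constexpr std::size_t kGroups = 2;

// Each group accumulates only over its own slice of K, so its accumulator
// holds a partial sum. Assumes K is divisible by kGroups and that the
// slices are contiguous (an illustrative choice, not CuTe's actual layout).
std::array<float, kGroups> partial_dot(const std::vector<float>& a,
                                       const std::vector<float>& b) {
  std::array<float, kGroups> partial{};  // zero-initialized accumulators
  const std::size_t k_per_group = a.size() / kGroups;
  for (std::size_t g = 0; g < kGroups; ++g) {
    for (std::size_t k = g * k_per_group; k < (g + 1) * k_per_group; ++k) {
      partial[g] += a[k] * b[k];
    }
  }
  return partial;
}

// Epilogue-style reduction: combine the per-group partial sums into the
// full result over all of K.
float reduce_partials(const std::array<float, kGroups>& partial) {
  float sum = 0.0f;
  for (float p : partial) {
    sum += p;
  }
  return sum;
}
```

In a real kernel the reduction step would likely go through shared memory in the epilogue, or each group would add its partial tile into global memory with something like `atomicAdd`, which are the two options mentioned above.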

1 reply

hyhieu Mar 28, 2024
Author

Thank you, Cris!
