add stream-k v0.2 #652
Conversation
Compare: 468765b to 650261b
rn1 = tl.max_contiguous(tl.multiple_of(rn1, BLOCK_SIZE_N), BLOCK_SIZE_N)
P_ = P + pid * BLOCK_SIZE_M * BLOCK_SIZE_N + rm1[:, None] * BLOCK_SIZE_N + rn1[None, :]
tl.store(P_, acc, cache_modifier=".wt")
tl.store(locks + pid, 1, cache_modifier=".wt")
Copying some of my notes on gfx90a again: on gfx90a, loads/stores with cache_modifiers do not work. Documented here: https://github.com/ROCm/triton-internal/issues/311
Not sure how we are going to address this?
I think there might be an "if arch == gfx90a" check we can use for this and for the pid renaming; I'll check.
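For illustration, a minimal sketch of what such an arch gate could look like. The helper supports_cache_modifiers() and the constexpr flag USE_CACHE_MODIFIER are assumptions for this sketch, not code from this PR:

import torch
import triton
import triton.language as tl

def supports_cache_modifiers():
    # Assumption: ROCm builds of torch expose the GCN arch name; gfx90a is the
    # arch where the ".wt"/".cv" cache modifiers are reported broken.
    arch = getattr(torch.cuda.get_device_properties(0), "gcnArchName", "")
    return "gfx90a" not in arch

@triton.jit
def write_partials(P, locks, acc, pid,
                   BLOCK_SIZE_M: tl.constexpr, BLOCK_SIZE_N: tl.constexpr,
                   USE_CACHE_MODIFIER: tl.constexpr):
    # Device-side helper meant to be called from inside the stream-k kernel;
    # acc is the partial accumulator tile produced by this workgroup.
    rm1 = tl.arange(0, BLOCK_SIZE_M)
    rn1 = tl.arange(0, BLOCK_SIZE_N)
    P_ = P + pid * BLOCK_SIZE_M * BLOCK_SIZE_N + rm1[:, None] * BLOCK_SIZE_N + rn1[None, :]
    if USE_CACHE_MODIFIER:
        # Write-through stores so other workgroups observe the partials and the flag.
        tl.store(P_, acc, cache_modifier=".wt")
        tl.store(locks + pid, 1, cache_modifier=".wt")
    else:
        # gfx90a fallback: plain store plus an atomic exchange to publish the lock.
        tl.store(P_, acc)
        tl.atomic_xchg(locks + pid, 1)

The host side would pass USE_CACHE_MODIFIER=supports_cache_modifiers() at launch, so the problematic path is never compiled on gfx90a.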
# todo: try use tl.load once cache modifier landed upstream
while tl.atomic_cas(locks + next_pid, 1, 1) != 1:
while (end < tile_iter_end and next_pid < NUM_SMS):
while tl.load(locks + next_pid, cache_modifier=".cv", volatile=True) != 1:
This also does not work on gfx90a: https://github.com/ROCm/triton-internal/issues/311
I will find an MI250 to test it.
Add NUM_XCDS so we can switch it on/off.
Sorry, NUM_XCDS can't help with the cache_modifier issue.
Yeah, we need something else. I'll investigate.
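For reference, a minimal sketch contrasting the two spin-wait variants discussed above. The constexpr flag SPIN_WITH_CACHE_MODIFIER and the helper name are assumptions for this sketch, not this PR's code:

import triton
import triton.language as tl

@triton.jit
def wait_for_partials(locks, next_pid, SPIN_WITH_CACHE_MODIFIER: tl.constexpr):
    if SPIN_WITH_CACHE_MODIFIER:
        # Cache-bypassing load (".cv", volatile) so the loop sees the producer's
        # store; reported not to work on gfx90a (ROCm/triton-internal#311).
        while tl.load(locks + next_pid, cache_modifier=".cv", volatile=True) != 1:
            pass
    else:
        # Fallback: an atomic compare-and-swap always goes through the memory
        # system, so it also works on gfx90a, at some extra cost.
        while tl.atomic_cas(locks + next_pid, 1, 1) != 1:
            pass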
matmul_call_str = f""" | ||
if '{configStr}' not in failed_configs: | ||
rotating_num = tensors['rotating_num'] | ||
locks = torch.zeros(({runs}, {num_sms}), device = "cuda", dtype = torch.int32) |
locks can be less than int32 type; we only need 1 byte, so uint8 should work. Tensile uses uint8.
Can we leave this for the next release, as we need a thorough test for it? Thanks!
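For illustration, the suggested smaller lock type might look like this (a sketch with made-up sizes, not code from this PR):

import torch

runs, num_sms = 3, 304  # hypothetical values for illustration
# One byte per lock is enough for a 0/1 flag; Tensile uses uint8 as well.
locks = torch.zeros((runs, num_sms), device="cuda", dtype=torch.uint8)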
Two outdated review comments on python/perf-kernels/streamk/03-matrix-multiplication-stream-k.py were resolved.
streamk v0.2:
- new stream-k tuning script to reduce compiling and profiling time
- use load/store cache modifiers to reimplement the spinning lock
- add a CI test for the stream-k kernel
- able to use streampipelineV2