Added instr.sched options to tune_gemm.py
#649
base: main_perf
Conversation
Force-pushed from a7e82e6 to 932aef2.
Can you add some description in the README file?

Hi @xiaohuguo2023, I added the following line under …
Do we know which gemm sizes can potentially benefit from setting this option?
Force-pushed from 66eb96c to 380970d.
No, at the moment it is difficult to say.
Force-pushed from 380970d to 95a3c3f.
Force-pushed from 76dbf3e to 42bca31.
Looks good, but we need to wait for features in TIP. Blocking for now.
Sorry, I didn't get what you mean.
If this is the case, I'd say we don't want to add more tuning parameters.
See my comments
Ok. Let me make …
```diff
     bestConfig_compact_str = gen_configStr(bestConfig)
     if not run_bench:
-        print(f'best_config: {bestConfig_compact_str}', end=" ", flush=True)
+        print(f'\nbest_config: {bestConfig_compact_str}', end=" ", flush=True)
```
Do you have an example output after adding '\n'?
Yes, sure.

```
> ./tune_gemm.py --gemm_size_file ~/tuning/input.yaml --gpu_ids 3,4,5 --jobs 32 --o ~/tuning/output.yaml
Tuning 1 gemm sizes starts at: 2024-10-29 14:26:32.618604
SIZE: 4864 8192 4160 TN nConfigs: 720
TFLOPS: 516.47; time(us): 641.89
best_config: BM128_BN128_BK64_GM8_SK1_nW4_nS2_EU0_kP2_mfma16_schedDEFAULT
>>> Elapsed time: 0:04:11.238153 = 0:00:20.441773 (compile) + 0:03:49.947198 (profile) + 0:00:00.681876 (post processing)
Tuning ends at: 2024-10-29 14:30:44.031012
Total tuning time (h:m:s): 0:04:11.412408
```
```python
                'num_warps': num_warps, 'num_stages': num_stages, 'waves_per_eu': waves_per_eu,
                'matrix_instr_nonkdim': matrix_instr_nonkdim, 'kpack': kpack
            })
    for sched_variant in sched_variants:
```
Since you are here, can you replace the nested for loops with itertools.product?
Yes, sure. Done!
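For reference, a minimal sketch of that refactor, assuming illustrative parameter lists (the real tuning ranges and dictionary keys in tune_gemm.py may differ):

```python
import itertools

# Illustrative tuning ranges -- placeholders, not the actual tune_gemm.py values.
block_m_range = [64, 128]
block_n_range = [64, 128]
sched_variants = ["'default'", "'none'"]

# itertools.product flattens three nested for-loops into a single loop,
# iterating over every combination of the parameter lists.
configs = []
for block_m, block_n, sched_variant in itertools.product(
        block_m_range, block_n_range, sched_variants):
    configs.append({
        'BLOCK_SIZE_M': block_m,
        'BLOCK_SIZE_N': block_n,
        'instruction_sched_variant': sched_variant,
    })

print(len(configs))  # 2 * 2 * 2 = 8 combinations
```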
Yes, I'm ok with it.
Force-pushed from 42bca31 to 5230674.
Force-pushed from cdeffe9 to df11ba0.
Force-pushed from df11ba0 to df063d4.
```diff
@@ -112,6 +113,7 @@ def matmul_{configStr}(M, N, K, am, ak, bk, bn, cm, cn, biasn):
     EVEN_K = {EVEN_K},
     GRID_MN = grid_mn,
     NUM_XCDS = {num_xcds},
+    instruction_sched_variant = {sched_variant},
```
Missing quotes ' '
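The point of this comment: the kernel source is generated through an f-string, so a string-valued parameter must be emitted with quotes around the substitution or the generated code contains a bare identifier. A hypothetical minimal reproduction:

```python
# Hypothetical reproduction of the code-generation quoting issue.
sched_variant = "local-prefetch"

# Without quotes, the generated source contains a bare (invalid) identifier:
without_quotes = f"instruction_sched_variant = {sched_variant},"
# With quotes, the value is emitted as a proper string literal:
with_quotes = f"instruction_sched_variant = '{sched_variant}',"

print(without_quotes)  # instruction_sched_variant = local-prefetch,
print(with_quotes)     # instruction_sched_variant = 'local-prefetch',
```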
```diff
@@ -145,7 +147,8 @@ def matmul_{configStr}(a, b, c, bias, M, N, K, am, ak, bk, bn, cm, cn, biasn):
     BIAS = {use_bias},
     EVEN_K = {EVEN_K},
     GRID_MN = grid[0],
-    NUM_XCDS = {num_xcds}
+    NUM_XCDS = {num_xcds},
+    instruction_sched_variant = {sched_variant},
```
Missing quote ' '
```diff
     config)

 ## {M}_{N}_{K} is removed since the same kernel can be used for differen gemm sizes
-configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}"
+configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}_sched{sched_variant[1:-1].upper()}"
```
Now we are using `local-prefetch`, but we cannot have `-` in kernel names. Can you also convert `-` into `_`?
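A possible sketch of the requested change, assuming (as in the diff) that `sched_variant` is stored with its surrounding quotes, which the `[1:-1]` slice strips:

```python
# Sketch: strip the surrounding quotes, uppercase, and replace '-' with '_'
# so the variant is a legal fragment of a generated kernel name.
# The value below is an illustrative example.
sched_variant = "'local-prefetch'"

suffix = sched_variant[1:-1].upper().replace('-', '_')
print(suffix)  # LOCAL_PREFETCH

config_str = f"BM128_BN128_sched{suffix}"  # shortened name, not the full format
print(config_str)  # BM128_BN128_schedLOCAL_PREFETCH
```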
See my comments