
Added instr.sched options to tune_gemm.py #649

Open
wants to merge 6 commits into main_perf
Conversation

@ravil-mobile

No description provided.

@xiaohuguo2023
Member

xiaohuguo2023 commented Oct 15, 2024

Can you add some description in the README file?

@ravil-mobile (Author)

Hi @xiaohuguo2023, I added the following line under GEMM Tuning Script v3.4:

  • Extended the tuning parameter space with instruction scheduling variants for the main gemm loop (k-loop); see the sketch below.
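For context, a rough sketch of the extra tuning dimension this adds. The variant names are the ones mentioned in this thread; the exact list, spelling, and quoting are whatever `tune_gemm.py` defines.

```python
# Sketch only: instruction scheduling variants for the main gemm k-loop,
# appended as one more dimension of the tuning space.
sched_variants = ["default", "iglp0", "ckv3", "local-prefetch"]

configs = []
for sched_variant in sched_variants:
    configs.append({
        # ... existing tuning parameters (BLOCK_M, BLOCK_N, num_stages, ...)
        'instruction_sched_variant': sched_variant,
    })
```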

@zhanglx13

Do we know which gemm sizes can potentially benefit from setting instruction_sched_variant to iglp0 or ckv3?

@ravil-mobile force-pushed the ravil/main_perf branch 2 times, most recently from 66eb96c to 380970d on October 24, 2024 12:55
@ravil-mobile (Author)

> Do we know which gemm sizes can potentially benefit from setting instruction_sched_variant to iglp0 or ckv3?

No, at the moment it is difficult to say.

@sjw36 left a comment

Looks good but need to wait for features in TIP. Blocking for now.

@ravil-mobile (Author)

> Looks good but need to wait for features in TIP. Blocking for now.

Sorry, I didn't get what you mean.

@zhanglx13

> Do we know which gemm sizes can potentially benefit from setting instruction_sched_variant to iglp0 or ckv3?

> No, at the moment it is difficult to say.

If this is the case, I'd say we don't want to add more tuning parameters.
If we know it works for "some" kernels for sure, we should apply them via some simple heuristics.
If we don't know when they help, we should figure that out first.

@zhanglx13 left a comment

See my comments

@ravil-mobile (Author)

> Do we know which gemm sizes can potentially benefit from setting instruction_sched_variant to iglp0 or ckv3?

> No, at the moment it is difficult to say.

> If this is the case, I'd say we don't want to add more tuning parameters. If we know it works for "some" kernels for sure, we should apply them via some simple heuristics. If we don't know when they help, we should figure that out first.

Ok. Let me make sched_variants = ["\"default\""]. This means only the default (no extra scheduling) variant will be generated. Tell me whether you are ok with it; at the very least we should push the adapted infrastructure to main_perf.


 bestConfig_compact_str = gen_configStr(bestConfig)
 if not run_bench:
-    print(f'best_config: {bestConfig_compact_str}', end=" ", flush=True)
+    print(f'\nbest_config: {bestConfig_compact_str}', end=" ", flush=True)


Do you have an example output after adding '\n'?

@ravil-mobile (Author)

Yes, sure.

> ./tune_gemm.py --gemm_size_file ~/tuning/input.yaml --gpu_ids 3,4,5 --jobs 32 --o ~/tuning/output.yaml
Tuning 1 gemm sizes starts at: 2024-10-29 14:26:32.618604
SIZE: 4864 8192 4160 TN nConfigs: 720 
TFLOPS: 516.47; time(us): 641.89 
best_config: BM128_BN128_BK64_GM8_SK1_nW4_nS2_EU0_kP2_mfma16_schedDEFAULT 
>>> Elapsed time: 0:04:11.238153 = 0:00:20.441773 (compile) + 0:03:49.947198 (profile) + 0:00:00.681876 (post processing)
Tuning ends at: 2024-10-29 14:30:44.031012
Total tuning time (h:m:s): 0:04:11.412408

num_warps, 'num_stages': num_stages, 'waves_per_eu': waves_per_eu,
'matrix_instr_nonkdim': matrix_instr_nonkdim, 'kpack': kpack
})
for sched_variant in sched_variants:


Since you are here, can you replace the nested for loops with itertools.product?
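For illustration, a minimal sketch of what that refactor could look like. The range names and values (num_warps_range, etc.) are placeholders, not necessarily the ones in tune_gemm.py:

```python
from itertools import product

# Sketch: walk the cartesian product of the tuning dimensions in one loop
# instead of one nested for-loop per parameter.
num_warps_range = [4, 8]
num_stage_range = [2, 3]
waves_per_eu_range = [0, 2]
kpack_range = [1, 2]
sched_variants = ["\"default\""]

configs = []
for num_warps, num_stages, waves_per_eu, kpack, sched_variant in product(
        num_warps_range, num_stage_range, waves_per_eu_range,
        kpack_range, sched_variants):
    configs.append({
        'num_warps': num_warps, 'num_stages': num_stages,
        'waves_per_eu': waves_per_eu, 'kpack': kpack,
        'instruction_sched_variant': sched_variant,
    })
```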

@ravil-mobile (Author)

Yes, sure. Done!

@zhanglx13

> Ok. Let me make sched_variants = ["\"default\""]. This means only the default (no extra scheduling) variant will be generated. Tell me whether you are ok with it; at the very least we should push the adapted infrastructure to main_perf.

Yes, I'm ok with it

@@ -112,6 +113,7 @@ def matmul_{configStr}(M, N, K, am, ak, bk, bn, cm, cn, biasn):
     EVEN_K = {EVEN_K},
     GRID_MN = grid_mn,
     NUM_XCDS = {num_xcds},
+    instruction_sched_variant = {sched_variant},


Missing quotes ' '

@@ -145,7 +147,8 @@ def matmul_{configStr}(a, b, c, bias, M, N, K, am, ak, bk, bn, cm, cn, biasn):
     BIAS = {use_bias},
     EVEN_K = {EVEN_K},
     GRID_MN = grid[0],
-    NUM_XCDS = {num_xcds}
+    NUM_XCDS = {num_xcds},
+    instruction_sched_variant = {sched_variant},


Missing quote ' '

     config)

     ## {M}_{N}_{K} is removed since the same kernel can be used for different gemm sizes
-    configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}"
+    configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}_sched{sched_variant[1:-1].upper()}"


Now we are using local-prefetch, but we cannot have `-` in kernel names. Can you also convert `-` into `_`?
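Something along these lines should do it (a sketch; the suffix handling otherwise follows the `sched{sched_variant[1:-1].upper()}` pattern from the diff above):

```python
# Sketch: strip the embedded quotes, then replace '-' with '_' so a variant
# like "local-prefetch" yields a kernel-name-safe suffix.
sched_variant = "\"local-prefetch\""
sched_suffix = sched_variant[1:-1].upper().replace('-', '_')
print(sched_suffix)  # LOCAL_PREFETCH
```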

@zhanglx13 left a comment

See my comments
