fix the bug that, for block_k=16 MMA, the compilation crashes on Ampere. #4768
base: main
Commits on Sep 20, 2024
-
fix the bug that, for block_k=16 MMA, the compilation crashes on Ampere.
The original issue is reported here: triton-lang#3435. The issue happens during compilation, when arith.sitofp (from i8 to fp16) operates on a tensor operand that has a dot_op layout, with the first dimension of the tensor being 16 and opIdx = 1. For example: %104 = arith.sitofp %103 : tensor<16x64xi8, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>> to tensor<16x64xf16, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>>. Investigation shows that the bug occurs in the TritonGPUToLLVM pass: in the corner case (block_k = 16 and opIdx = 1), extra elements are unpacked in include/triton/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.h, lines 186-194. The code unpacks extra elements due to an implicit assumption in lib/Dialect/TritonGPU/IR/Dialect.h, at line 2000, that at least 4 reps will be loaded. Therefore, in this patch, the extra loaded elements are dropped in that corner case.
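The failing pattern corresponds to a mixed-dtype dot where the int8 operand (opIdx = 1, first dimension 16) is converted to fp16 before the multiply. A minimal NumPy sketch of the intended numerics — shapes and variable names are illustrative, not taken from the patch; on the GPU this is a tl.dot inside a Triton kernel:

```python
import numpy as np

# Illustrative tile shapes for the corner case: the opIdx=1 operand (B)
# has its first (K) dimension equal to 16.
BLOCK_M, BLOCK_K, BLOCK_N = 32, 16, 64

rng = np.random.default_rng(0)
a = rng.standard_normal((BLOCK_M, BLOCK_K)).astype(np.float16)
b_i8 = rng.integers(-128, 128, size=(BLOCK_K, BLOCK_N), dtype=np.int8)

# arith.sitofp: signed i8 -> fp16, applied element-wise to the dot_op operand.
b_f16 = b_i8.astype(np.float16)

# The dot itself; the crash happened while lowering the sitofp above,
# not in the matmul.
c = a.astype(np.float32) @ b_f16.astype(np.float32)
print(c.shape)  # (32, 64)
```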
SHA: 8b8ceeb
-
Add new test case in test/Conversion/tritongpu_to_llvm_ampere.mlir. This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
SHA: d57cb60
Commits on Sep 21, 2024
-
Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py
This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
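The unit test pairs a GPU kernel launch with a host-side reference check. A minimal sketch of that reference, with hypothetical names and shapes (the real test additionally launches the Triton kernel on an Ampere GPU and is parametrized via pytest):

```python
import numpy as np

def check_mixed_dtype_gemm(M, K, N):
    # Hypothetical NumPy reference for the fp16 x (s8 -> fp16) GEMM
    # that the Triton kernel under test computes.
    rng = np.random.default_rng(K)
    a = rng.standard_normal((M, K)).astype(np.float16)
    b = rng.integers(-4, 4, size=(K, N), dtype=np.int8)
    # The s8 -> fp16 conversion mirrors the arith.sitofp on the
    # opIdx=1 operand that previously crashed the compiler.
    c = a.astype(np.float32) @ b.astype(np.float16).astype(np.float32)
    assert c.shape == (M, N)
    return c

# The corner case: first (K) dimension of the opIdx=1 operand equal to 16.
check_mixed_dtype_gemm(16, 16, 64)
```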
SHA: e9a5eab
-
Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py
This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
SHA: f8f31ca
Commits on Sep 23, 2024
-
SHA: c6407ac
Commits on Sep 24, 2024
-
SHA: b70082f
-
SHA: 2c63c38
-
SHA: d37d0b7
-
SHA: 3b85c79
-
SHA: 68f49dc
Commits on Sep 25, 2024
-
SHA: 484258d
-
update comment; merge the pytest implementations in test_gemm_mixed_dtype_small_tile.py
SHA: b51a799
-
SHA: abede90
-
SHA: 9a6682e
-
SHA: 8ec8b8c
Commits on Sep 26, 2024
-
SHA: 1006afb
-
SHA: bcbd1f7
Commits on Sep 27, 2024
-
SHA: 2635acd
Commits on Oct 29, 2024
-
SHA: 55a2e45