fix the bug that, for block_k=16 MMA, the compilation crashes on Ampere. #4768
base: main
Commits on Sep 20, 2024
-
fix the bug that, for block_k=16 MMA, the compilation crashes on Ampere.
The original issue is reported here: triton-lang#3435. The issue happens during compilation, when arith.sitofp (from i8 to fp16) operates on a tensor operand that has a dot_op layout, with the first dimension of the tensor being 16 and opIdx = 1. For example: %104 = arith.sitofp %103 : tensor<16x64xi8, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>> to tensor<16x64xf16, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>>. Investigation shows that the bug occurs in the TritonGPUToLLVM pass: in the corner case (block_k = 16 and opIdx = 1), extra elements are unpacked in include/triton/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.h, lines 186-194. The code unpacks extra elements due to an implicit assumption in lib/Dialect/TritonGPU/IR/Dialect.h, at line 2000, that at least 4 reps will be loaded. Therefore, in this patch, the extra loaded elements are dropped in that corner case.
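The failing pattern corresponds to a mixed-dtype dot where the int8 operand (opIdx = 1, first dimension 16) is converted to fp16 before the multiply. A minimal NumPy sketch of the intended numerics — shapes and variable names are illustrative, not taken from the patch; on the GPU this is a tl.dot inside a Triton kernel:

```python
import numpy as np

# Illustrative tile shapes for the corner case: the opIdx=1 operand (B)
# has its first (K) dimension equal to 16.
BLOCK_M, BLOCK_K, BLOCK_N = 32, 16, 64

rng = np.random.default_rng(0)
a = rng.standard_normal((BLOCK_M, BLOCK_K)).astype(np.float16)
b_i8 = rng.integers(-128, 128, size=(BLOCK_K, BLOCK_N), dtype=np.int8)

# arith.sitofp: signed i8 -> fp16, applied element-wise to the dot_op operand.
b_f16 = b_i8.astype(np.float16)

# The dot itself; the crash happened while lowering the sitofp above,
# not in the matmul.
c = a.astype(np.float32) @ b_f16.astype(np.float32)
print(c.shape)  # (32, 64)
```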
SHA: 8b8ceeb
-
Add new test case in test/Conversion/tritongpu_to_llvm_ampere.mlir. This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
SHA: d57cb60
Commits on Sep 21, 2024
-
Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py
This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
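The unit test pairs a GPU kernel launch with a host-side reference check. A minimal sketch of that reference, with hypothetical names and shapes (the real test additionally launches the Triton kernel on an Ampere GPU and is parametrized via pytest):

```python
import numpy as np

def check_mixed_dtype_gemm(M, K, N):
    # Hypothetical NumPy reference for the fp16 x (s8 -> fp16) GEMM
    # that the Triton kernel under test computes.
    rng = np.random.default_rng(K)
    a = rng.standard_normal((M, K)).astype(np.float16)
    b = rng.integers(-4, 4, size=(K, N), dtype=np.int8)
    # The s8 -> fp16 conversion mirrors the arith.sitofp on the
    # opIdx=1 operand that previously crashed the compiler.
    c = a.astype(np.float32) @ b.astype(np.float16).astype(np.float32)
    assert c.shape == (M, N)
    return c

# The corner case: first (K) dimension of the opIdx=1 operand equal to 16.
check_mixed_dtype_gemm(16, 16, 64)
```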
SHA: e9a5eab
-
Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py
This is the test case for converting a tensor from s8 to fp16 when the first dimension of the tensor is 16 and opIdx = 1. Previously, the compilation could crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch resolves the issue; this test case verifies that the crash no longer occurs.
SHA: f8f31ca
Commits on Sep 23, 2024
-
SHA: c6407ac
Commits on Sep 24, 2024
-
SHA: b70082f
-
SHA: 2c63c38
-
SHA: d37d0b7
-
SHA: 3b85c79
-
SHA: 68f49dc
Commits on Sep 25, 2024
-
SHA: 484258d
-
update comment; merge the pytest implementations in test_gemm_mixed_dtype_small_tile.py
SHA: b51a799
-
SHA: abede90
-
SHA: 9a6682e
-
SHA: 8ec8b8c
Commits on Sep 26, 2024
-
SHA: 1006afb
-
SHA: bcbd1f7
Commits on Sep 27, 2024
-
SHA: 2635acd
Commits on Oct 29, 2024
-
SHA: 55a2e45