Fix the bug that, for block_k=16 mma, compilation crashes on Ampere. #4768

Open · wants to merge 19 commits into base: main

Commits on Sep 20, 2024

  1. Fix the bug that, for block_k=16 mma, compilation crashes on Ampere.

    The original issue is reported in triton-lang#3435. The issue happens during
    compilation, when arith.sitofp (from i8 to fp16) operates on a tensor operand
    that has a dot_op layout with the first dimension of the tensor being 16 and
    opIdx = 1. For example:

        %104 = arith.sitofp %103 : tensor<16x64xi8, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>> to tensor<16x64xf16, #triton_gpu.dot_op<{opIdx = 1, parent = #mma, kWidth = 4}>>

    Investigation shows that the bug is in the TritonGPUToLLVM pass: in this
    corner case (block_k = 16 and opIdx = 1), extra elements are unpacked in
    include/triton/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.h, lines
    186-194. The extra elements are unpacked because of an implicit assumption in
    lib/Dialect/TritonGPU/IR/Dialect.h, line 2000, that at least 4 reps are
    loaded. This patch therefore drops the extra loaded elements in that corner
    case. (A minimal reproducer sketch follows the Sep 20 commits below.)

    bingyizh233 committed Sep 20, 2024 · 8b8ceeb
  2. Add new test cases in test/Conversion/tritongpu_to_llvm_ampere.mlir

    This is the test case for converting a tensor from s8 to fp16 when the first
    dimension of the tensor is 16 and opIdx == 1. Previously, compilation could
    crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch
    resolves the issue; this test case verifies that the crash no longer occurs.

    bingyizh233 committed Sep 20, 2024 · d57cb60
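For reference, here is a minimal Python reproducer in the spirit of the issue described above. It is a sketch, not the exact code from triton-lang#3435 or this PR: the kernel name, shapes, and launch parameters are illustrative assumptions. The s8-to-f16 cast on the opIdx = 1 operand with BLOCK_K = 16 is what used to trigger the crash.

```python
import triton
import triton.language as tl


@triton.jit
def gemm_s8_to_f16_kernel(
    a_ptr, b_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    # One program computes one BLOCK_M x BLOCK_N tile of C = A @ B,
    # where A is f16 and B is s8 (cast to f16 before the dot).
    # Assumes M, N, K are multiples of the block sizes, so no masking.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)  # f16 tile, BLOCK_M x BLOCK_K
        b = tl.load(b_ptrs)  # s8 tile, BLOCK_K x BLOCK_N (e.g. 16x64)
        # This s8 -> f16 cast lowers to arith.sitofp on a tensor with a
        # dot_op layout (opIdx = 1); with BLOCK_K = 16 it used to crash
        # in the TritonGPUToLLVM pass on Ampere.
        acc += tl.dot(a, b.to(tl.float16))
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc.to(tl.float16))
```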

Commits on Sep 21, 2024

  1. Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py

    This is the test case for converting a tensor from s8 to fp16 when the first
    dimension of the tensor is 16 and opIdx == 1. Previously, compilation could
    crash during convert-triton-gpu-to-llvm on NVIDIA Ampere GPUs. The new patch
    resolves the issue; this test case verifies that the crash no longer occurs.

    bingyizh233 committed Sep 21, 2024 · e9a5eab
  2. Add new test cases in python/test/unit/ampere/test_gemm_mixed_dtype.py

    (Same description as the previous commit.)

    bingyizh233 committed Sep 21, 2024 · f8f31ca
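A hedged sketch of what a unit test along the lines of python/test/unit/ampere/test_gemm_mixed_dtype.py could look like; the actual parametrization and tolerances in the PR may differ, and the import assumes the reproducer kernel above was saved as gemm_s8_repro.py.

```python
import pytest
import torch
import triton

from gemm_s8_repro import gemm_s8_to_f16_kernel  # the kernel sketched above


@pytest.mark.parametrize("M, N, K", [(16, 64, 64)])
def test_gemm_s8_to_f16_block_k_16(M, N, K):
    # BLOCK_K = 16 with an s8 operand on opIdx = 1 is the corner case;
    # before this patch, compilation crashed on Ampere.
    a = torch.randn((M, K), device="cuda", dtype=torch.float16)
    b = torch.randint(-8, 8, (K, N), device="cuda", dtype=torch.int8)
    c = torch.empty((M, N), device="cuda", dtype=torch.float16)
    grid = (triton.cdiv(M, 16), triton.cdiv(N, 64))
    gemm_s8_to_f16_kernel[grid](
        a, b, c, M, N, K,
        a.stride(0), a.stride(1),
        b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
        BLOCK_M=16, BLOCK_N=64, BLOCK_K=16,
    )
    # Reference in fp32, then cast down, to match the kernel's accumulator.
    ref = (a.float() @ b.float()).to(torch.float16)
    torch.testing.assert_close(c, ref, atol=1e-2, rtol=1e-2)
```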

Commits on Sep 23, 2024

  1. c6407ac

Commits on Sep 24, 2024

  1. b70082f
  2. 2c63c38
  3. d37d0b7
  4. 3b85c79
  5. 68f49dc

Commits on Sep 25, 2024

  1. 484258d
  2. b51a799
  3. abede90
  4. 9a6682e
  5. 8ec8b8c

Commits on Sep 26, 2024

  1. 1006afb
  2. bcbd1f7

Commits on Sep 27, 2024

  1. Merge branch 'main' into main

    chsigg authored Sep 27, 2024 · 2635acd

Commits on Oct 29, 2024

  1. 55a2e45