[release/2.5][ROCm] Fix largeIndexBlockSize #1659

pragupta · 2024-10-30T16:30:09Z

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. This impacts index_add_ operation on a large tensor, we end up picking the large values instead of max supported block size (128). This leads to GPU accessing memory out of bounds.

While we wait for ::min to be fixed, we can use < operator to compare instead of relying on ::min.

Example Code w/ failure:

D=6144
hidden_states = torch.zeros([16384, 6144],           device="cuda:0", dtype=torch.bfloat16)
index         = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output        = torch.empty([1, 32, 16384, 6144],    device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))

Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument

(cherry picked from commit c0266db)

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. In the meantime, use < operator to comapre. (cherry picked from commit c0266db)

pruthvistony · 2024-10-30T17:33:40Z

Not yet decided on cherry-picks into 2.5, so want to wait on this PR merge.

rocm-mici · 2024-10-30T19:07:19Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-05T22:18:49Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-07T21:13:44Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-09T09:33:40Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7969/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalar.hip.o
[7970/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adamw_amsgrad_impl.hip.o
[7971/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_scaled_modified_bessel_k1.hip.o
[7972/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalarList.hip.o
[7973/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention.hip:84:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-11-13T01:26:46Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

/opt/rocm-6.2.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:580:41: note: expanded from macro 'DEPRECATED'
  580 | #define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
      |                                         ^
1 warning generated when compiling for gfx908.
[8005/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-11-13T17:23:47Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7968/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalarList.hip.o
[7969/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/torch_hip_generated_CUDASymmetricMemoryOps.cu.o
[7970/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_UnfoldBackwardKernel.hip.o
[7971/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_scaled_modified_bessel_k1.hip.o
[7972/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-11-15T07:25:00Z

Jenkins build for db33c0f8917630a279e142c898a5011bdef163a1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7963/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalarList.hip.o
[7964/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/torch_hip_generated_CUDASymmetricMemoryOps.cu.o
[7965/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_modified_bessel_k0.hip.o
[7966/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalar.hip.o
[7967/8668] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention.hip:84:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[release/2.5][ROCm] Fix largeIndexBlockSize

db33c0f

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. In the meantime, use < operator to comapre. (cherry picked from commit c0266db)

pragupta requested review from pruthvistony and jithunnair-amd October 30, 2024 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[release/2.5][ROCm] Fix largeIndexBlockSize #1659

[release/2.5][ROCm] Fix largeIndexBlockSize #1659

pragupta commented Oct 30, 2024

pruthvistony commented Oct 30, 2024

rocm-mici commented Oct 30, 2024

rocm-mici commented Nov 5, 2024

rocm-mici commented Nov 7, 2024

rocm-mici commented Nov 9, 2024

rocm-mici commented Nov 13, 2024

rocm-mici commented Nov 13, 2024

rocm-mici commented Nov 15, 2024

[release/2.5][ROCm] Fix largeIndexBlockSize #1659

Are you sure you want to change the base?

[release/2.5][ROCm] Fix largeIndexBlockSize #1659

Conversation

pragupta commented Oct 30, 2024

pruthvistony commented Oct 30, 2024

rocm-mici commented Oct 30, 2024

rocm-mici commented Nov 5, 2024

rocm-mici commented Nov 7, 2024

rocm-mici commented Nov 9, 2024

rocm-mici commented Nov 13, 2024

rocm-mici commented Nov 13, 2024

rocm-mici commented Nov 15, 2024