[Issue]: ROCm-6.1 test failures with upstream clang-17 #76

Closed
littlewu2508 opened this issue May 4, 2024 · 7 comments
Labels
device-libs (Related to Device Libraries), Under Investigation

Comments

@littlewu2508
Problem Description

4 tests failed on Gentoo rocm-device-libs-6.1.0 package, with clang-17.0.6:

          6 - compile_frexp__gfx600 (Failed)
          7 - compile_fract__gfx600 (Failed)
         12 - compile_fract__gfx700 (Failed)
         17 - compile_fract__gfx803 (Failed)

Test log:
LastTest.log

Compilation output of the failing tests:
failed_assembly.tar.gz

Operating System

Gentoo Prefix on kernel 6.7.9

CPU

AMD Ryzen 7 7700 8-Core Processor

GPU

AMD Radeon RX 7900 XT

ROCm Version

ROCm 6.1.0

ROCm Component

ROCm-Device-Libs

Steps to Reproduce

## Setup ebuild repo, currently in my own branch
pushd /var/db/repos
rm -rf gentoo
git clone --depth 1 https://github.com/littlewu2508/gentoo.git -b rocm-runtime-6.1

## Testing setup
usermod -a -G render portage # add portage to render group to access GPU
mkdir -p /etc/portage/env/
echo 'FEATURES="test"' > /etc/portage/env/test.conf
echo 'dev-libs/rocm-device-libs test.conf' > /etc/portage/package.env

## Install deps and execute test
emerge -v "=dev-libs/rocm-device-libs-6.1.0"

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@lamb-j added the device-libs (Related to Device Libraries) label May 6, 2024
@lamb-j transferred this issue from ROCm/ROCm-Device-Libs May 6, 2024

@ppanchad-amd
@littlewu2508 Failing tests are on gfx that are not supported on the latest ROCm 6.1.3. Closing ticket. Thanks!

@ppanchad-amd closed this as not planned Jul 29, 2024

@AngryLoki
AngryLoki commented Aug 4, 2024

Hi @ppanchad-amd, I reproduced this issue with the rocm-6.2.0 branch (only 3 tests failed):

          6 - compile_frexp__gfx600 (Failed)
          9 - compile_fract__gfx600 (Failed)
         14 - compile_fract__gfx700 (Failed)

While you are technically correct that gfx600 and the others are not supported, https://github.com/ROCm/llvm-project/blob/rocm-6.2.0/amd/device-libs/test/compile/CMakeLists.txt#L82 and the code below it run tests (and fail) specifically on these architectures.

Is it possible to remove the faulty code or replace it with some supported architecture? As it stands, most of the tests are enabled on gfx600/gfx700/gfx803/gfx900/gfx1030, of which only gfx1030 is still supported (i.e. only lgamma_r is expected to work).

(Please advise whether I should create a new issue or whether this one can be reopened.)

@ppanchad-amd
@littlewu2508 @AngryLoki Internal ticket has been created to fix this issue. Thanks!

@schung-amd
Hi @AngryLoki, can you provide a log of the failing tests? I'm interested in seeing if it matches with the log provided by @littlewu2508 on ROCm 6.1.0. Also, do these test failures occur with upstream clang only, or are they also failing on your end with the packaged rocm-clang? Thanks!

@AngryLoki
@schung-amd, here is my log (with upstream clang 18.1.8):
LastTest.log

These issues also seem to be reproducible with the official 6.2.0 builds. I ran the compiler directly there, because the official docker images contain no LLVM libs and I did not want to rebuild LLVM just for a demo:

docker run --rm -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined -v /llvm-project-rocm-6.2.0/:/src --group-add video rocm/rocm-terminal

rocm-user@1ed03f2dcc81:~$ /opt/rocm-6.2.0/lib/llvm/bin/clang -O3 -S -cl-std=CL2.0 -target amdgcn-amd-amdhsa -mcpu=gfx600 -Xclang -finclude-default-header --rocm-path=/src/amd/device-libs_build --rocm-device-lib-path=/src/amd/device-libs_build/lib/amdgcn/bitcode -mllvm -amdgpu-simplify-libcall=0 -o output.compile_frexp.gfx600.s /src/amd/device-libs/test/compile/frexp.cl
# search for `0x7f80000` in output.compile_frexp.gfx600.s -- not found

rocm-user@1ed03f2dcc81:~$ /opt/rocm-6.2.0/lib/llvm/bin/clang -O3 -S -cl-std=CL2.0 -target amdgcn-amd-amdhsa -mcpu=gfx700 -Xclang -finclude-default-header --rocm-path=/src/amd/device-libs_build --rocm-device-lib-path=/src/amd/device-libs_build/lib/amdgcn/bitcode -mllvm -amdgpu-simplify-libcall=0 -o output.compile_fract.gfx700.s /src/amd/device-libs/test/compile/fract.cl
# search for `flat_load_ushort` in output.compile_fract.gfx700.s -- not found

rocm-user@1ed03f2dcc81:~$ /opt/rocm-6.2.0/lib/llvm/bin/clang -O3 -S -cl-std=CL2.0 -target amdgcn-amd-amdhsa -mcpu=gfx600 -Xclang -finclude-default-header --rocm-path=/src/amd/device-libs_build --rocm-device-lib-path=/src/amd/device-libs_build/lib/amdgcn/bitcode -mllvm -amdgpu-simplify-libcall=0 -o output.compile_fract.gfx600.s /src/amd/device-libs/test/compile/fract.cl
# search for `v_cvt_f32_f16` in output.compile_fract.gfx600.s -- not found

It looks like all 3 issues are related to v_cmp_class-family instructions. I tried to track down the first issue with frexp.cl and found with godbolt that this test probably regressed between the 6.0.2 and 6.1.2 releases:

__attribute__((global)) void test1(float x, bool *out) {
    // 6.0.2 produces abs(x) == 0x7f800000 (infinity)
    // 6.1.2 produces v_cmp_class(x)
    *out = !__builtin_isfinite(x);
}

__attribute__((global)) void test2(float x, bool *out) {
    // both 6.0.2 and 6.1.2 produce v_cmp_class(x)
    *out = __builtin_isfinite(x);
}

see https://godbolt.org/z/Gv59Y84r4
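
To illustrate why a test that searches the assembly for a literal constant can start failing without the result being wrong, here is a minimal host-side C sketch. It assumes IEEE-754 binary32 floats, and all function names are hypothetical. It only models two equivalent ways of writing the "not finite" predicate; it does not reproduce the actual AMDGPU instruction selection or the v_cmp_class mask encoding.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a binary32 float into an integer
 * (memcpy avoids strict-aliasing problems). */
static inline uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Form A (hypothetical name): clear the sign bit and compare against the
 * +infinity bit pattern. The constant 0x7f800000 appears literally here. */
static inline bool not_finite_by_constant(float x) {
    return (float_bits(x) & 0x7fffffffu) >= 0x7f800000u;
}

/* Form B (hypothetical name): classify by the exponent field, loosely
 * modelling a class test. An all-ones exponent means infinity or NaN,
 * and no 0x7f800000 literal is needed. */
static inline bool not_finite_by_class(float x) {
    return ((float_bits(x) >> 23) & 0xffu) == 0xffu;
}

/* True if both forms agree with each other and with isfinite() from <math.h>. */
static inline bool forms_agree(float x) {
    bool expected = !isfinite(x);
    return not_finite_by_constant(x) == expected &&
           not_finite_by_class(x) == expected;
}
```

Both forms give the same answer for zero, normal, subnormal, infinite and NaN inputs, so a compiler is free to switch between them. A test that only checks for the text of one form in the generated assembly would then fail even though the computed result is unchanged.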

@schung-amd
Thanks, I was able to reproduce this on my end as well. I've reached out to the internal team handling these tests, and we're discussing how to handle this moving forward.

@schung-amd
Addressed, but this might break again in the future as we don't currently have good automation for updating these tests. Feel free to open a new issue if/when that occurs. Thanks for bringing this to our attention!
