[Issue]: Cannot compile Fortran do concurrent for AMD GPU with new AMDFLANG compiler #214

sumseq · 2024-11-27T00:22:39Z

Problem Description

I am trying to use the new "AMD Modern Fortran Compiler" described here:
https://github.com/amd/InfinityHub-CI/tree/main/fortran
on my code that uses "do concurrent" for GPU offload with optional OpenMP Target data movement (for GPUs/compilers that do not support unified memory).

The code is "HipFT" located publicly here:
github.com/predsci/hipft

The code works on NVIDIA GPUs with nvfortran and HPE, and on Intel GPUs with ifx.
It also compiles and runs on AMD server GPUs with HPE's CCE compiler (see https://arxiv.org/pdf/2408.07843)

I have compiled HDF5 1.14.3 (with a configure fix) and OpenMPI 5.0.6 with the amdflang and amdclang compilers to link to the code.

When I try to compile with:
-O3 -fopenmp -fdo-concurrent-parallel=device --offload-arch=gfx906
I get:

LLVM ERROR: aborting  
make: *** [Makefile:25: hipft.o] Error 1

I am using 'mpif90' to compile the code, which in turn invokes amdflang:

$ mpif90 -show
amdflang -I/opt/psi/amd/ext_deps/deps/openmpi-5.0.6/include -I/opt/psi/amd/ext_deps/deps/openmpi-5.0.6/lib -L/opt/psi/amd/ext_deps/deps/openmpi-5.0.6/lib -Wl,-rpath -Wl,/opt/psi/amd/ext_deps/deps/openmpi-5.0.6/lib -Wl,--enable-new-dtags -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
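
To double-check which compiler a wrapper invokes, one option is to take the first word of its `-show` output. A minimal sketch; `wrapper_compiler` is a hypothetical helper name, not part of Open MPI, and the example line is a shortened copy of the output above:

```shell
# Hypothetical helper: print the first word of a wrapper's "-show" line,
# i.e. the underlying compiler the wrapper would invoke.
wrapper_compiler() {
  printf '%s\n' "$1" | awk '{ print $1; exit }'
}

# Example with a shortened version of the output shown above.
wrapper_compiler "amdflang -I/opt/psi/amd/ext_deps/deps/openmpi-5.0.6/include -lmpi"
```

On a real system this could be called as `wrapper_compiler "$(mpif90 -show)"`.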

If I try to compile without any OpenMP or Do Concurrent flags, the code compiles fine and runs correctly on 1 CPU core.

If I try to compile with just OpenMP turned on and "do concurrent" set to host, I get a lot of serialization warnings:
warning: loc("/home/caplanr/hipft/git_amd/src/hipft.f90":7683:7): Some do concurrent loops are not perfectly-nested. These will be serialzied.
These concern me: if I cannot use do concurrent (DC) with index ranges like "2:N-1", then I doubt the code will parallelize at all on either the GPU or CPU, since a LOT of the loops are like that.

Note that I also had to add -L/opt/amdfort/llvm/lib -lomptarget in this case; otherwise it cannot find the OpenMP target data movement symbols (although they should not be used in this case).
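
As a summary, the three configurations described above map onto flag sets roughly as follows. This is only a sketch: `hipft_fflags` is a hypothetical helper (not part of the HipFT build scripts), the serial case is assumed to be plain `-O3`, and the `=host` spelling is assumed by analogy with the `=device` flag quoted earlier.

```shell
# Hypothetical helper summarizing the three configurations tried above.
# Usage: hipft_fflags MODE [GPU_ARCH]
hipft_fflags() {
  case "$1" in
    # Assumed: no OpenMP or do concurrent flags at all.
    serial) echo "-O3" ;;
    # Assumed spelling of the host setting, by analogy with "=device".
    host)   echo "-O3 -fopenmp -fdo-concurrent-parallel=host" ;;
    # Flags quoted in the report; arch defaults to gfx906 if not given.
    device) echo "-O3 -fopenmp -fdo-concurrent-parallel=device --offload-arch=${2:-gfx906}" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

hipft_fflags device gfx906
```

The serial case compiles and runs, the host case produces the serialization warnings, and the device case fails with "LLVM ERROR: aborting".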

Any help would be appreciated as I plan to present the code at SIAM's CSE meeting in a few months and would really like to have some AMD results.

-- Ron

Operating System

Rocky Linux 9.5 (Blue Onyx)

CPU

Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

GPU

AMD Radeon VII, gfx906, amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-, amdgcn-amd-amdhsa--gfx9-generic:sramecc+:xnack-

ROCm Version

ROCm 6.2.3

ROCm Component

flang

Steps to Reproduce

My ROCm version is actually 6.2.4, but that is not on the list.
My Linux kernel is: edge 6.10.6-1.el9.elrepo.x86_64

To reproduce, install the new AMD flang compiler from:
https://github.com/amd/InfinityHub-CI/tree/main/fortran

Next, clone the repo:

git clone https://github.com/predsci/hipft

Then, copy one of the build scripts from the build_examples folder and edit the top portion to resemble this:

FC="mpif90"
HDF5_INCLUDE_DIR="${PS_EXT_DEPS_HOME}/hdf5/include"
HDF5_LIB_DIR="${PS_EXT_DEPS_HOME}/hdf5/lib"
HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"
FFLAGS="-O3 -fopenmp --offload-arch=gfx906 -fdo-concurrent-parallel=device"

But:

Replace the HDF5 paths with the ones for an HDF5 library compiled with amdflang.
The mpif90 should also be associated with an MPI library compiled with amdflang.
Replace the gfx906 with the correct GPU arch you are using.
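
If the GPU arch name is not known, it can usually be read from the rocminfo output. A minimal sketch; `first_gfx_arch` is a hypothetical helper that reads rocminfo-style text on stdin and prints the first `gfx` name it finds (the sample input mimics the "Name:" lines in the rocminfo output further down):

```shell
# Hypothetical helper: read rocminfo-style output on stdin and print the
# first gfx target name (e.g. gfx906). Names such as "gfx9-generic" are
# skipped because at least three characters must follow "gfx".
first_gfx_arch() {
  grep -o 'gfx[0-9a-f]\{3,\}' | head -n 1
}

# Example with two sample lines in the rocminfo format.
printf 'Name: Intel(R) Xeon(R) CPU\nName: gfx906\n' | first_gfx_arch
```

On a real system this could be called as `/opt/rocm/bin/rocminfo | first_gfx_arch`, and the result substituted into `--offload-arch=`.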

Now, try to run the build script in the top-level directory of the repo.

You should see:

./build_amd_gpu.sh
=== STARTING HIPFT BUILD ===
==> Entering src directory...
==> Removing old Makefile...
==> Generating Makefile from Makefile.template...
==> Compiling code...
!!> ERROR!  hipft executable not found.  Build most likely failed.
            Contents of src/build.err:
LLVM ERROR: aborting
make: *** [Makefile:25: hipft.o] Error 1

You can go into the src folder and try to edit the Makefile and recompile as needed.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

$ /opt/rocm/bin/rocminfo --support
ROCk module is loaded

HSA System Attributes

Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents

Agent 1

Name: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4000
BDFID: 0
Internal Node ID: 0
Compute Unit: 6
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32508640(0x1f00ae0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 32508640(0x1f00ae0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32508640(0x1f00ae0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32508640(0x1f00ae0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:

Agent 2

Name: gfx906
Uuid: GPU-b86490a172da5ee9
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 3584
Internal Node ID: 1
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 472
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 1676083(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 1676083(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx9-generic:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

Here are my installed amd and rocm packages:

$ sudo dnf list --installed | grep amd
50:amd-smi-lib.x86_64                                     24.6.3.60204-139.el9                @rocm                      
51:amdgpu-core.noarch                                     1:6.2.60204-2070768.el9             @amdgpu                    
52:amdgpu-dkms.noarch                                     1:6.8.5.60204-2070768.el9           @amdgpu                    
53:amdgpu-dkms-firmware.noarch                            1:6.8.5.60204-2070768.el9           @amdgpu                    
54:amdgpu-install.noarch                                  6.2.60204-2070768.el9               @@commandline              
280:dkms.noarch                                            3.1.0-2.el9                         @amdgpu                    
569:hip-runtime-amd.x86_64                                 6.2.41134.60204-139.el9             @rocm                      
592:hsa-amd-aqlprofile.x86_64                              1.0.0.60204.60204-139.el9           @rocm                      
990:libdrm-amdgpu.x86_64                                   1:2.4.120.60204-2070768.el9         @amdgpu                    
991:libdrm-amdgpu-common.noarch                            1.0.0.60204-2070768.el9             @amdgpu                    
992:libdrm-amdgpu-devel.x86_64                             1:2.4.120.60204-2070768.el9         @amdgpu                    
1458:mesa-amdgpu-dri-drivers.x86_64                         1:24.2.0.60204-2070768.el9          @amdgpu                    
1459:mesa-amdgpu-filesystem.x86_64                          1:24.2.0.60204-2070768.el9          @amdgpu                    
1460:mesa-amdgpu-libGL.x86_64                               1:24.2.0.60204-2070768.el9          @amdgpu                    
1461:mesa-amdgpu-va-drivers.x86_64                          1:24.2.0.60204-2070768.el9          @amdgpu                    
2230:teamd.x86_64                                           1.31-16.el9_1                       @baseos

$ sudo dnf list --installed | grep rocm
50:amd-smi-lib.x86_64                                     24.6.3.60204-139.el9                @rocm                      
178:comgr.x86_64                                           2.8.0.60204-139.el9                 @rocm                      
179:composablekernel-devel.x86_64                          1.1.0.60204-139.el9                 @rocm                      
558:half.x86_64                                            1.12.0.60204-139.el9                @rocm                      
567:hip-devel.x86_64                                       6.2.41134.60204-139.el9             @rocm                      
568:hip-doc.x86_64                                         6.2.41134.60204-139.el9             @rocm                      
569:hip-runtime-amd.x86_64                                 6.2.41134.60204-139.el9             @rocm                      
570:hip-samples.x86_64                                     6.2.41134.60204-139.el9             @rocm                      
571:hipblas.x86_64                                         2.2.0.60204-139.el9                 @rocm                      
572:hipblas-devel.x86_64                                   2.2.0.60204-139.el9                 @rocm                      
573:hipblaslt.x86_64                                       0.8.0.60204-139.el9                 @rocm                      
574:hipblaslt-devel.x86_64                                 0.8.0.60204-139.el9                 @rocm                      
575:hipcc.x86_64                                           1.1.1.60204-139.el9                 @rocm                      
576:hipcub-devel.x86_64                                    3.2.1.60204-139.el9                 @rocm                      
577:hipfft.x86_64                                          1.0.16.60204-139.el9                @rocm                      
578:hipfft-devel.x86_64                                    1.0.16.60204-139.el9                @rocm                      
579:hipfort-devel.x86_64                                   0.4.0.60204-139.el9                 @rocm                      
580:hipify-clang.x86_64                                    18.0.0.60204-139.el9                @rocm                      
581:hiprand.x86_64                                         2.11.1.60204-139.el9                @rocm                      
582:hiprand-devel.x86_64                                   2.11.1.60204-139.el9                @rocm                      
583:hipsolver.x86_64                                       2.2.0.60204-139.el9                 @rocm                      
584:hipsolver-devel.x86_64                                 2.2.0.60204-139.el9                 @rocm                      
585:hipsparse.x86_64                                       3.1.1.60204-139.el9                 @rocm                      
586:hipsparse-devel.x86_64                                 3.1.1.60204-139.el9                 @rocm                      
587:hipsparselt.x86_64                                     0.2.1.60204-139.el9                 @rocm                      
588:hipsparselt-devel.x86_64                               0.2.1.60204-139.el9                 @rocm                      
589:hiptensor.x86_64                                       1.3.0.60204-139.el9                 @rocm                      
590:hiptensor-devel.x86_64                                 1.3.0.60204-139.el9                 @rocm                      
592:hsa-amd-aqlprofile.x86_64                              1.0.0.60204.60204-139.el9           @rocm                      
593:hsa-rocr.x86_64                                        1.14.0.60204-139.el9                @rocm                      
594:hsa-rocr-devel.x86_64                                  1.14.0.60204-139.el9                @rocm                      
595:hsakmt-roct-devel.x86_64                               20240607.5.7.60204-139.el9          @rocm                      
1477:migraphx.x86_64                                        2.10.0.60204-139.el9                @rocm                      
1478:migraphx-devel.x86_64                                  2.10.0.60204-139.el9                @rocm                      
1480:miopen-hip.x86_64                                      3.2.0.60204-139.el9                 @rocm                      
1481:miopen-hip-devel.x86_64                                3.2.0.60204-139.el9                 @rocm                      
1482:mivisionx.x86_64                                       3.0.0.60204-139                     @rocm                      
1483:mivisionx-devel.x86_64                                 3.0.0.60204-139                     @rocm                      
1574:openmp-extras-devel.x86_64                             18.62.0.60204-139.el9               @rocm                      
1575:openmp-extras-runtime.x86_64                           18.62.0.60204-139.el9               @rocm                      
2048:rccl.x86_64                                            2.20.5.60204-139.el9                @rocm                      
2049:rccl-devel.x86_64                                      2.20.5.60204-139.el9                @rocm                      
2058:rocalution.x86_64                                      3.2.1.60204-139.el9                 @rocm                      
2059:rocalution-devel.x86_64                                3.2.1.60204-139.el9                 @rocm                      
2060:rocblas.x86_64                                         4.2.4.60204-139.el9                 @rocm                      
2061:rocblas-devel.x86_64                                   4.2.4.60204-139.el9                 @rocm                      
2062:rocdecode.x86_64                                       0.6.0.60204-139                     @rocm                      
2063:rocdecode-devel.x86_64                                 0.6.0.60204-139                     @rocm                      
2064:rocfft.x86_64                                          1.0.30.60204-139.el9                @rocm                      
2065:rocfft-devel.x86_64                                    1.0.30.60204-139.el9                @rocm                      
2073:rocm.x86_64                                            6.2.4.60204-139.el9                 @rocm                      
2074:rocm-cmake.x86_64                                      0.13.0.60204-139.el9                @rocm                      
2075:rocm-core.x86_64                                       6.2.4.60204-139.el9                 @rocm                      
2076:rocm-dbgapi.x86_64                                     0.76.0.60204-139.el9                @rocm                      
2077:rocm-debug-agent.x86_64                                2.0.3.60204-139.el9                 @rocm                      
2078:rocm-developer-tools.x86_64                            6.2.4.60204-139.el9                 @rocm                      
2079:rocm-device-libs.x86_64                                1.0.0.60204-139.el9                 @rocm                      
2080:rocm-gdb.x86_64                                        14.2.60204-139.el9                  @rocm                      
2081:rocm-hip-libraries.x86_64                              6.2.4.60204-139.el9                 @rocm                      
2082:rocm-hip-runtime.x86_64                                6.2.4.60204-139.el9                 @rocm                      
2083:rocm-hip-runtime-devel.x86_64                          6.2.4.60204-139.el9                 @rocm                      
2084:rocm-hip-sdk.x86_64                                    6.2.4.60204-139.el9                 @rocm                      
2085:rocm-language-runtime.x86_64                           6.2.4.60204-139.el9                 @rocm                      
2086:rocm-llvm.x86_64                                       18.0.0.24392.60204-139.el9          @rocm                      
2087:rocm-ml-libraries.x86_64                               6.2.4.60204-139.el9                 @rocm                      
2088:rocm-ml-sdk.x86_64                                     6.2.4.60204-139.el9                 @rocm                      
2089:rocm-opencl.x86_64                                     2.0.0.60204-139.el9                 @rocm                      
2090:rocm-opencl-devel.x86_64                               2.0.0.60204-139.el9                 @rocm                      
2091:rocm-opencl-icd-loader.x86_64                          1.2.60204-139.el9                   @rocm                      
2092:rocm-opencl-runtime.x86_64                             6.2.4.60204-139.el9                 @rocm                      
2093:rocm-opencl-sdk.x86_64                                 6.2.4.60204-139.el9                 @rocm                      
2094:rocm-openmp-sdk.x86_64                                 6.2.4.60204-139.el9                 @rocm                      
2095:rocm-smi-lib.x86_64                                    7.3.0.60204-139.el9                 @rocm                      
2096:rocm-utils.x86_64                                      6.2.4.60204-139.el9                 @rocm                      
2097:rocminfo.x86_64                                        1.0.0.60204-139.el9                 @rocm                      
2098:rocprim-devel.x86_64                                   3.2.2.60204-139.el9                 @rocm                      
2099:rocprofiler.x86_64                                     2.0.60204.60204-139.el9             @rocm                      
2100:rocprofiler-devel.x86_64                               2.0.60204.60204-139.el9             @rocm                      
2101:rocprofiler-plugins.x86_64                             2.0.60204.60204-139.el9             @rocm                      
2102:rocprofiler-register.x86_64                            0.4.0.60204-139.el9                 @rocm                      
2103:rocprofiler-sdk.x86_64                                 0.4.0-139.el9                       @rocm                      
2104:rocprofiler-sdk-roctx.x86_64                           0.4.0-139.el9                       @rocm                      
2105:rocrand.x86_64                                         3.1.1.60204-139.el9                 @rocm                      
2106:rocrand-devel.x86_64                                   3.1.1.60204-139.el9                 @rocm                      
2107:rocsolver.x86_64                                       3.26.2.60204-139.el9                @rocm                      
2108:rocsolver-devel.x86_64                                 3.26.2.60204-139.el9                @rocm                      
2109:rocsparse.x86_64                                       3.2.1.60204-139.el9                 @rocm                      
2110:rocsparse-devel.x86_64                                 3.2.1.60204-139.el9                 @rocm                      
2111:rocthrust-devel.x86_64                                 3.1.1.60204-139.el9                 @rocm                      
2112:roctracer.x86_64                                       4.1.60204.60204-139.el9             @rocm                      
2113:roctracer-devel.x86_64                                 4.1.60204.60204-139.el9             @rocm                      
2114:rocwmma-devel.x86_64                                   1.5.0.60204-139.el9                 @rocm                      
2127:rpp.x86_64                                             1.8.0.60204-139.el9                 @rocm                      
2128:rpp-devel.x86_64                                       1.8.0.60204-139.el9                 @rocm

bcornille added the flang label Nov 27, 2024
