
OpenMP offload

Last edited by Ye Luo on Aug 29, 2020 (53 revisions).

Welcome to the miniqmc for OpenMP offload wiki!

Build

Check out the OMP_offload branch:

git checkout OMP_offload

See the miniQMC How-to Guides for the general build options.

A new CMake option, ENABLE_OFFLOAD, turns offloading on or off:

 -DENABLE_OFFLOAD=1  # offload to accelerators like GPU
 -DENABLE_OFFLOAD=0  # default, no offloading; target regions run on the CPU host

When the compiler supports more than one offload target (for example, Clang and GCC), OFFLOAD_TARGET selects which one to use.
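To confirm which mode a given build actually runs in, the OpenMP runtime can be queried directly. The sketch below is not part of miniqmc; the two function names are hypothetical, and it relies only on the standard OpenMP routines omp_get_num_devices() and omp_is_initial_device(). In a host-only build, or when no device is available, the target region simply executes on the CPU.

```cpp
// Hypothetical check, not part of miniqmc: how many offload devices does the
// OpenMP runtime see, and does a target region actually leave the host?
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of non-host devices visible to the OpenMP runtime (0 without OpenMP).
int offload_device_count() {
  int n = 0;
#ifdef _OPENMP
  n = omp_get_num_devices();
#endif
  assert(n >= 0);  // the runtime never reports a negative count
  return n;
}

// True if a target region executed on the host ("initial device"), which is
// what happens in a host-only build or when no device is available.
bool target_region_runs_on_host() {
  int on_host = 1;
#ifdef _OPENMP
#pragma omp target map(tofrom : on_host)
  on_host = omp_is_initial_device();
#endif
  return on_host != 0;
}
```

Called from a small driver, target_region_runs_on_host() is expected to return true for a build configured with -DENABLE_OFFLOAD=0 or on a machine without a usable device; a false result means the region reached an accelerator.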

Run

Offloading is currently implemented in the miniqmc miniapp, which accepts the command-line arguments -g, -w, -a, -m and -n:

-g adjusts supercell size
-w number of walkers. Equal to the number of CPU threads if not specified.
-a tiling (cache blocking) size. Equal to the number of splines if not specified.
-m spline mesh "px py pz"
-n number of iterations
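To make the -a option concrete: cache blocking splits the splines into contiguous tiles that are processed one block at a time. The helper below is a hypothetical illustration, not miniqmc's implementation. It treats a non-positive tile size as "not specified", which yields a single tile covering every spline, matching the documented default.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: split spline indices [0, num_splines) into contiguous
// [first, last) tiles of at most tile_size splines each. A tile_size <= 0
// means "not specified" and falls back to one tile holding all splines.
std::vector<std::pair<int, int>> make_spline_tiles(int num_splines, int tile_size) {
  assert(num_splines >= 0);
  if (tile_size <= 0 || tile_size > num_splines) tile_size = num_splines;
  std::vector<std::pair<int, int>> tiles;
  for (int first = 0; first < num_splines; first += tile_size)
    tiles.emplace_back(first, std::min(first + tile_size, num_splines));
  return tiles;
}
```

For example, 10 splines with a tile size of 4 give the tiles [0,4), [4,8) and [8,10); only the last tile is shorter.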

The old check_spo has been renamed check_spo_batched. The following option is available only in check_spo_batched:

-f skip transferring data back to the host for correctness checking. Must be used when measuring performance.
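The cost that -f avoids is the device-to-host copy that only verification needs. The sketch below shows the general OpenMP pattern under that assumption: data stays resident on the device for the computation, and a target update copies results back only when checking is requested. The function name and the verify flag are hypothetical; this is not the check_spo_batched code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: scale v on the device and copy the result back to the
// host only when verify is true; a timing run skips that transfer.
void scale_on_device(std::vector<double>& v, double s, bool verify) {
  assert(!v.empty());
  double* p = v.data();
  const std::size_t n = v.size();

  // Allocate the device copy and upload the input once.
#pragma omp target enter data map(to : p[0:n])

  // p[0:n] is already present, so alloc neither allocates nor copies here.
#pragma omp target teams distribute parallel for map(alloc : p[0:n])
  for (std::size_t i = 0; i < n; ++i) p[i] *= s;

  if (verify) {
    // Only the checking path pays for the device-to-host transfer.
#pragma omp target update from(p[0:n])
  }

  // Release the device copy without copying it back.
#pragma omp target exit data map(delete : p[0:n])
}
```

When no device is present, the target regions run on the host and v is updated in place regardless of verify, so the skipped copy only matters on an accelerator.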

Benchmark example

OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"

Build recipes

Update on Nov 17th 2019

IBM XL

Last verified on 16.1.1-5

cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=1 ..

With older versions of CMake (< 3.11), XL is misidentified as Clang. The following workaround solves the issue:

cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..

LLVM Clang

Last verified on 11-RC2

cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=1 -D USE_OBJECT_TARGET=ON ..

-D USE_OBJECT_TARGET=ON works around a static linking issue.

Intel OneAPI

Last verified on beta08

cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=1 -D OFFLOAD_TARGET=spir64 -DCMAKE_EXE_LINKER_FLAGS="-device-math-lib=fp64,fp32" ..

On some systems, LIBOMPTARGET_PLUGIN=OPENCL must be set at runtime to force the OpenCL plugin.

AMD AOMP

Last verified on 11.8

cmake -D CMAKE_CXX_COMPILER=clang++ \
      -D ENABLE_OFFLOAD=1 \
      -D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
      -D OFFLOAD_ARCH=gfx906 ..

GNU GCC

Last verified on 9.2

cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=1 ..

Cray Clang

Last verified on 9.0

module load craype-accel-nvidia60
cmake -D CMAKE_CXX_COMPILER=CC -DQMC_MPI=1 ..

PGI

Not yet tested.

Pass/fail dashboard

Compiler                    Clang 11  AOMP 11.8-0  XL 16.1.1-5  OneAPI beta08  Cray 9.0  GCC 10.2  GCC 10
Device                      NVIDIA    AMD          NVIDIA       Intel          NVIDIA    NVIDIA    AMD
math header conflict        Pass      Pass         Pass         Pass           Pass      Pass      -
complex arithmetic          Pass      Pass         Pass         Pass           Fail      -         -
math linker error           Pass      Pass         Pass         Pass           Fail      Pass      -
declare target static data  Pass      Pass         Pass         -              Pass      Fail      -
static linking              Fail      Pass         Pass         Pass           Pass      -         -
async tasking               Fail      Fail         Pass         Fail           Fail      -         -
multiple streams            Pass      Pass         Pass         Fail           Fail      -         -
check_spo                   Pass      Pass         Pass         Pass           Pass      -         -
check_spo_batched           Pass      Pass         Pass         Pass           Pass      -         -
miniqmc_sync_move           Pass      Pass         Pass         Pass           Pass      -         -

Legend:
P   pass
F   fail
FL  fail in linking
FR  fail in run
FW  fail with wrong results
-   not tested yet
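Several dashboard rows name compiler features rather than miniqmc kernels. The small probes below are hypothetical illustrations of what some of those rows exercise: static data in a declare target block, std::complex and <cmath> calls inside a target region, and asynchronous target tasks joined with taskwait. They are not the project's actual tests, and the names are made up. Each one also builds and runs without offloading, in which case the target regions execute on the host.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// "declare target static data": a read-only table that must also exist in
// device memory so that target regions can index it.
#pragma omp declare target
static const double kWeights[4] = {0.5, 1.0, 1.5, 2.0};
#pragma omp end declare target

// "complex arithmetic" plus device math: std::complex with std::cos/std::sin
// inside a target region. Since |cos t + i sin t| == 1, each term is w * w.
double weighted_norm_sum(int n) {
  assert(n >= 0);
  double sum = 0.0;
#pragma omp target teams distribute parallel for map(tofrom : sum) reduction(+ : sum)
  for (int i = 0; i < n; ++i) {
    const std::complex<double> z(std::cos(0.1 * i), std::sin(0.1 * i));
    sum += std::norm(z * kWeights[i % 4]);
  }
  return sum;
}

// "async tasking": two independent target regions launched with nowait and
// joined with taskwait, so they may overlap on the device.
void scale_two_async(std::vector<double>& a, std::vector<double>& b, double s) {
  double* pa = a.data();
  double* pb = b.data();
  const int na = static_cast<int>(a.size());
  const int nb = static_cast<int>(b.size());
#pragma omp target teams distribute parallel for map(tofrom : pa[0:na]) nowait
  for (int i = 0; i < na; ++i) pa[i] *= s;
#pragma omp target teams distribute parallel for map(tofrom : pb[0:nb]) nowait
  for (int i = 0; i < nb; ++i) pb[i] *= s;
  // Wait for both deferred target tasks before the host reads a and b.
#pragma omp taskwait
}
```

With n = 4 the weighted sum is 0.25 + 1 + 2.25 + 4 = 7.5, up to rounding in cos² + sin².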