OpenMP offload

Welcome to the miniQMC OpenMP offload wiki!

Build

Check out the OMP_offload branch:

git checkout OMP_offload

See build options in miniQMC How-to Guides.

The CMake option ENABLE_OFFLOAD turns offloading on or off.

 -DENABLE_OFFLOAD=1  # offload to accelerators like GPU
 -DENABLE_OFFLOAD=0  # default, offload to CPU host

OFFLOAD_TARGET selects an offload target when the compiler supports more than one, as Clang and GNU do.
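The two options combine into a single configure line. The sketch below only prints that line and does not run CMake. The helper name configure_line is hypothetical and not part of miniQMC, and the target value is the one from the AMD AOMP recipe further down.

```shell
# Hypothetical helper (not part of miniQMC): print the configure line
# for a given ENABLE_OFFLOAD value and an optional OFFLOAD_TARGET.
configure_line() {
  enable="${1:-0}"   # 0 is the documented default (offload to CPU host)
  target="${2:-}"    # empty: leave target selection to the build system
  args="-D ENABLE_OFFLOAD=${enable}"
  if [ -n "${target}" ]; then
    args="${args} -D OFFLOAD_TARGET=${target}"
  fi
  echo "cmake ${args} .."
}

configure_line 1 amdgcn-amd-amdhsa
```

Compiler-specific options such as CMAKE_CXX_COMPILER still need to be added; see the build recipes.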

Run

The offload feature is currently implemented in the miniqmc miniapp, which accepts the command line arguments -g, -w, -a, -m, and -n:

-g adjusts supercell size
-w number of walkers. Equal to the number of CPU threads if not specified.
-a tiling (cache blocking) size. Equal to the number of splines if not specified.
-m spline mesh "px py pz"
-n number of iterations

The old check_spo has been renamed to check_spo_batched. The following option is available only with check_spo_batched:

-f skip transferring data back for checking. Must be used when measuring performance.
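A typical pattern is to run once with checking and once with -f. This is a minimal sketch, assuming check_spo_batched is built under ./bin and accepts -g the same way miniqmc does. The wrapper name check_then_measure is hypothetical.

```shell
# Hypothetical wrapper (not part of miniQMC): validate first, then measure.
check_then_measure() {
  bin="$1"                # path to check_spo_batched (assumed location)
  "$bin" -g "2 2 1"       # data is transferred back and checked
  "$bin" -g "2 2 1" -f    # no transfer back: use this run for performance
}
```

Call it as check_then_measure ./bin/check_spo_batched once the build has finished.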

Benchmark example

OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"
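The benchmark above can be extended to a small scan over the walker count with -w. The sketch below keeps the thread count and supercell from the example. The function name sweep_walkers and the walker counts 10, 20, and 40 are illustrative assumptions, not recommended settings.

```shell
# Hypothetical sweep (not part of miniQMC): vary the number of walkers
# while keeping the thread count and supercell size fixed.
sweep_walkers() {
  bin="$1"                # path to the miniqmc executable
  for w in 10 20 40; do   # illustrative walker counts
    OMP_NUM_THREADS=10 "$bin" -g "2 2 1" -w "$w"
  done
}
```

Call it as sweep_walkers ./bin/miniqmc from the build directory.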

Build recipes

Update on Nov 17th 2019

IBM XL

Last verified on 16.1.1-5

cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=1 ..

With old versions of CMake (<3.11), XL is misidentified as Clang. The following workaround sets the compiler ID explicitly:

cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..

LLVM Clang

Last verified on 11-RC2

cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=1 -D USE_OBJECT_TARGET=ON ..

-D USE_OBJECT_TARGET=ON works around a static linking issue.

Intel OneAPI

Last verified on beta08

cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=1 -D OFFLOAD_TARGET=spir64 -DCMAKE_EXE_LINKER_FLAGS="-device-math-lib=fp64,fp32" ..

On some systems, the environment variable LIBOMPTARGET_PLUGIN=OPENCL must be set at runtime.

AMD AOMP

Last verified on 11.8

cmake -D CMAKE_CXX_COMPILER=clang++ \
      -D ENABLE_OFFLOAD=1 \
      -D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
      -D OFFLOAD_ARCH=gfx906 ..

GNU GCC

Last verified on 9.2

cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=1 ..

Cray Clang

Last verified on 9.0

module load craype-accel-nvidia60
cmake -D CMAKE_CXX_COMPILER=CC -DQMC_MPI=1 ..

PGI

Not yet tested.

Pass/fail dashboard

Compiler	Clang 11	AOMP 11.8-0	XL 16.1.1-5	OneAPI beta08	Cray 9.0	GCC 10.2	GCC 10
device	NVIDIA	AMD	NVIDIA	Intel	NVIDIA	NVIDIA	AMD
math header conflict	Pass	Pass	Pass	Pass	Pass	Pass	-
complex arithmetic	Pass	Pass	Pass	Pass	Fail	-	-
math linker error	Pass	Pass	Pass	Pass	Fail	Pass	-
declare target static data	Pass	Pass	Pass	-	Pass	Fail	-
static linking	Fail	Pass	Pass	Pass	Pass	-	-
Async tasking	Fail	Fail	Pass	Fail	Fail	-	-
multiple stream	Pass	Pass	Pass	Fail	Fail	-	-
check_spo	Pass	Pass	Pass	Pass	Pass	-	-
check_spo_batched	Pass	Pass	Pass	Pass	Pass	-	-
miniqmc_sync_move	Pass	Pass	Pass	Pass	Pass	-	-

AOMP 0.7-5, or possibly ROCm, has a regression that caused a Linux kernel segfault. See https://github.com/ROCm-Developer-Tools/aomp/issues/45
Cray 9.1 inherits the Clang 9 math function issues.
GNU 9.2: see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92553

P pass
F fail
FL fail in linking
FR fail in run
FW fail with wrong results
- not tested yet
