OpenMP offload
Welcome to the miniqmc for OpenMP offload wiki!
Check out the OMP_offload branch:
git checkout OMP_offload
See build options in miniQMC How-to Guides.
A new CMake option, ENABLE_OFFLOAD, turns offloading on or off:
-DENABLE_OFFLOAD=1 # offload to accelerators like GPU
-DENABLE_OFFLOAD=0 # default, offload to CPU host
OFFLOAD_TARGET selects an offload target when the compiler supports more than one, as Clang and GNU do.
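For readers new to OpenMP offload, the kind of loop these options affect can be sketched as follows. This is a minimal illustration, not code from miniQMC: the function name `axpy_offload` is hypothetical. Where the region actually runs depends on the compiler flags the build passes and on the runtime. When no device is available, or when OpenMP is disabled and the pragma is ignored, the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example, not taken from miniQMC: y[i] += a * x[i] in a target region.
void axpy_offload(double a, const std::vector<double>& x, std::vector<double>& y)
{
  assert(y.size() >= x.size()); // y must be at least as long as x
  const std::size_t n = x.size();
  const double* xp = x.data();
  double* yp = y.data();
  // map(to:) copies x to the device; map(tofrom:) copies y in and back out.
  // The scalar a is passed to the region by value (implicit firstprivate).
#pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
  for (std::size_t i = 0; i < n; ++i)
    yp[i] += a * xp[i];
}
```

Raw pointers are taken from the vectors because array sections in `map` clauses are written against a pointer plus a length.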
The offload feature is currently implemented in the miniqmc miniapp.
It accepts the command-line arguments -g, -w, -a, -m, and -n:
-g adjusts supercell size
-w number of walkers. Equal to the number of CPU threads if not specified.
-a tiling (cache blocking) size. Equal to the number of splines if not specified.
-m spline mesh "px py pz"
-n number of iterations
The old check_spo has been renamed check_spo_batched. The following option is available only with check_spo_batched:
-f skips transferring data back to the host for checking. It must be used when measuring performance.
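Why a flag like -f matters for timing can be sketched as follows. This is an assumption-laden illustration, not the check_spo_batched implementation: the function `scale_on_device` and its `transfer_back` parameter are hypothetical. The device-to-host copy is an explicit `target update from`, and skipping it keeps that transfer cost out of the measurement.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not miniQMC code): scale v on the device and copy the
// result back to the host only when transfer_back is true.
void scale_on_device(std::vector<double>& v, double s, bool transfer_back)
{
  double* p = v.data();
  const std::size_t n = v.size();
  // Allocate device storage and copy the input once.
#pragma omp target enter data map(to: p[0:n])
  // The data is already present on the device, so this map clause
  // triggers no additional copy.
#pragma omp target teams distribute parallel for map(to: p[0:n])
  for (std::size_t i = 0; i < n; ++i)
    p[i] *= s;
  if (transfer_back)
  {
    // Device-to-host copy, needed only to check results on the host.
#pragma omp target update from(p[0:n])
  }
  // Release the device copy without transferring anything back.
#pragma omp target exit data map(delete: p[0:n])
}
```

When the region falls back to the host, host and device data are the same memory, so `v` is modified in place regardless of `transfer_back`.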
OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"
Update on Nov 17th 2019
XL: last verified on 16.1.1-5
cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=1 ..
With old versions of CMake (<3.11), XL is misidentified as Clang. The following workaround solves the issue:
cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..
Clang: last verified on 11-RC2
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=1 -D USE_OBJECT_TARGET=ON ..
-D USE_OBJECT_TARGET=ON works around a static linking issue.
OneAPI: last verified on beta08
cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=1 -D OFFLOAD_TARGET=spir64 -DCMAKE_EXE_LINKER_FLAGS="-device-math-lib=fp64,fp32" ..
On some systems, forcing LIBOMPTARGET_PLUGIN=OPENCL is needed at runtime.
AOMP: last verified on 11.8
cmake -D CMAKE_CXX_COMPILER=clang++ \
-D ENABLE_OFFLOAD=1 \
-D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
-D OFFLOAD_ARCH=gfx906 ..
GCC: last verified on 9.2
cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=1 ..
Cray: last verified on 9.0
module load craype-accel-nvidia60
cmake -D CMAKE_CXX_COMPILER=CC -DQMC_MPI=1 ..
Not yet tested.
Compiler | Clang 11 | AOMP 11.8-0 | XL 16.1.1-5 | OneAPI beta08 | Cray 9.0 | GCC 10.2 | GCC 10 |
---|---|---|---|---|---|---|---|
device | NVIDIA | AMD | NVIDIA | Intel | NVIDIA | NVIDIA | AMD |
math header conflict | Pass | Pass | Pass | Pass | Pass | Pass | - |
complex arithmetic | Pass | Pass | Pass | Pass | Fail | - | - |
math linker error | Pass | Pass | Pass | Pass | Fail | Pass | - |
declare target static data | Pass | Pass | Pass | - | Pass | Fail | - |
static linking | Fail | Pass | Pass | Pass | Pass | - | - |
Async tasking | Fail | Fail | Pass | Fail | Fail | - | - |
multiple stream | Pass | Pass | Pass | Fail | Fail | - | - |
check_spo | Pass | Pass | Pass | Pass | Pass | - | - |
check_spo_batched | Pass | Pass | Pass | Pass | Pass | - | - |
miniqmc_sync_move | Pass | Pass | Pass | Pass | Pass | - | - |
- AOMP 0.7-5, or possibly ROCm, has a regression that caused a Linux kernel segfault. https://github.com/ROCm-Developer-Tools/aomp/issues/45
- Cray 9.1 inherits Clang 9 math function issues.
- GNU 9.2: see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92553
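To make two of the table rows concrete, here is a rough sketch of what probes for "declare target static data" and "complex arithmetic" might look like. These are assumptions for illustration, not the tests actually used to fill in the table. The names `weights`, `weighted_sum`, and `complex_product` are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Static data that must also exist on the device.
#pragma omp declare target
static const double weights[4] = {0.5, 1.0, 1.5, 2.0};
#pragma omp end declare target

// Hypothetical probe: read the static table inside a target region.
double weighted_sum(const std::vector<double>& v)
{
  assert(v.size() == 4); // one input value per weight
  const double* p = v.data();
  double sum = 0.0;
#pragma omp target teams distribute parallel for reduction(+: sum) \
    map(to: p[0:4]) map(tofrom: sum)
  for (std::size_t i = 0; i < 4; ++i)
    sum += weights[i] * p[i];
  return sum;
}

// Hypothetical probe: std::complex multiplication inside a target region.
// On a real device this may need compiler or runtime support for complex math.
std::complex<double> complex_product(std::complex<double> a, std::complex<double> b)
{
  std::complex<double> c;
#pragma omp target map(to: a, b) map(from: c)
  {
    c = a * b;
  }
  return c;
}
```

Both functions also run correctly on the host when no device is present, so a failure on a given compiler points at its device toolchain rather than at the source.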
P = pass
F = fail
FL = fail in linking
FR = fail at run time
FW = fail with wrong results
"-" = not tested yet
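The "Async tasking" row refers to asynchronous target regions. A minimal sketch of that pattern, assuming it means `target ... nowait` followed by `taskwait`, is shown below. The function `scale_two_async` is hypothetical and not taken from miniQMC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical probe: launch two independent target regions without
// blocking, then wait for both to finish.
void scale_two_async(std::vector<double>& a, std::vector<double>& b, double s)
{
  assert(&a != &b); // the two regions must not update the same data
  double* pa = a.data();
  double* pb = b.data();
  const std::size_t na = a.size();
  const std::size_t nb = b.size();
  // nowait turns each target region into a deferrable task.
#pragma omp target teams distribute parallel for nowait map(tofrom: pa[0:na])
  for (std::size_t i = 0; i < na; ++i)
    pa[i] *= s;
#pragma omp target teams distribute parallel for nowait map(tofrom: pb[0:nb])
  for (std::size_t i = 0; i < nb; ++i)
    pb[i] *= s;
  // Wait for both target tasks before the host reads a and b.
#pragma omp taskwait
}
```

Whether the two regions actually overlap is up to the compiler and runtime. The code is valid either way, and an implementation may run them one after the other.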