KOKKOS Branch
To build with Kokkos, first clone github.com/kokkos/kokkos.git. We assume the top-level Kokkos directory is ${KOKKOS_ROOT}. Then navigate to the miniqmc/build directory. The following cmake command configures a build for a Power8 CPU with OpenMP threading, assuming the GCC compiler is used:
> cmake -DQMC_USE_KOKKOS=1 \
-DKOKKOS_PREFIX=${KOKKOS_ROOT} \
-DKOKKOS_ARCH="Power8" \
-DKOKKOS_ENABLE_OPENMP=true \
-DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
-DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline" ..
The CMAKE_CXX_FLAGS definitions map the "restrict" and "__forceinline" keywords that appear in miniqmc onto forms that GCC accepts. For other compilers, consult the compiler's reference for the analogous keywords.
For CUDA, assuming a Power8 host and an NVIDIA P100 GPU, we use the following:
> cmake -DQMC_USE_KOKKOS=1 \
-DKOKKOS_PREFIX=${KOKKOS_ROOT} \
-DKOKKOS_ENABLE_CUDA=true \
-DKOKKOS_ENABLE_OPENMP=false \
-DKOKKOS_ARCH="Power8;Pascal60" \
-DKOKKOS_ENABLE_CUDA_UVM=true \
-DKOKKOS_ENABLE_CUDA_LAMBDA=true \
-DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
-DCMAKE_CXX_COMPILER=${KOKKOS_ROOT}/bin/nvcc_wrapper \
-DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline " ..
KOKKOS_ENABLE_CUDA=true and KOKKOS_ENABLE_CUDA_UVM=true must be set. Note also that CMAKE_CXX_COMPILER points to the Kokkos nvcc wrapper rather than to a host compiler.
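Since the CUDA configuration relies on the wrapper script under ${KOKKOS_ROOT}/bin, it can help to confirm that the script is present before running cmake. The following is a minimal sketch; the function name check_nvcc_wrapper is hypothetical and is not part of Kokkos or miniqmc.

```shell
# Hypothetical helper: succeed only if the Kokkos nvcc wrapper exists
# and is executable under the given Kokkos root directory.
check_nvcc_wrapper() {
    wrapper="$1/bin/nvcc_wrapper"
    if [ -x "$wrapper" ]; then
        echo "found: $wrapper"
        return 0
    fi
    echo "missing or not executable: $wrapper" >&2
    return 1
}
```

A possible use is to guard the configure step, for example: check_nvcc_wrapper "${KOKKOS_ROOT}" && cmake ... (with the options shown above).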
It is recommended to set the following runtime environment variables:
- export OMP_PROC_BIND=spread
- export OMP_PLACES=threads
- export OMP_NUM_THREADS=[put available/desired number of threads here]
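As one way to fill in the thread count, the sketch below uses every online hardware thread reported by the system. This is an assumption for illustration; the right count for your runs may be lower. The _NPROCESSORS_ONLN name is a common getconf extension (available on Linux and macOS), so the sketch falls back to 1 if the query fails.

```shell
# Sketch: set the recommended OpenMP variables, using every online
# hardware thread. Falls back to a single thread if getconf cannot
# report the processor count.
nthreads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_NUM_THREADS="$nthreads"

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```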
Moreover, for asynchronous multi-walker moves using the partition_master construct, nested threading must be enabled. This is done with the following:
- export OMP_NESTED=true
- export OMP_NUM_THREADS=[comma separated list]
See https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html for the format of the list.
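In the comma-separated form, the first value sets the thread count of the outermost parallel region and each following value sets the count for the next nesting level. A minimal sketch with illustrative values (4 outer, 2 inner; these numbers are assumptions, not recommendations from this page):

```shell
# Illustrative values only: 4 threads at the outer level and 2 threads
# at the nested level. Choose values whose product does not exceed the
# hardware threads you want to use.
outer=4
inner=2

export OMP_NESTED=true
export OMP_NUM_THREADS="${outer},${inner}"

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS uses up to $((outer * inner)) threads"
```

Note that OpenMP 5.0 deprecates OMP_NESTED in favor of OMP_MAX_ACTIVE_LEVELS; whether your OpenMP runtime requires the newer variable depends on its version.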
To build, clone Kokkos and check out its develop branch, then clone miniqmc and check out its global_batched_kokkos branch. For example:
> mkdir kokkos
> cd kokkos
> git clone https://github.com/kokkos/kokkos.git .
> git checkout develop
> cd ..
> mkdir miniqmc
> cd miniqmc
> git clone https://github.com/QMCPACK/miniqmc.git .
> git checkout global_batched_kokkos
> cd build
For a CPU build, first identify the architecture of the CPU you will use. In the list below, the Kokkos architecture name is on the left and the corresponding CPU type is on the right.
AMDAVX AMD CPU
ARMv80 ARMv8.0 Compatible CPU
ARMv81 ARMv8.1 Compatible CPU
ARMv8-ThunderX ARMv8 Cavium ThunderX CPU
BGQ IBM Blue Gene Q
Power7 IBM POWER7 and POWER7+ CPUs
Power8 IBM POWER8 CPUs
Power9 IBM POWER9 CPUs
WSM Intel Westmere CPUs
SNB Intel Sandy/Ivy Bridge CPUs
HSW Intel Haswell CPUs
BDW Intel Broadwell Xeon E-class CPUs
SKX Intel Sky Lake Xeon E-class HPC CPUs (AVX512)
KNC Intel Knights Corner Xeon Phi
KNL Intel Knights Landing Xeon Phi
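If you are unsure which Intel entry applies, the CPU feature flags (for example, the flags line of /proc/cpuinfo on Linux) give a rough hint. The helper below is a hypothetical heuristic, not something provided by Kokkos or miniqmc: it cannot tell HSW from BDW, it treats any CPU without AVX as WSM, and it does not cover the AMD, ARM, or IBM entries, which must be picked by hand.

```shell
# Hypothetical helper: guess a Kokkos architecture name for an Intel
# CPU from a space-separated string of CPU feature flags.
# Heuristic only; verify the result against the list above.
guess_intel_kokkos_arch() {
    case " $1 " in
        *" avx512er "*) echo "KNL" ;;  # AVX-512 ER appears on Knights Landing-class Xeon Phi
        *" avx512f "*)  echo "SKX" ;;  # AVX-512 foundation without ER
        *" avx2 "*)     echo "HSW" ;;  # could equally be BDW
        *" avx "*)      echo "SNB" ;;  # Sandy/Ivy Bridge
        *)              echo "WSM" ;;  # assumed fallback for pre-AVX CPUs
    esac
}
```

On Linux this could be called as guess_intel_kokkos_arch "$(grep -m1 '^flags' /proc/cpuinfo)".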
Set KOKKOS_ROOT to the directory containing Kokkos, then pass the architecture name to cmake as follows:
cmake -DQMC_USE_KOKKOS=1 \
-DQMC_MIXED_PRECISION=1 \
-DKOKKOS_PREFIX=${KOKKOS_ROOT} \
-DKOKKOS_ENABLE_CUDA=false \
-DKOKKOS_ENABLE_OPENMP=true \
-DKOKKOS_ARCH="SKX" \
-DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
-DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline" \
-DCMAKE_CXX_COMPILER="icpc" \
..
Replace SKX with the appropriate architecture name from the list above. Now build with:
make miniqmc_sync_move_noref
The code can now be run as bin/miniqmc_sync_move_noref -g "2 1 1" -n 5 -r 0.99 -16.
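The CPU configure and build steps can be collected into a small script so that only the architecture name changes between machines. The sketch below is one way to do that. The run helper, the KOKKOS_ARCH_NAME variable, and the DRY_RUN switch are hypothetical names introduced here, and /path/to/kokkos is a placeholder. With DRY_RUN=1 (the default in this sketch) the commands are only printed, and the printed form does not preserve shell quoting.

```shell
# Sketch: configure and build the CPU version from miniqmc/build.
# DRY_RUN=1 only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
KOKKOS_ARCH_NAME="${KOKKOS_ARCH_NAME:-SKX}"    # name from the CPU list
KOKKOS_ROOT="${KOKKOS_ROOT:-/path/to/kokkos}"  # placeholder; set to your checkout

# Hypothetical helper: echo the command, then run it unless dry-running.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run cmake -DQMC_USE_KOKKOS=1 \
    -DQMC_MIXED_PRECISION=1 \
    -DKOKKOS_PREFIX="${KOKKOS_ROOT}" \
    -DKOKKOS_ENABLE_CUDA=false \
    -DKOKKOS_ENABLE_OPENMP=true \
    -DKOKKOS_ARCH="${KOKKOS_ARCH_NAME}" \
    -DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
    -DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline" \
    -DCMAKE_CXX_COMPILER="icpc" \
    ..
run make miniqmc_sync_move_noref
```

For example, KOKKOS_ARCH_NAME=HSW DRY_RUN=0 sh build_cpu.sh would configure and build for Haswell, where build_cpu.sh is a hypothetical file name for the script.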
For a GPU build, the procedure is similar. The GPU architecture names include:
Kepler30 NVIDIA Kepler generation CC 3.0
Kepler32 NVIDIA Kepler generation CC 3.2
Kepler35 NVIDIA Kepler generation CC 3.5
Kepler37 NVIDIA Kepler generation CC 3.7
Maxwell50 NVIDIA Maxwell generation CC 5.0
Maxwell52 NVIDIA Maxwell generation CC 5.2
Maxwell53 NVIDIA Maxwell generation CC 5.3
Pascal60 NVIDIA Pascal generation CC 6.0
Pascal61 NVIDIA Pascal generation CC 6.1
Volta70 NVIDIA Volta generation CC 7.0
Volta72 NVIDIA Volta generation CC 7.2
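Each name in this list is the GPU generation followed by its compute capability (CC) with the dot removed. A minimal sketch of that mapping follows; the function name kokkos_gpu_arch is hypothetical, and it covers only the entries listed above.

```shell
# Hypothetical helper: map an NVIDIA compute capability (e.g. "6.0")
# to the Kokkos architecture name from the list above.
kokkos_gpu_arch() {
    case "$1" in
        3.0) echo "Kepler30" ;;
        3.2) echo "Kepler32" ;;
        3.5) echo "Kepler35" ;;
        3.7) echo "Kepler37" ;;
        5.0) echo "Maxwell50" ;;
        5.2) echo "Maxwell52" ;;
        5.3) echo "Maxwell53" ;;
        6.0) echo "Pascal60" ;;
        6.1) echo "Pascal61" ;;
        7.0) echo "Volta70" ;;
        7.2) echo "Volta72" ;;
        *)   echo "unknown compute capability: $1" >&2; return 1 ;;
    esac
}

# Example: a P100 has compute capability 6.0.
kokkos_gpu_arch 6.0
```

The compute capability of an installed card is listed in NVIDIA's documentation; recent versions of nvidia-smi may also report it through the compute_cap query field, if your driver supports it.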
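On a Power8 + P100 system, use the following, replacing the KOKKOS_PREFIX and CMAKE_CXX_COMPILER paths with the location of your own Kokkos checkout: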
cmake -DQMC_USE_KOKKOS=1 \
-DQMC_MIXED_PRECISION=1 \
-DKOKKOS_PREFIX=/ascldap/users/lshulen/new-sandbox/kokkos \
-DKOKKOS_ENABLE_CUDA=true \
-DKOKKOS_ENABLE_OPENMP=false \
-DKOKKOS_ARCH="Power8;Pascal60" \
-DKOKKOS_ENABLE_CUDA_UVM=false \
-DKOKKOS_ENABLE_CUDA_LAMBDA=true \
-DKOKKOS_ENABLE_DEBUG=true \
-DCMAKE_CXX_COMPILER=/ascldap/users/lshulen/sandbox/kokkos/bin/nvcc_wrapper \
-DCMAKE_CXX_FLAGS="-G0 -g -Drestrict=__restrict__ -D__forceinline=inline " ..
From there, build and run the code in the same way as for the CPU.