Building KHARMA
If you're just looking to play with KHARMA for small simulations, and specifically do not need to:
- Run a job using multiple nodes in a cluster with MPI
- Run on GPUs
- Run on a machine with a non-x86 (ARM/POWER/etc) architecture
- Run as efficiently as possible on new x86 CPUs
then consider starting with the KHARMA container, which avoids compiling the code at all. For example, to get running in three lines:
$ singularity pull docker://registry.gitlab.com/afd-illinois/kharma:dev
$ wget https://raw.githubusercontent.com/AFD-Illinois/kharma/stable/pars/orszag_tang.par
$ singularity run kharma_dev.sif /app/run.sh -i orszag_tang.par
The container/registry itself is OCI compliant, so it'll work with `docker`, `podman`, etc. too (just remember to give the container access to the parameter file!). The version of KHARMA in `/app` inside the container is difficult to modify and recompile -- however, the container can also be used to compile and run a version cloned from the git repository:
$ git clone https://github.com/AFD-Illinois/kharma.git
$ cd kharma
$ git submodule update --init --recursive
$ singularity shell /path/to/kharma_dev.sif
Singularity> PREFIX_PATH=/usr/lib64/mpich EXTRA_FLAGS="-DPARTHENON_DISABLE_HDF5_COMPRESSION=ON" ./make.sh clean
Singularity> ./run.sh -i pars/orszag_tang.par
First, be sure to check out all of KHARMA's dependencies by running
$ git submodule update --init --recursive
This will grab KHARMA's two main dependencies (as well as some incidental things):
- The Parthenon AMR framework from LANL (accompanying documentation). Note that KHARMA actually uses a fork of Parthenon, see here.
- The Kokkos performance-portability library, originally from SNL. Many common Kokkos questions and problems are answered by the Kokkos wiki and tutorials, and Parthenon's developer guide includes a list of the Parthenon-specific wrappers for Kokkos functions.
The dependencies KHARMA needs from the system are essentially the same as those of Parthenon and Kokkos:
- A C++17-compliant compiler with OpenMP support (tested with GCC >= 8, Intel >= 19, nvc++ (formerly PGI) >= 21.9, and clang++ and derivatives >= 12)
- An MPI implementation
- Parallel HDF5 compiled against this MPI implementation
And, optionally, one of:
- CUDA >= 11.5 and a CUDA-supported C++ compiler
- The most recent Intel oneAPI release, and a compatible OpenCL compiler for Intel GPUs (SYCL/oneAPI support is experimental)
All of these should come either as distribution packages on local Linux systems, or as modules on HPC systems, with the exception of parallel HDF5. Luckily, it is quite easy to compile manually, or it can be compiled automatically by `make.sh` if the parameter `hdf5` is put on the command line. KHARMA can also be compiled without MPI by specifying `nompi` (meaning that, in some sense, the only user-provided KHARMA dependencies are a recent C++ compiler and any system libraries).
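As a sketch, the two arguments look like this on the command line (assuming you are in the KHARMA source directory):

```shell
# Compile a compatible HDF5 alongside KHARMA itself:
./make.sh clean hdf5
# Or skip MPI entirely (single-process runs only):
./make.sh clean nompi
```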
KHARMA uses `cmake` for building, and has a small set of `bash` scripts to handle loading the correct modules and giving the correct arguments to `cmake` on specific systems. Contributions with additional machine-specific code are welcome; see the examples in `machines/`.
Generally, on systems with a parallel HDF5 module, one can then run the following to compile for CPU with OpenMP:
./make.sh clean
You may be able to use the following to compile for GPU with CUDA:
./make.sh clean cuda
`cmake` will check default directories for each dependency, e.g. `/usr/local`, and provide detailed error messages about which libraries it is missing. In many cases, you will only need to add the path to your parallel HDF5:
PREFIX_PATH=/absolute/path/to/phdf5 HOST_ARCH=CPUVER ./make.sh clean
If you need to specify multiple custom-installed dependencies (e.g. CUDA), you can set PREFIX_PATH="/path/to/one;/path/to/two". Note that PREFIX_PATH does not support spaces in paths, because shell escapes are hard.
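For instance, a build against both a custom HDF5 and a custom CUDA install might look like the following (the install paths here are hypothetical):

```shell
PREFIX_PATH="/opt/phdf5-1.12;/opt/cuda-11.5" ./make.sh clean cuda
```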
Several notes:
- After compiling once successfully, you do not need to specify `clean` any longer, unless you change the compiler or the location of dependencies. Invocations without `clean` will recompile only the files you've changed in KHARMA.
- Since many `conda` environments include a serial version of HDF5, having a `conda` environment loaded can prevent `cmake` from correctly finding the parallel version. Unload your conda environments before compiling code!
- To avoid adding the prefix variables during every compile, create a file `machines/hostname.sh` in the style of the other files in that directory, setting any necessary environment variables and/or loading any necessary modules.
- If you run into issues when compiling, remember to check the "Known Incompatibilities" section of this page.
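As a sketch, a minimal machine file might look like the following (the hostname, module names, and paths are hypothetical; see the real files in `machines/` for the exact variables `make.sh` honors):

```shell
# machines/mycluster.sh: sourced by make.sh to set up this machine's build
if [[ $HOST == *".mycluster.edu" ]]; then
  module load gcc phdf5
  HOST_ARCH=SKX
  PREFIX_PATH=/opt/phdf5
fi
```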
As mentioned above, there are two additional arguments to `make.sh` specifying dependencies:
- `hdf5` will compile a version of HDF5 inline with building KHARMA, using the same compiler and options. This is an easy way to get a compatible and fast HDF5 implementation, at the cost of extra compile time. The HDF5 build may not work on all systems.
- `nompi` will compile without MPI support.
There are two additional useful arguments to `make.sh` for building KHARMA with debugging support:
- `debug` will enable the `DEBUG` flag in the code and, more importantly, enable bounds-checking in all Kokkos arrays. Useful for very weird undesired behavior and segfaults. Note, however, that most KHARMA checks, prints, and debugging output are actually enabled at runtime, under the `<debug>` section of the input deck.
- `trace` will print each part of a step to `stderr` as it is being run (technically, anywhere with a `FLAG()` statement in the code). This is useful for pinning down where segfaults are occurring, without manually bisecting the whole code with print statements.
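For example, when chasing a segfault, one plausible workflow is to combine the two and capture the trace output (a sketch; the parameter file is just an example):

```shell
./make.sh clean debug trace
./run.sh -i pars/orszag_tang.par 2> trace.txt
# The last FLAG() lines in trace.txt bracket the failing step
tail trace.txt
```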
The build script `make.sh` tries to guess an architecture when compiling, defaulting to code which will be reasonably fast on modern machines. However, you can manually specify a host and/or device architecture. For example, when compiling for CUDA:
PREFIX_PATH=/absolute/path/to/phdf5 HOST_ARCH=CPUVER DEVICE_ARCH=GPUVER ./make.sh clean cuda
Here `CPUVER` and `GPUVER` are the strings used by Kokkos to denote a particular architecture & set of compile flags, e.g. "SKX" for Skylake-X, "HSW" for Haswell, or "AMDAVX" for Ryzen/EPYC processors, and VOLTA70, TURING75, or AMPERE80 for Nvidia GPUs. A list of a few common architecture strings is provided in `make.sh`, and a full, (usually) up-to-date list is kept in the Kokkos documentation. (Note that `make.sh` needs only the portion of the flag after `Kokkos_ARCH_`.)
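To illustrate the naming convention, `make.sh` takes only the suffix of the full Kokkos CMake flag; a small sketch makes the relationship concrete:

```shell
# Kokkos spells out the full CMake architecture flag:
flag="Kokkos_ARCH_VOLTA70"
# make.sh wants only the portion after "Kokkos_ARCH_":
arch="${flag#Kokkos_ARCH_}"
echo "$arch"   # VOLTA70
```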
If deploying KHARMA to a machine with GPUs, be careful that the MPI stack you use is CUDA-aware -- this allows direct communication from GPUs to the network without involving CPU and RAM, which is much faster. There are notes for particular systems on the machines page.
KHARMA (or, more precisely, Kokkos and Parthenon) pushes C++ to its limits, out where compiler issues and new, untested backend incompatibilities can be exposed. Generally, the newest versions of modules and compilers are the most likely to work. Here's an incomplete list of known bad combinations:
- If you attempt to compile KHARMA with a version of CUDA before 11.2, `nvcc` will crash during compilation with the error `Error: Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"` (see the relevant Kokkos bug). This is a bug in `nvcc`'s support of `constexpr` in C++17, fixed in 11.2. It appears to be independent of which host compiler is used, but be aware that on some systems (e.g., Summit) the compiler choice affects which CUDA version is loaded.
- In general, older compilers or software stacks present a much more difficult path to compiling than newer code. This is particularly true of CentOS 7 and derivatives, due to their extremely old default versions of `gcc` and, particularly, `libstdc++`. If possible on such machines, load a newer `gcc` as a module, which might bring a more recent standard library with it (other compilers, such as `icc` or `clang`, rely on the system version of `libstdc++`, so even new versions of these may have trouble compiling KHARMA on old operating systems). If compiling even with `gcc` proves impossible, you may have to resort to using a container to provide the correct dependencies.
- GCC version 7.3.0 exactly has a bug making it incapable of compiling a particular Parthenon function, fixed in 7.3.1 and 8+. It is, for unfathomable reasons, very widely deployed as the default compiler on various machines, but if any other stack is available it should be preferred.
- NVHPC toolkit versions 21.1 through 21.7 ship an nvc/nvc++ compiler that does not compile Parthenon correctly. nvc++ works again in versions 21.9+, which are now more commonly deployed, or at least available. Generally, the newest version of NVHPC is best.
- The Intel oneAPI implementation of SYCL, in any version before the most recent. SYCL is a moving target: Kokkos tends to support only the most recent release, and at best we nominally do the same. The error looks like:
kharma/external/parthenon/src/../../variant/include/mpark/variant.hpp(1613): error: parameter pack "Ts" was referenced but not expanded
This is because of a bug detailed here, fixed only for the Intel compiler. You can monkey-patch a similar change by making the following edit to `external/variant/include/mpark/config.hpp`:
-#if __has_builtin(__type_pack_element) && !(defined(__ICC))
+//#if __has_builtin(__type_pack_element) && !(defined(__ICC))
+#if 0
 #define MPARK_TYPE_PACK_ELEMENT
 #endif
There is also a failure due to the code throwing an error, which SYCL disallows in device code; removing the throw statement restores the compile (diff soon™).
KHARMA uses a lot of resources per process, and by default uses a lot of processes to compile (`NPROC` in `make.sh` or the machine files, which defaults to the total number of threads present on the system). This is generally fine for workstations and single nodes; however, on popular login nodes or community machines you might see the following (e.g. on Frontera):
...
icpc: error #10103: can't fork process: Resource temporarily unavailable
make[2]: *** [external/parthenon/src/CMakeFiles/parthenon.dir/driver/multistage.cpp.o] Error 1
...
This means that `make` can't fork new compile processes, which of course ruins the compile. You can find a less popular node (e.g. with a development job), turn down the `NPROC` variable at the top of `make.sh`, or wait until the node is not so in demand.
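One way to turn the pressure down is to cap the build width at a fraction of the machine's threads; a sketch of a conservative `NPROC` choice (the halving heuristic here is an assumption, not what `make.sh` does by default):

```shell
# Use half the hardware threads, but always at least one job:
total=$(nproc)
NPROC=$(( total / 2 ))
if [ "$NPROC" -lt 1 ]; then NPROC=1; fi
echo "Building with $NPROC jobs"
```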
When compiling a version of KHARMA that is also being run, the OS will (logically) not replace the binary file `kharma.x` being used by the running program. The error is usually something like `cp: cannot create regular file '../kharma.host': Text file busy`. To correct this, invoke `make.sh` again when the run is finished or stopped (or manually run `cp build/kharma/kharma.host .`).