diff --git a/docs/source/advanced_usage/predictions.rst b/docs/source/advanced_usage/predictions.rst
index c610c61b6..a16ece7bd 100644
--- a/docs/source/advanced_usage/predictions.rst
+++ b/docs/source/advanced_usage/predictions.rst
@@ -26,7 +26,7 @@ You can manually specify the inference grid if you wish via
 
     # ASE calculator
     calculator.mala_parameters.running.inference_data_grid = ...
 
-Where you have to specify a list with three entries ``[x,y,z]``. As matter
+Here you have to specify a list with three entries ``[x,y,z]``. As a matter
 of principle, stretching simulation cells in either direction should be
 reflected by the grid.
@@ -42,8 +42,8 @@ Likewise, you can adjust the inference temperature via
 
 .. _production_gpu:
 
-Predictions on multiple GPUs
-****************************
+Predictions on GPUs
+*******************
 
 MALA predictions can be run entirely on a GPU. For the NN part of the
 workflow, this seems like a trivial statement, but the GPU acceleration extends to
@@ -57,16 +57,16 @@ with prior to an ASE calculator calculation or usage of the ``Predictor`` class,
-all computationally heavy parts of the MALA inference, will be offloaded to
+all computationally heavy parts of the MALA inference will be offloaded to
 the GPU.
 
 Please note that this requires LAMMPS to be installed with GPU, i.e., Kokkos
-support. Multiple GPUs can be used during inference by further enabling
+support. Multiple GPUs can be used during inference by first enabling
 parallelization via
 
 .. code-block:: python
 
     parameters.use_mpi = True
 
-
-Setting both ``use_mpi`` and ``use_gpu`` to ``True`` yields multi-GPU
-inferences.
+and then invoking the MALA instance through ``mpirun``, ``srun`` or whichever
+MPI wrapper is used on your machine. Details on parallelization
+are provided :ref:`below <production_parallel>`.
 
 .. note::
@@ -75,42 +75,50 @@ inferences.
 Currently, there is no direct GPU acceleration for the total energy
 calculation. For smaller calculations, this is unproblematic, but it can become
-a problem for systems of even moderate size. To alleviate this problem, MALA
+an issue for systems of even moderate size. To alleviate this problem, MALA
 provides an optimized total energy calculation routine which utilizes a
 Gaussian representation of atomic positions. In this algorithm, most of the
 computational overhead of the total energy calculation is offloaded to the
-computation of this Gaussian representation, which is realized via LAMMPS and
-can therefore be accelerated as outlined above. Simply activate this option
-via
+computation of this Gaussian representation. This calculation is realized via
+LAMMPS and can therefore be GPU-accelerated (parallelized) in the same fashion
+as the bispectrum descriptor calculation. Simply activate this option via
 
 .. code-block:: python
 
     parameters.descriptors.use_atomic_density_energy_formula = True
 
-The Gaussian representation algorithm is describe in
-the publication `Predicting electronic structures at any length scale with machine learning <https://arxiv.org/abs/2210.11343>`_
+The Gaussian representation algorithm is described in
+the publication `Predicting electronic structures at any length scale with machine learning <https://www.nature.com/articles/s41524-023-01070-z>`_.
+
+.. _production_parallel:
 
 
-Parallel predictions on CPUs
-****************************
+Parallel predictions
+********************
 
-Since GPU usage is currently limited to one GPU at a time, predictions
-for ten- to hundreds of thousands of atoms rely on the usage of a large number
-of CPUs. Just like with GPU acceleration, nothing about the general inference
-workflow has to be changed. Simply enable MPI usage in MALA
+MALA predictions may be run on a large number of processing units, either
+CPUs or GPUs. To do so, simply enable MPI usage in MALA
 
 .. code-block:: python
 
     parameters.use_mpi = True
 
-Please be aware that GPU and MPI usage are mutually exclusive for inference
-at the moment. Once MPI is activated, you can start the MPI aware Python script
-with a large number of CPUs to simulate materials at large length scales.
+Once MPI is activated, you can start the MPI-aware Python script using
+``mpirun``, ``srun`` or whichever MPI wrapper is used on your machine.
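+
+For example, assuming a hypothetical MPI-aware inference script called
+``mala_inference.py``, a parallel prediction on 200 ranks could be launched
+via
+
+.. code-block:: bash
+
+    # placeholder script name; replace with your own MALA inference script
+    mpirun -np 200 python mala_inference.py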
 
-By default, MALA can only operate with a number of CPUs by which the
+By default, MALA can only operate with a number of processes by which the
 z-dimension of the inference grid can be evenly divided, since the Quantum
 ESPRESSO backend of MALA by default only divides data along the z-dimension.
 If you, e.g., have an inference grid of ``[200,200,200]`` points, you can use
-a maximum of 200 CPUs. Using, e.g., 224 CPUs will lead to an error.
+a maximum of 200 ranks. Using, e.g., 224 ranks will lead to an error.
 
 Parallelization can further be made more efficient by also enabling splitting
 in the y-dimension. This is done by setting the parameter
@@ -122,8 +130,9 @@ in the y-dimension. This is done by setting the parameter
 to an integer value ``ysplit`` (default: 0). If ``ysplit`` is not zero, each
 z-plane will be divided ``ysplit`` times for the parallelization. If you,
 e.g., have an inference grid of ``[200,200,200]``, you could use
-400 CPUs and ``ysplit`` of 2. Then, the grid will be sliced into 200 z-planes,
-and each z-plane will be sliced twice, allowing even faster inference.
+400 processes and a ``ysplit`` of 2. Then, the grid will be sliced into 200
+z-planes, and each z-plane will be sliced twice, allowing even faster
+inference.
 
 Visualizing observables
 ************************