diff --git a/docs/source/advanced_usage/predictions.rst b/docs/source/advanced_usage/predictions.rst
index 20e82494b..a16ece7bd 100644
--- a/docs/source/advanced_usage/predictions.rst
+++ b/docs/source/advanced_usage/predictions.rst
@@ -26,7 +26,7 @@ You can manually specify the inference grid if you wish via
             # ASE calculator
             calculator.mala_parameters.running.inference_data_grid = ...
 
-Where you have to specify a list with three entries ``[x,y,z]``. As matter
+Here you have to specify a list with three entries ``[x,y,z]``. As a matter
 of principle, stretching simulation cells in either direction should be
 reflected by the grid.
 
@@ -42,7 +42,7 @@ Likewise, you can adjust the inference temperature via
 
 .. _production_gpu:
 
-Predictions on GPU
+Predictions on GPUs
 *******************
 
 MALA predictions can be run entirely on a GPU. For the NN part of the workflow,
@@ -56,37 +56,60 @@ with
 
 prior to an ASE calculator calculation or usage of the ``Predictor`` class,
 all computationally heavy parts of the MALA inference, will be offloaded
-to the GPU.
+to the GPU. Please note that this requires LAMMPS to be installed with GPU, i.e., Kokkos
+support. Multiple GPUs can be used during inference by first enabling
+parallelization via
 
-Please note that this requires LAMMPS to be installed with GPU, i.e., Kokkos
-support. A current limitation of this implementation is that only a *single*
-GPU can be used for inference. This puts an upper limit on the number of atoms
-which can be simulated, depending on the hardware you have access to.
-Usual numbers observed by MALA team put this limit at a few thousand atoms, for
-which the electronic structure can be predicted in 1-2 minutes. Currently,
-multi-GPU inference is being implemented.
+      .. code-block:: python
+
+            parameters.use_mpi = True
+
+and then invoking the MALA instance through ``mpirun``, ``srun`` or whichever
+MPI wrapper is used on your machine. Details on parallelization
+are provided :ref:`below <production_parallel>`.
+
+.. note::
+
+    To use GPU acceleration for total energy calculation, an additional
+    setting has to be used.
+
+Currently, there is no direct GPU acceleration for the total energy
+calculation. For smaller calculations, this is unproblematic, but it can become
+an issue for systems of even moderate size. To alleviate this problem, MALA
+provides an optimized total energy calculation routine which utilizes a
+Gaussian representation of atomic positions. In this algorithm, most of the
+computational overhead of the total energy calculation is offloaded to the
+computation of this Gaussian representation. This calculation is realized via
+LAMMPS and can therefore be GPU accelerated (parallelized) in the same fashion
+as the bispectrum descriptor calculation. Simply activate this option via
+
+    .. code-block:: python
+
+        parameters.descriptors.use_atomic_density_energy_formula = True
+
+The Gaussian representation algorithm is described in
+the publication `Predicting electronic structures at any length scale with machine learning <https://doi.org/10.1038/s41524-023-01070-z>`_.
+
+.. _production_parallel:
 
-Parallel predictions on CPUs
-****************************
+Parallel predictions
+********************
 
-Since GPU usage is currently limited to one GPU at a time, predictions
-for ten- to hundreds of thousands of atoms rely on the usage of a large number
-of CPUs. Just like with GPU acceleration, nothing about the general inference
-workflow has to be changed. Simply enable MPI usage in MALA
+MALA predictions may be run on a large number of processing units, either
+CPUs or GPUs. To do so, simply enable MPI usage in MALA via
 
       .. code-block:: python
 
             parameters.use_mpi = True
 
-Please be aware that GPU and MPI usage are mutually exclusive for inference
-at the moment. Once MPI is activated, you can start the MPI aware Python script
-with a large number of CPUs to simulate materials at large length scales.
+Once MPI is activated, you can start the MPI-aware Python script using
+``mpirun``, ``srun`` or whichever MPI wrapper is used on your machine.
 
-By default, MALA can only operate with a number of CPUs by which the
+By default, MALA can only operate with a number of processes by which the
 z-dimension of the inference grid can be evenly divided, since the Quantum
 ESPRESSO backend of MALA by default only divides data along the z-dimension.
 If you, e.g., have an inference grid of ``[200,200,200]`` points, you can use
-a maximum of 200 CPUs. Using, e.g., 224 CPUs will lead to an error.
+a maximum of 200 processes. Using, e.g., 224 processes will lead to an error.
 
 Parallelization can further be made more efficient by also enabling splitting
 in the y-dimension. This is done by setting the parameter
@@ -98,8 +121,9 @@ in the y-dimension. This is done by setting the parameter
 to an integer value ``ysplit`` (default: 0). If ``ysplit`` is not zero,
 each z-plane will be divided ``ysplit`` times for the parallelization.
 If you, e.g., have an inference grid of ``[200,200,200]``, you could use
-400 CPUs and ``ysplit`` of 2. Then, the grid will be sliced into 200 z-planes,
-and each z-plane will be sliced twice, allowing even faster inference.
+400 processes and a ``ysplit`` of 2. Then, the grid will be sliced into 200
+z-planes, and each z-plane will be sliced twice, allowing even faster
+inference.
 
 Visualizing observables
 ************************
diff --git a/mala/common/parameters.py b/mala/common/parameters.py
index d91783583..e3f18bf55 100644
--- a/mala/common/parameters.py
+++ b/mala/common/parameters.py
@@ -321,6 +321,11 @@ class ParametersDescriptors(ParametersBase):
 
     atomic_density_sigma : float
         Sigma used for the calculation of the Gaussian descriptors.
+
+    use_atomic_density_energy_formula : bool
+        If True, Gaussian descriptors will be used to compute the
+        Ewald sum as part of the total energy module.
+        Default is False.
     """
 
     def __init__(self):
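
The process-count rule described in the documentation change above can be sketched as a small pre-flight check. This is only an illustration: ``check_process_count`` is a hypothetical helper and not part of MALA. The default case (``ysplit`` of 0) follows the documented rule that the z-dimension must be evenly divisible by the number of processes. The ``ysplit`` branch is an assumption extrapolated from the ``[200,200,200]`` example with 400 processes and may not match MALA's actual check.

```python
def check_process_count(grid, n_processes, ysplit=0):
    """Return True if n_processes looks compatible with the inference grid.

    Hypothetical helper for illustration only; not a MALA function.
    grid is the inference grid as [x, y, z].
    """
    nz = grid[2]
    if n_processes < 1:
        return False
    if ysplit == 0:
        # Documented default: the z-dimension must be evenly divisible
        # by the number of processes.
        return nz % n_processes == 0
    # ASSUMPTION: with ysplit > 0, every z-slab is shared by ysplit
    # processes, so the number of z-slabs is n_processes // ysplit.
    if n_processes % ysplit != 0:
        return False
    return nz % (n_processes // ysplit) == 0


if __name__ == "__main__":
    grid = [200, 200, 200]
    for n, ysplit in [(200, 0), (224, 0), (400, 2)]:
        ok = check_process_count(grid, n, ysplit)
        print(f"processes={n}, ysplit={ysplit}: compatible={ok}")
```

Running such a check before submitting a job with ``mpirun`` or ``srun`` may catch an incompatible process count early, rather than at inference time.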