Merge branch 'develop' into feature_uncertainty
elcorto committed Apr 16, 2024
2 parents fb37d1e + 62bbaeb commit 54408f1
Showing 20 changed files with 1,249 additions and 109 deletions.
5 changes: 5 additions & 0 deletions docs/source/advanced_usage/descriptors.rst
@@ -3,6 +3,11 @@
Improved data conversion
========================

As a general remark, please note: if you did not use LAMMPS for your first
steps in MALA and instead used the Python-based descriptor calculation
methods, we strongly advise switching to LAMMPS for the advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

Tuning descriptors
******************

5 changes: 5 additions & 0 deletions docs/source/advanced_usage/predictions.rst
@@ -8,6 +8,11 @@ Predictions at scale in principle work just like the predictions shown
in the basic guide. One has to set a few additional parameters to make
optimal use of the hardware at hand.
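
As an illustrative sketch of such settings (which switches apply depends on your hardware; ``use_gpu`` and ``use_mpi`` are taken from MALA's ``Parameters`` class, but treat the exact combination shown here as an assumption):

```python
import mala

params = mala.Parameters()
# Sketch only: enable hardware acceleration where available.
params.use_gpu = True   # run inference on a GPU
params.use_mpi = True   # distribute the prediction across MPI ranks
```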

As a general remark, please note: if you did not use LAMMPS for your first
steps in MALA and instead used the Python-based descriptor calculation
methods, we strongly advise switching to LAMMPS for the advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

MALA ML-DFT models can be used for predictions at system sizes and temperatures
larger than, or different from, the ones they were trained on. If you want to
make a prediction at a larger length scale than the one the ML-DFT model was trained on,
2 changes: 1 addition & 1 deletion docs/source/basic_usage/more_data.rst
@@ -4,7 +4,7 @@ Data generation and conversion
MALA operates on volumetric data. Volumetric data is stored in binary files.
By default - and discussed here, in the introductory guide - this
means ``numpy`` files (``.npy`` files). Advanced data storing techniques
are :ref:`also available <openpmd data>`
are :ref:`also available <openpmd data>`.
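
For illustration, such a file can be inspected directly with ``numpy`` (a minimal sketch; the file name is hypothetical):

```python
import numpy as np

# MALA stores volumetric data with one feature vector per real-space
# grid point, i.e., as an array of shape (nx, ny, nz, num_features).
ldos = np.load("Be_snapshot0.out.npy")  # hypothetical file name
print(ldos.shape, ldos.dtype)
```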

Data generation
###############
17 changes: 13 additions & 4 deletions docs/source/citing.rst
@@ -67,10 +67,19 @@ range, please cite the respective transferability studies:
@article{MALA_temperaturetransfer,
title={Machine learning the electronic structure of matter across temperatures},
author={Fiedler, Lenz and Modine, Normand A and Miller, Kyle D and Cangi, Attila},
journal={arXiv preprint arXiv:2306.06032},
year={2023}
title = {Machine learning the electronic structure of matter across temperatures},
author = {Fiedler, Lenz and Modine, Normand A. and Miller, Kyle D. and Cangi, Attila},
journal = {Phys. Rev. B},
volume = {108},
issue = {12},
pages = {125146},
numpages = {16},
year = {2023},
month = {Sep},
publisher = {American Physical Society},
doi = {10.1103/PhysRevB.108.125146},
url = {https://link.aps.org/doi/10.1103/PhysRevB.108.125146}
}
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -72,7 +72,8 @@
'pqkmeans',
'dftpy',
'asap3',
'openpmd_io'
'openpmd_io',
'skspatial'
]

myst_heading_anchors = 3
7 changes: 4 additions & 3 deletions docs/source/index.md
@@ -93,11 +93,12 @@ MALA has been employed in various publications, showcasing its versatility and e
data calculated for hundreds of atoms, MALA can predict the electronic
structure of up to 100'000 atoms.

- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.48550/arXiv.2306.06032) (arXiv preprint)
- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.1103/PhysRevB.108.125146) (Phys. Rev. B)
by L. Fiedler, N. A. Modine, K. D. Miller, A. Cangi

- Currently in the preprint stage. Shown here is the temperature
transferability of MALA models.
- This publication shows how MALA models can be employed across temperature
ranges. It demonstrates how such models account for both ionic and
electronic temperature effects in materials.



2 changes: 2 additions & 0 deletions docs/source/install/installing_lammps.rst
@@ -1,3 +1,5 @@
.. _lammpsinstallation:

Installing LAMMPS
==================

21 changes: 13 additions & 8 deletions docs/source/installation.rst
Original file line number Diff line number Diff line change
@@ -4,25 +4,30 @@ Installation
As a software package, MALA consists of three parts:

1. The actual Python package ``mala``, which this documentation accompanies
2. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid
3. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
2. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
is used by MALA to post-process the LDOS into total free energies (via the
so called "total energy module")
3. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid (optional, but highly
recommended!)

All three parts require separate installations. The most important one is
the first one, i.e., the Python library, and you can access a lot of MALA
functionalities by just installing the MALA Python library, especially when
working with precalculated input and output data (e.g. for model training).

For access to all features, you will additionally have to install the LAMMPS
and QE codes and associated Python bindings. The installation has been tested
on Linux (Ubuntu/CentOS), Windows and macOS. The individual installation steps
are given in:
For access to all features, you will additionally have to install the QE code.
The calculations performed by LAMMPS are also implemented in the Python part
of MALA. For small test calculations and development tasks, you therefore do
not need LAMMPS. For realistic simulations, however, the Python implementation
is not efficient enough, and you have to use LAMMPS.

The installation has been tested on Linux (Ubuntu/CentOS), Windows and macOS.
The individual installation steps are given in:

.. toctree::
:maxdepth: 1

install/installing_mala
install/installing_lammps
install/installing_qe
install/installing_lammps
1 change: 1 addition & 0 deletions install/mala_cpu_base_environment.yml
@@ -13,3 +13,4 @@ dependencies:
- pytorch-cpu
- mpmath
- tensorboard
- scikit-spatial
1 change: 1 addition & 0 deletions install/mala_cpu_environment.yml
@@ -127,6 +127,7 @@ dependencies:
- requests-oauthlib=1.3.1
- rsa=4.9
- scipy=1.8.1
- scikit-spatial=6.8.1
- setuptools=59.8.0
- six=1.16.0
- sleef=3.5.1
24 changes: 21 additions & 3 deletions mala/common/parameters.py
@@ -30,7 +30,7 @@ def __init__(self,):
super(ParametersBase, self).__init__()
self._configuration = {"gpu": False, "horovod": False, "mpi": False,
"device": "cpu", "openpmd_configuration": {},
"openpmd_granularity": 1}
"openpmd_granularity": 1, "lammps": True}
pass

def show(self, indent=""):
@@ -71,6 +71,9 @@ def _update_openpmd_configuration(self, new_openpmd):
def _update_openpmd_granularity(self, new_granularity):
self._configuration["openpmd_granularity"] = new_granularity

def _update_lammps(self, new_lammps):
self._configuration["lammps"] = new_lammps

@staticmethod
def _member_to_json(member):
if isinstance(member, (int, float, type(None), str)):
@@ -1180,6 +1183,7 @@ def __init__(self):
# TODO: Maybe as a percentage? Feature dimensions can be quite
# different.
self.openpmd_granularity = 1
self.use_lammps = True

@property
def openpmd_granularity(self):
@@ -1307,6 +1311,7 @@ def use_mpi(self):
@use_mpi.setter
def use_mpi(self, value):
set_mpi_status(value)

# Invalidate, will be updated in setter.
self.device = None
self._use_mpi = value
@@ -1331,15 +1336,28 @@ def openpmd_configuration(self):
@openpmd_configuration.setter
def openpmd_configuration(self, value):
self._openpmd_configuration = value

# Invalidate, will be updated in setter.
self.network._update_openpmd_configuration(self.openpmd_configuration)
self.descriptors._update_openpmd_configuration(self.openpmd_configuration)
self.targets._update_openpmd_configuration(self.openpmd_configuration)
self.data._update_openpmd_configuration(self.openpmd_configuration)
self.running._update_openpmd_configuration(self.openpmd_configuration)
self.hyperparameters._update_openpmd_configuration(self.openpmd_configuration)

@property
def use_lammps(self):
"""Control whether or not to use LAMMPS for descriptor calculation."""
return self._use_lammps

@use_lammps.setter
def use_lammps(self, value):
self._use_lammps = value
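# Propagate the new value to all sub-parameter objects.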
self.network._update_lammps(self.use_lammps)
self.descriptors._update_lammps(self.use_lammps)
self.targets._update_lammps(self.use_lammps)
self.data._update_lammps(self.use_lammps)
self.running._update_lammps(self.use_lammps)
self.hyperparameters._update_lammps(self.use_lammps)

def show(self):
"""Print name and values of all attributes of this object."""
printout("--- " + self.__doc__.split("\n")[1] + " ---",
110 changes: 96 additions & 14 deletions mala/descriptors/atomic_density.py
@@ -14,8 +14,10 @@
except ModuleNotFoundError:
pass
import numpy as np
from scipy.spatial import distance

from mala.descriptors.lammps_utils import set_cmdlinevars, extract_compute_np
from mala.common.parallelizer import printout
from mala.descriptors.lammps_utils import extract_compute_np
from mala.descriptors.descriptor import Descriptor

# Empirical value for the Gaussian descriptor width, determined for an
@@ -117,28 +119,37 @@ def get_optimal_sigma(voxel):
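# Scale the empirically determined aluminium sigma by the ratio of the
# actual (maximum) grid spacing to the reference aluminium grid spacing.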
return (np.max(voxel) / reference_grid_spacing_aluminium) * \
optimal_sigma_aluminium

def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
def _calculate(self, outdir, **kwargs):
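# Prefer LAMMPS if enabled in the parameters; if the lammps Python
# module cannot be imported, fall back to the pure-Python implementation.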
if self.parameters._configuration["lammps"]:
try:
from lammps import lammps
except ModuleNotFoundError:
printout("No LAMMPS found for descriptor calculation, "
"falling back to python.")
return self.__calculate_python(**kwargs)

return self.__calculate_lammps(outdir, **kwargs)
else:
return self.__calculate_python(**kwargs)

def __calculate_lammps(self, outdir, **kwargs):
"""Perform actual Gaussian descriptor calculation."""
use_fp64 = kwargs.get("use_fp64", False)
return_directly = kwargs.get("return_directly", False)

lammps_format = "lammps-data"
ase_out_path = os.path.join(outdir, "lammps_input.tmp")
ase.io.write(ase_out_path, atoms, format=lammps_format)
ase.io.write(ase_out_path, self.atoms, format=lammps_format)

nx = grid_dimensions[0]
ny = grid_dimensions[1]
nz = grid_dimensions[2]
nx = self.grid_dimensions[0]
ny = self.grid_dimensions[1]
nz = self.grid_dimensions[2]

# Check if we have to determine the optimal sigma value.
if self.parameters.atomic_density_sigma is None:
self.grid_dimensions = [nx, ny, nz]
voxel = atoms.cell.copy()
voxel[0] = voxel[0] / (self.grid_dimensions[0])
voxel[1] = voxel[1] / (self.grid_dimensions[1])
voxel[2] = voxel[2] / (self.grid_dimensions[2])
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(voxel)
get_optimal_sigma(self.voxel)

# Create LAMMPS instance.
lammps_dict = {}
@@ -197,9 +208,9 @@ def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
# and thus have to properly reorder it.
# We have to switch from x fastest to z fastest reordering.
gaussian_descriptors_np = \
gaussian_descriptors_np.reshape((grid_dimensions[2],
grid_dimensions[1],
grid_dimensions[0],
gaussian_descriptors_np.reshape((self.grid_dimensions[2],
self.grid_dimensions[1],
self.grid_dimensions[0],
7))
gaussian_descriptors_np = \
gaussian_descriptors_np.transpose([2, 1, 0, 3])
@@ -212,3 +223,74 @@ def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
return gaussian_descriptors_np[:, :, :, 6:], \
nx*ny*nz

def __calculate_python(self, **kwargs):
"""
Perform Gaussian descriptor calculation using python.
The code used to this end was adapted from the LAMMPS implementation.
It serves as a fallback option whereever LAMMPS is not available.
This may be useful, e.g., to students or people getting started with
MALA who just want to look around. It is not intended for production
calculations.
Compared to the LAMMPS implementation, this implementation has quite a
few limitations. Namely
- It is roughly an order of magnitude slower for small systems
and doesn't scale too great
- It only works for ONE chemical element
- It has no MPI or GPU support
"""
printout("Using python for descriptor calculation. "
"The resulting calculation will be slow for "
"large systems.")

gaussian_descriptors_np = np.zeros((self.grid_dimensions[0],
self.grid_dimensions[1],
self.grid_dimensions[2], 4),
dtype=np.float64)

# Construct the hyperparameters to calculate the Gaussians.
# This follows the implementation in the LAMMPS code.
if self.parameters.atomic_density_sigma is None:
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(self.voxel)
cutoff_squared = self.parameters.atomic_density_cutoff * \
self.parameters.atomic_density_cutoff
prefactor = 1.0 / (np.power(self.parameters.atomic_density_sigma *
np.sqrt(2*np.pi),3))
argumentfactor = 1.0 / (2.0 * self.parameters.atomic_density_sigma *
self.parameters.atomic_density_sigma)
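# Each atom thus contributes a normalized 3D Gaussian evaluated at the
# squared distance r2 between grid point and atom:
#   g(r2) = prefactor * exp(-r2 * argumentfactor)
#         = exp(-r2 / (2*sigma**2)) / (sigma * sqrt(2*pi))**3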

# Create a list of all potentially relevant atoms.
all_atoms = self._setup_atom_list()

# I think this nested for-loop could probably be optimized by instead
# using the density matrix on the entire grid. That would be VERY
# memory-intensive. Since the goal of such an optimization would be
# to use this implementation at potentially larger length scales,
# one would have to verify that this is acceptable memory-wise.
# I haven't optimized it yet for the smaller scales, since there
# the performance was already good enough.
for i in range(0, self.grid_dimensions[0]):
for j in range(0, self.grid_dimensions[1]):
for k in range(0, self.grid_dimensions[2]):
# Compute the grid.
gaussian_descriptors_np[i, j, k, 0:3] = \
self._grid_to_coord([i, j, k])

# Compute the Gaussian descriptors.
dm = np.squeeze(distance.cdist(
[gaussian_descriptors_np[i, j, k, 0:3]],
all_atoms))
dm = dm*dm
dm_cutoff = dm[np.argwhere(dm < cutoff_squared)]
gaussian_descriptors_np[i, j, k, 3] += \
np.sum(prefactor*np.exp(-dm_cutoff*argumentfactor))

if self.parameters.descriptors_contain_xyz:
self.fingerprint_length = 4
return gaussian_descriptors_np, np.prod(self.grid_dimensions)
else:
self.fingerprint_length = 1
return gaussian_descriptors_np[:, :, :, 3:], \
np.prod(self.grid_dimensions)