Merge branch 'develop' into feature_uncertainty
elcorto committed Apr 16, 2024
2 parents fb37d1e + 62bbaeb commit 54408f1
Showing 20 changed files with 1,249 additions and 109 deletions.
5 changes: 5 additions & 0 deletions docs/source/advanced_usage/descriptors.rst
@@ -3,6 +3,11 @@
Improved data conversion
========================

As a general remark, please note: if you did not use LAMMPS for your first
steps in MALA and instead used the Python-based descriptor calculation
methods, we strongly advise switching to LAMMPS for the advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

Tuning descriptors
******************

5 changes: 5 additions & 0 deletions docs/source/advanced_usage/predictions.rst
@@ -8,6 +8,11 @@ Predictions at scale in principle work just like the predictions shown
in the basic guide. One has to set a few additional parameters to make
optimal use of the hardware at hand.
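
As an illustrative sketch of such settings (which switches apply depends on your hardware; ``use_gpu`` and ``use_mpi`` are taken from MALA's ``Parameters`` class, but treat the exact combination shown here as an assumption):

```python
import mala

params = mala.Parameters()
# Sketch only: enable hardware acceleration where available.
params.use_gpu = True   # run inference on a GPU
params.use_mpi = True   # distribute the prediction across MPI ranks
```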

As a general remark, please note: if you did not use LAMMPS for your first
steps in MALA and instead used the Python-based descriptor calculation
methods, we strongly advise switching to LAMMPS for the advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

MALA ML-DFT models can be used for predictions at system sizes and temperatures
larger than, or different from, the ones they were trained on. If you want to
make a prediction at a larger length scale than the one the ML-DFT model was trained on,
2 changes: 1 addition & 1 deletion docs/source/basic_usage/more_data.rst
@@ -4,7 +4,7 @@ Data generation and conversion
MALA operates on volumetric data. Volumetric data is stored in binary files.
By default - and discussed here, in the introductory guide - this
means ``numpy`` files (``.npy`` files). Advanced data storing techniques
are :ref:`also available <openpmd data>`
are :ref:`also available <openpmd data>`.
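
For illustration, such a file can be inspected directly with ``numpy`` (a minimal sketch; the file name is hypothetical):

```python
import numpy as np

# MALA stores volumetric data with one feature vector per real-space
# grid point, i.e., as an array of shape (nx, ny, nz, num_features).
ldos = np.load("Be_snapshot0.out.npy")  # hypothetical file name
print(ldos.shape, ldos.dtype)
```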

Data generation
###############
17 changes: 13 additions & 4 deletions docs/source/citing.rst
@@ -67,10 +67,19 @@ range, please cite the respective transferability studies:
@article{MALA_temperaturetransfer,
title={Machine learning the electronic structure of matter across temperatures},
author={Fiedler, Lenz and Modine, Normand A and Miller, Kyle D and Cangi, Attila},
journal={arXiv preprint arXiv:2306.06032},
year={2023}
title = {Machine learning the electronic structure of matter across temperatures},
author = {Fiedler, Lenz and Modine, Normand A. and Miller, Kyle D. and Cangi, Attila},
journal = {Phys. Rev. B},
volume = {108},
issue = {12},
pages = {125146},
numpages = {16},
year = {2023},
month = {Sep},
publisher = {American Physical Society},
doi = {10.1103/PhysRevB.108.125146},
url = {https://link.aps.org/doi/10.1103/PhysRevB.108.125146}
}
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -72,7 +72,8 @@
'pqkmeans',
'dftpy',
'asap3',
'openpmd_io'
'openpmd_io',
'skspatial'
]

myst_heading_anchors = 3
7 changes: 4 additions & 3 deletions docs/source/index.md
@@ -93,11 +93,12 @@ MALA has been employed in various publications, showcasing its versatility and e
data calculated for hundreds of atoms, MALA can predict the electronic
structure of up to 100'000 atoms.

- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.48550/arXiv.2306.06032) (arXiv preprint)
- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.1103/PhysRevB.108.125146) (Phys. Rev. B)
by L. Fiedler, N. A. Modine, K. D. Miller, A. Cangi

- Currently in the preprint stage. Shown here is the temperature
transferability of MALA models.
- This publication shows how MALA models can be employed across temperature
ranges. It demonstrates how such models account for both ionic and
electronic temperature effects in materials.



2 changes: 2 additions & 0 deletions docs/source/install/installing_lammps.rst
@@ -1,3 +1,5 @@
.. _lammpsinstallation:

Installing LAMMPS
==================

21 changes: 13 additions & 8 deletions docs/source/installation.rst
Original file line number Diff line number Diff line change
@@ -4,25 +4,30 @@ Installation
As a software package, MALA consists of three parts:

1. The actual Python package ``mala``, which this documentation accompanies
2. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid
3. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
2. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
is used by MALA to post-process the LDOS into total free energies (via the
so called "total energy module")
3. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid (optional, but highly
recommended!)

All three parts require separate installations. The most important one is
the first one, i.e., the Python library, and you can access a lot of MALA
functionalities by just installing the MALA Python library, especially when
working with precalculated input and output data (e.g. for model training).

For access to all features, you will additionally have to install the LAMMPS
and QE codes and associated Python bindings. The installation has been tested
on Linux (Ubuntu/CentOS), Windows and macOS. The individual installation steps
are given in:
For access to all features, you will additionally have to install the QE code.
The calculations performed by LAMMPS are also implemented in the Python part
of MALA. For small test calculations and development tasks, you therefore do
not need LAMMPS. For realistic simulations, however, the Python implementation
is not efficient enough, and you have to use LAMMPS.

The installation has been tested on Linux (Ubuntu/CentOS), Windows and macOS.
The individual installation steps are given in:

.. toctree::
:maxdepth: 1

install/installing_mala
install/installing_lammps
install/installing_qe
install/installing_lammps
1 change: 1 addition & 0 deletions install/mala_cpu_base_environment.yml
@@ -13,3 +13,4 @@ dependencies:
- pytorch-cpu
- mpmath
- tensorboard
- scikit-spatial
1 change: 1 addition & 0 deletions install/mala_cpu_environment.yml
@@ -127,6 +127,7 @@ dependencies:
- requests-oauthlib=1.3.1
- rsa=4.9
- scipy=1.8.1
- scikit-spatial=6.8.1
- setuptools=59.8.0
- six=1.16.0
- sleef=3.5.1
24 changes: 21 additions & 3 deletions mala/common/parameters.py
@@ -30,7 +30,7 @@ def __init__(self,):
super(ParametersBase, self).__init__()
self._configuration = {"gpu": False, "horovod": False, "mpi": False,
"device": "cpu", "openpmd_configuration": {},
"openpmd_granularity": 1}
"openpmd_granularity": 1, "lammps": True}
pass

def show(self, indent=""):
@@ -71,6 +71,9 @@ def _update_openpmd_configuration(self, new_openpmd):
def _update_openpmd_granularity(self, new_granularity):
self._configuration["openpmd_granularity"] = new_granularity

def _update_lammps(self, new_lammps):
self._configuration["lammps"] = new_lammps

@staticmethod
def _member_to_json(member):
if isinstance(member, (int, float, type(None), str)):
@@ -1180,6 +1183,7 @@ def __init__(self):
# TODO: Maybe as a percentage? Feature dimensions can be quite
# different.
self.openpmd_granularity = 1
self.use_lammps = True

@property
def openpmd_granularity(self):
@@ -1307,6 +1311,7 @@ def use_mpi(self):
@use_mpi.setter
def use_mpi(self, value):
set_mpi_status(value)

# Invalidate, will be updated in setter.
self.device = None
self._use_mpi = value
@@ -1331,15 +1336,28 @@ def openpmd_configuration(self):
@openpmd_configuration.setter
def openpmd_configuration(self, value):
self._openpmd_configuration = value

# Invalidate, will be updated in setter.
self.network._update_openpmd_configuration(self.openpmd_configuration)
self.descriptors._update_openpmd_configuration(self.openpmd_configuration)
self.targets._update_openpmd_configuration(self.openpmd_configuration)
self.data._update_openpmd_configuration(self.openpmd_configuration)
self.running._update_openpmd_configuration(self.openpmd_configuration)
self.hyperparameters._update_openpmd_configuration(self.openpmd_configuration)

@property
def use_lammps(self):
"""Control whether or not to use LAMMPS for descriptor calculation."""
return self._use_lammps

@use_lammps.setter
def use_lammps(self, value):
self._use_lammps = value
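# Propagate the new value to all sub-parameter objects.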
self.network._update_lammps(self.use_lammps)
self.descriptors._update_lammps(self.use_lammps)
self.targets._update_lammps(self.use_lammps)
self.data._update_lammps(self.use_lammps)
self.running._update_lammps(self.use_lammps)
self.hyperparameters._update_lammps(self.use_lammps)

def show(self):
"""Print name and values of all attributes of this object."""
printout("--- " + self.__doc__.split("\n")[1] + " ---",
110 changes: 96 additions & 14 deletions mala/descriptors/atomic_density.py
@@ -14,8 +14,10 @@
except ModuleNotFoundError:
pass
import numpy as np
from scipy.spatial import distance

from mala.descriptors.lammps_utils import set_cmdlinevars, extract_compute_np
from mala.common.parallelizer import printout
from mala.descriptors.lammps_utils import extract_compute_np
from mala.descriptors.descriptor import Descriptor

# Empirical value for the Gaussian descriptor width, determined for an
@@ -117,28 +119,37 @@ def get_optimal_sigma(voxel):
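# Scale the empirically determined aluminium sigma by the ratio of the
# actual (maximum) grid spacing to the reference aluminium grid spacing.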
return (np.max(voxel) / reference_grid_spacing_aluminium) * \
optimal_sigma_aluminium

def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
def _calculate(self, outdir, **kwargs):
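# Prefer LAMMPS if enabled in the parameters; if the lammps Python
# module cannot be imported, fall back to the pure-Python implementation.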
if self.parameters._configuration["lammps"]:
try:
from lammps import lammps
except ModuleNotFoundError:
printout("No LAMMPS found for descriptor calculation, "
"falling back to python.")
return self.__calculate_python(**kwargs)

return self.__calculate_lammps(outdir, **kwargs)
else:
return self.__calculate_python(**kwargs)

def __calculate_lammps(self, outdir, **kwargs):
"""Perform actual Gaussian descriptor calculation."""
use_fp64 = kwargs.get("use_fp64", False)
return_directly = kwargs.get("return_directly", False)

lammps_format = "lammps-data"
ase_out_path = os.path.join(outdir, "lammps_input.tmp")
ase.io.write(ase_out_path, atoms, format=lammps_format)
ase.io.write(ase_out_path, self.atoms, format=lammps_format)

nx = grid_dimensions[0]
ny = grid_dimensions[1]
nz = grid_dimensions[2]
nx = self.grid_dimensions[0]
ny = self.grid_dimensions[1]
nz = self.grid_dimensions[2]

# Check if we have to determine the optimal sigma value.
if self.parameters.atomic_density_sigma is None:
self.grid_dimensions = [nx, ny, nz]
voxel = atoms.cell.copy()
voxel[0] = voxel[0] / (self.grid_dimensions[0])
voxel[1] = voxel[1] / (self.grid_dimensions[1])
voxel[2] = voxel[2] / (self.grid_dimensions[2])
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(voxel)
get_optimal_sigma(self.voxel)

# Create LAMMPS instance.
lammps_dict = {}
@@ -197,9 +208,9 @@ def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
# and thus have to properly reorder it.
# We have to switch from x fastest to z fastest reordering.
gaussian_descriptors_np = \
gaussian_descriptors_np.reshape((grid_dimensions[2],
grid_dimensions[1],
grid_dimensions[0],
gaussian_descriptors_np.reshape((self.grid_dimensions[2],
self.grid_dimensions[1],
self.grid_dimensions[0],
7))
gaussian_descriptors_np = \
gaussian_descriptors_np.transpose([2, 1, 0, 3])
@@ -212,3 +223,74 @@ def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
return gaussian_descriptors_np[:, :, :, 6:], \
nx*ny*nz

def __calculate_python(self, **kwargs):
"""
Perform Gaussian descriptor calculation using python.
The code used to this end was adapted from the LAMMPS implementation.
It serves as a fallback option whereever LAMMPS is not available.
This may be useful, e.g., to students or people getting started with
MALA who just want to look around. It is not intended for production
calculations.
Compared to the LAMMPS implementation, this implementation has quite a
few limitations. Namely
- It is roughly an order of magnitude slower for small systems
and doesn't scale too great
- It only works for ONE chemical element
- It has no MPI or GPU support
"""
printout("Using python for descriptor calculation. "
"The resulting calculation will be slow for "
"large systems.")

gaussian_descriptors_np = np.zeros((self.grid_dimensions[0],
self.grid_dimensions[1],
self.grid_dimensions[2], 4),
dtype=np.float64)

# Construct the hyperparameters to calculate the Gaussians.
# This follows the implementation in the LAMMPS code.
if self.parameters.atomic_density_sigma is None:
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(self.voxel)
cutoff_squared = self.parameters.atomic_density_cutoff * \
self.parameters.atomic_density_cutoff
prefactor = 1.0 / (np.power(self.parameters.atomic_density_sigma *
np.sqrt(2*np.pi),3))
argumentfactor = 1.0 / (2.0 * self.parameters.atomic_density_sigma *
self.parameters.atomic_density_sigma)
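# Each atom thus contributes a normalized 3D Gaussian evaluated at the
# squared distance r2 between grid point and atom:
#   g(r2) = prefactor * exp(-r2 * argumentfactor)
#         = exp(-r2 / (2*sigma**2)) / (sigma * sqrt(2*pi))**3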

# Create a list of all potentially relevant atoms.
all_atoms = self._setup_atom_list()

# I think this nested for-loop could probably be optimized by instead
# using the density matrix on the entire grid. That would be VERY
# memory-intensive. Since the goal of such an optimization would be
# to use this implementation at potentially larger length scales,
# one would have to verify that this is acceptable memory-wise.
# I haven't optimized it yet for the smaller scales, since there
# the performance was already good enough.
for i in range(0, self.grid_dimensions[0]):
for j in range(0, self.grid_dimensions[1]):
for k in range(0, self.grid_dimensions[2]):
# Compute the grid.
gaussian_descriptors_np[i, j, k, 0:3] = \
self._grid_to_coord([i, j, k])

# Compute the Gaussian descriptors.
dm = np.squeeze(distance.cdist(
[gaussian_descriptors_np[i, j, k, 0:3]],
all_atoms))
dm = dm*dm
dm_cutoff = dm[np.argwhere(dm < cutoff_squared)]
gaussian_descriptors_np[i, j, k, 3] += \
np.sum(prefactor*np.exp(-dm_cutoff*argumentfactor))

if self.parameters.descriptors_contain_xyz:
self.fingerprint_length = 4
return gaussian_descriptors_np, np.prod(self.grid_dimensions)
else:
self.fingerprint_length = 1
return gaussian_descriptors_np[:, :, :, 3:], \
np.prod(self.grid_dimensions)