Descriptor calculation in python #510

Merged

Changes from all commits (51 commits)
3e9e90c
Started working on descriptor calculation in python
RandomDefaultUser Feb 23, 2024
45e061e
Reproduced LAMMPS grid (except for the bounding boxes?)
RandomDefaultUser Feb 23, 2024
5c13b65
Gaussian descriptors almost working
RandomDefaultUser Feb 23, 2024
6638994
Working on neighborlist
RandomDefaultUser Feb 26, 2024
48a39b8
Gaussian descriptors working - albeit terribly slow
RandomDefaultUser Feb 26, 2024
7a0dc61
Trying to do sort of a global neighborhood list
RandomDefaultUser Feb 26, 2024
8aaf480
Efficient implementation of Gaussian descriptors
RandomDefaultUser Feb 28, 2024
6be732a
Made grid dimensions, atoms and voxel Descriptor class properties
RandomDefaultUser Feb 28, 2024
137071e
Made interface consistent
RandomDefaultUser Feb 28, 2024
e32d13f
Further optimization
RandomDefaultUser Feb 28, 2024
41c8818
Further optimization
RandomDefaultUser Feb 28, 2024
4116480
I think I optimized something
RandomDefaultUser Feb 28, 2024
67ff378
Retook one optimization
RandomDefaultUser Feb 28, 2024
4161c06
Bugfix in optimized implementation
RandomDefaultUser Feb 29, 2024
7f2a623
Tried to reduce the list of all atoms further
RandomDefaultUser Feb 29, 2024
358013f
Small bugfix
RandomDefaultUser Feb 29, 2024
3b19ef8
Cleaned up the code and committed to skspatial
RandomDefaultUser Mar 12, 2024
749dfb9
Started with bispectrum descriptors
RandomDefaultUser Mar 13, 2024
22e544e
Implemented Ui
RandomDefaultUser Mar 19, 2024
ac5d3cc
Implemented zi
RandomDefaultUser Mar 20, 2024
edcafe1
Got bi
RandomDefaultUser Mar 20, 2024
76f64b6
Calculation finished, just probably very very slow
RandomDefaultUser Mar 20, 2024
c30f3dc
Some fun bispectrum debugging
RandomDefaultUser Mar 20, 2024
21bad16
compute ui working
RandomDefaultUser Mar 21, 2024
de8a6bd
Continuing to bugfix
RandomDefaultUser Mar 22, 2024
f7e341e
Debugged some more
RandomDefaultUser Mar 23, 2024
3017f05
zi is working now
RandomDefaultUser Mar 23, 2024
2cfbdac
Bispectrum descriptors working now, but very slow
RandomDefaultUser Mar 28, 2024
24fc6c1
Implemented some very obvious optimizations
RandomDefaultUser Mar 28, 2024
9fe82a0
Another small improvement
RandomDefaultUser Apr 2, 2024
b9c7d3e
Optimized ui; the code is horrible, but fast-ish
RandomDefaultUser Apr 2, 2024
f961330
Trying something with compute_zi, not yet finished
RandomDefaultUser Apr 2, 2024
53afa7a
This compute_zi function is not yet working - but it would roughly be…
RandomDefaultUser Apr 3, 2024
46921d5
The unvectorized version is working
RandomDefaultUser Apr 3, 2024
6b66e5c
Still debugging
RandomDefaultUser Apr 3, 2024
9fcdebd
Fastest version as of yet
RandomDefaultUser Apr 3, 2024
6ef0aef
(Cleaned) Fastest version as of yet
RandomDefaultUser Apr 3, 2024
533f78f
A bit more cleaning
RandomDefaultUser Apr 3, 2024
659d4d8
Started a full cleanup
RandomDefaultUser Apr 4, 2024
72cddba
More cleaning up
RandomDefaultUser Apr 4, 2024
9dfd88e
Almost finished with cleaning up
RandomDefaultUser Apr 4, 2024
2351fa0
Added warning for python based calculation
RandomDefaultUser Apr 4, 2024
9b269eb
Made python a fallback for the descriptor calculation.
RandomDefaultUser Apr 4, 2024
6694e6a
Fixed docstrings
RandomDefaultUser Apr 4, 2024
7a47fcb
Fixed docs
RandomDefaultUser Apr 4, 2024
a51aac4
Added a test and adapted some others
RandomDefaultUser Apr 4, 2024
dfe1e18
Small adjustments for the documentation
RandomDefaultUser Apr 4, 2024
a7d9fa2
Corrected Typo
RandomDefaultUser Apr 4, 2024
ccdd5fe
Added missing requirement
RandomDefaultUser Apr 4, 2024
d499248
Added missing requirement
RandomDefaultUser Apr 4, 2024
4cdf6bd
Trying a different scikit-spatial version
RandomDefaultUser Apr 4, 2024
5 changes: 5 additions & 0 deletions docs/source/advanced_usage/descriptors.rst
@@ -3,6 +3,11 @@
Improved data conversion
========================

As a general remark, please keep in mind that if you have not used LAMMPS
for your first steps in MALA, and have instead used the python-based descriptor
calculation methods, we highly advise switching to LAMMPS for advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

Tuning descriptors
******************

5 changes: 5 additions & 0 deletions docs/source/advanced_usage/predictions.rst
@@ -8,6 +8,11 @@ Predictions at scale in principle work just like the predictions shown
in the basic guide. One has to set a few additional parameters to make
optimal use of the hardware at hand.

As a general remark, please keep in mind that if you have not used LAMMPS
for your first steps in MALA, and have instead used the python-based descriptor
calculation methods, we highly advise switching to LAMMPS for advanced/more
involved examples (see :ref:`installation instructions for LAMMPS <lammpsinstallation>`).

MALA ML-DFT models can be used for predictions at system sizes and temperatures
larger than, or different from, the ones they were trained on. If you want to make
a prediction at a larger length scale than the ML-DFT model was trained on,
2 changes: 1 addition & 1 deletion docs/source/basic_usage/more_data.rst
@@ -4,7 +4,7 @@ Data generation and conversion
MALA operates on volumetric data. Volumetric data is stored in binary files.
By default - and discussed here, in the introductory guide - this
means ``numpy`` files (``.npy`` files). Advanced data storing techniques
are :ref:`also available <openpmd data>`
are :ref:`also available <openpmd data>`.

Data generation
###############
17 changes: 13 additions & 4 deletions docs/source/citing.rst
@@ -67,10 +67,19 @@ range, please cite the respective transferability studies:


@article{MALA_temperaturetransfer,
title={Machine learning the electronic structure of matter across temperatures},
author={Fiedler, Lenz and Modine, Normand A and Miller, Kyle D and Cangi, Attila},
journal={arXiv preprint arXiv:2306.06032},
year={2023}
title = {Machine learning the electronic structure of matter across temperatures},
author = {Fiedler, Lenz and Modine, Normand A. and Miller, Kyle D. and Cangi, Attila},
journal = {Phys. Rev. B},
volume = {108},
issue = {12},
pages = {125146},
numpages = {16},
year = {2023},
month = {Sep},
publisher = {American Physical Society},
doi = {10.1103/PhysRevB.108.125146},
url = {https://link.aps.org/doi/10.1103/PhysRevB.108.125146}
}



3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -72,7 +72,8 @@
'pqkmeans',
'dftpy',
'asap3',
'openpmd_io'
'openpmd_io',
'skspatial'
]

myst_heading_anchors = 3
7 changes: 4 additions & 3 deletions docs/source/index.md
@@ -93,11 +93,12 @@ MALA has been employed in various publications, showcasing its versatility and e
data calculated for hundreds of atoms, MALA can predict the electronic
structure of up to 100'000 atoms.

- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.48550/arXiv.2306.06032) (arXiv preprint)
- [Machine learning the electronic structure of matter across temperatures](https://doi.org/10.1103/PhysRevB.108.125146) (Phys. Rev. B)
by L. Fiedler, N. A. Modine, K. D. Miller, A. Cangi

- Currently in the preprint stage. Shown here is the temperature
tranferability of MALA models.
- This publication shows how MALA models can be employed across temperature
ranges. It is demonstrated how such models account for both ionic and
electronic temperature effects of materials.



2 changes: 2 additions & 0 deletions docs/source/install/installing_lammps.rst
@@ -1,3 +1,5 @@
.. _lammpsinstallation:

Installing LAMMPS
==================

21 changes: 13 additions & 8 deletions docs/source/installation.rst
@@ -4,25 +4,30 @@ Installation
As a software package, MALA consists of three parts:

1. The actual Python package ``mala``, which this documentation accompanies
2. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid
3. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
2. The `Quantum ESPRESSO <https://www.quantum-espresso.org/>`_ (QE) code, which
is used by MALA to post-process the LDOS into total free energies (via the
so called "total energy module")
3. The `LAMMPS <https://www.lammps.org/>`_ code, which is used by MALA to
encode atomic structures on the real-space grid (optional, but highly
recommended!)

All three parts require separate installations. The most important one is
the first one, i.e., the Python library, and you can access a lot of MALA
functionalities by just installing the MALA Python library, especially when
working with precalculated input and output data (e.g. for model training).

For access to all features, you will furthermore have to install the LAMMPS
and QE codes and associated Python bindings. The installation has been tested
on Linux (Ubuntu/CentOS), Windows and macOS. The individual installation steps
are given in:
For access to all features, you will furthermore have to install the QE code.
The calculations performed by LAMMPS are also implemented in the python part
of MALA. For small test calculations and development tasks, you therefore do
not need LAMMPS. For realistic simulations, the python implementation is not
efficient enough, and you have to use LAMMPS.

The installation has been tested on Linux (Ubuntu/CentOS), Windows and macOS.
The individual installation steps are given in:

.. toctree::
:maxdepth: 1

install/installing_mala
install/installing_lammps
install/installing_qe
install/installing_lammps
1 change: 1 addition & 0 deletions install/mala_cpu_base_environment.yml
@@ -13,3 +13,4 @@ dependencies:
- pytorch-cpu
- mpmath
- tensorboard
- scikit-spatial
1 change: 1 addition & 0 deletions install/mala_cpu_environment.yml
@@ -127,6 +127,7 @@ dependencies:
- requests-oauthlib=1.3.1
- rsa=4.9
- scipy=1.8.1
- scikit-spatial=6.8.1
- setuptools=59.8.0
- six=1.16.0
- sleef=3.5.1
24 changes: 21 additions & 3 deletions mala/common/parameters.py
@@ -30,7 +30,7 @@ def __init__(self,):
super(ParametersBase, self).__init__()
self._configuration = {"gpu": False, "horovod": False, "mpi": False,
"device": "cpu", "openpmd_configuration": {},
"openpmd_granularity": 1}
"openpmd_granularity": 1, "lammps": True}
pass

def show(self, indent=""):
@@ -71,6 +71,9 @@ def _update_openpmd_configuration(self, new_openpmd):
def _update_openpmd_granularity(self, new_granularity):
self._configuration["openpmd_granularity"] = new_granularity

def _update_lammps(self, new_lammps):
self._configuration["lammps"] = new_lammps

@staticmethod
def _member_to_json(member):
if isinstance(member, (int, float, type(None), str)):
@@ -1180,6 +1183,7 @@ def __init__(self):
# TODO: Maybe as a percentage? Feature dimensions can be quite
# different.
self.openpmd_granularity = 1
self.use_lammps = True

@property
def openpmd_granularity(self):
@@ -1307,6 +1311,7 @@ def use_mpi(self):
@use_mpi.setter
def use_mpi(self, value):
set_mpi_status(value)

# Invalidate, will be updated in setter.
self.device = None
self._use_mpi = value
@@ -1331,15 +1336,28 @@ def openpmd_configuration(self):
@openpmd_configuration.setter
def openpmd_configuration(self, value):
self._openpmd_configuration = value

# Invalidate, will be updated in setter.
self.network._update_openpmd_configuration(self.openpmd_configuration)
self.descriptors._update_openpmd_configuration(self.openpmd_configuration)
self.targets._update_openpmd_configuration(self.openpmd_configuration)
self.data._update_openpmd_configuration(self.openpmd_configuration)
self.running._update_openpmd_configuration(self.openpmd_configuration)
self.hyperparameters._update_openpmd_configuration(self.openpmd_configuration)

@property
def use_lammps(self):
"""Control whether or not to use LAMMPS for descriptor calculation."""
return self._use_lammps

@use_lammps.setter
def use_lammps(self, value):
self._use_lammps = value
self.network._update_lammps(self.use_lammps)
self.descriptors._update_lammps(self.use_lammps)
self.targets._update_lammps(self.use_lammps)
self.data._update_lammps(self.use_lammps)
self.running._update_lammps(self.use_lammps)
self.hyperparameters._update_lammps(self.use_lammps)
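The `use_lammps` setter above fans the new flag out to every sub-parameter object, mirroring how `openpmd_configuration` is propagated a few lines earlier. A minimal, self-contained sketch of this propagation pattern (hypothetical class names, not the actual MALA API):

```python
class SubParameters:
    """Stands in for ParametersBase: holds a private configuration dict."""
    def __init__(self):
        self._configuration = {"lammps": True}

    def _update_lammps(self, new_lammps):
        self._configuration["lammps"] = new_lammps


class Parameters:
    """Top-level object that pushes the flag down to all sub-objects."""
    def __init__(self):
        self.descriptors = SubParameters()
        self.targets = SubParameters()
        self.use_lammps = True  # runs the setter below

    @property
    def use_lammps(self):
        return self._use_lammps

    @use_lammps.setter
    def use_lammps(self, value):
        self._use_lammps = value
        # Propagate so every component sees the same backend choice.
        self.descriptors._update_lammps(value)
        self.targets._update_lammps(value)


params = Parameters()
params.use_lammps = False
print(params.descriptors._configuration["lammps"])  # prints False
```

The design choice here is that user code only ever touches the top-level property; the sub-objects never need to be updated individually.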

def show(self):
"""Print name and values of all attributes of this object."""
printout("--- " + self.__doc__.split("\n")[1] + " ---",
110 changes: 96 additions & 14 deletions mala/descriptors/atomic_density.py
@@ -14,8 +14,10 @@
except ModuleNotFoundError:
pass
import numpy as np
from scipy.spatial import distance

from mala.descriptors.lammps_utils import set_cmdlinevars, extract_compute_np
from mala.common.parallelizer import printout
from mala.descriptors.lammps_utils import extract_compute_np
from mala.descriptors.descriptor import Descriptor

# Empirical value for the Gaussian descriptor width, determined for an
@@ -117,28 +119,37 @@
return (np.max(voxel) / reference_grid_spacing_aluminium) * \
optimal_sigma_aluminium

def _calculate(self, atoms, outdir, grid_dimensions, **kwargs):
def _calculate(self, outdir, **kwargs):
if self.parameters._configuration["lammps"]:
try:
from lammps import lammps
except ModuleNotFoundError:
printout("No LAMMPS found for descriptor calculation, "
"falling back to python.")
return self.__calculate_python(**kwargs)

return self.__calculate_lammps(outdir, **kwargs)
else:
return self.__calculate_python(**kwargs)
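The dispatch above uses an import guard: try the optional compiled backend, and fall back to pure python when the module is missing. A generic, self-contained sketch of the same pattern (hypothetical function name and return values, simplified from the code above):

```python
def calculate_descriptors(use_lammps=True):
    """Select a descriptor backend, falling back gracefully to python."""
    if use_lammps:
        try:
            # Optional dependency; only importable if LAMMPS with its
            # python bindings is installed on this machine.
            from lammps import lammps  # noqa: F401
        except ModuleNotFoundError:
            print("No LAMMPS found for descriptor calculation, "
                  "falling back to python.")
            return "python-backend"
        return "lammps-backend"
    return "python-backend"


backend = calculate_descriptors(use_lammps=False)
print(backend)  # prints python-backend
```

Guarding the import inside the function (rather than at module level) means merely loading the module never fails on machines without LAMMPS.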

def __calculate_lammps(self, outdir, **kwargs):
"""Perform actual Gaussian descriptor calculation."""
use_fp64 = kwargs.get("use_fp64", False)
return_directly = kwargs.get("return_directly", False)

lammps_format = "lammps-data"
ase_out_path = os.path.join(outdir, "lammps_input.tmp")
ase.io.write(ase_out_path, atoms, format=lammps_format)
ase.io.write(ase_out_path, self.atoms, format=lammps_format)

nx = grid_dimensions[0]
ny = grid_dimensions[1]
nz = grid_dimensions[2]
nx = self.grid_dimensions[0]
ny = self.grid_dimensions[1]
nz = self.grid_dimensions[2]

# Check if we have to determine the optimal sigma value.
if self.parameters.atomic_density_sigma is None:
self.grid_dimensions = [nx, ny, nz]
voxel = atoms.cell.copy()
voxel[0] = voxel[0] / (self.grid_dimensions[0])
voxel[1] = voxel[1] / (self.grid_dimensions[1])
voxel[2] = voxel[2] / (self.grid_dimensions[2])
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(voxel)
get_optimal_sigma(self.voxel)

# Create LAMMPS instance.
lammps_dict = {}
@@ -197,9 +208,9 @@
# and thus have to properly reorder it.
# We have to switch from x fastest to z fastest reordering.
gaussian_descriptors_np = \
gaussian_descriptors_np.reshape((grid_dimensions[2],
grid_dimensions[1],
grid_dimensions[0],
gaussian_descriptors_np.reshape((self.grid_dimensions[2],
self.grid_dimensions[1],
self.grid_dimensions[0],
7))
gaussian_descriptors_np = \
gaussian_descriptors_np.transpose([2, 1, 0, 3])
@@ -212,3 +223,74 @@
return gaussian_descriptors_np[:, :, :, 6:], \
nx*ny*nz

def __calculate_python(self, **kwargs):
"""
Perform Gaussian descriptor calculation using python.

The code used to this end was adapted from the LAMMPS implementation.
It serves as a fallback option wherever LAMMPS is not available.
This may be useful, e.g., to students or people getting started with
MALA who just want to look around. It is not intended for production
calculations.
Compared to the LAMMPS implementation, this implementation has quite a
few limitations, namely:

- It is roughly an order of magnitude slower for small systems
and does not scale well
- It only works for ONE chemical element
- It has no MPI or GPU support
"""
printout("Using python for descriptor calculation. "
"The resulting calculation will be slow for "
"large systems.")

gaussian_descriptors_np = np.zeros((self.grid_dimensions[0],
self.grid_dimensions[1],
self.grid_dimensions[2], 4),
dtype=np.float64)

# Construct the hyperparameters to calculate the Gaussians.
# This follows the implementation in the LAMMPS code.
if self.parameters.atomic_density_sigma is None:
self.parameters.atomic_density_sigma = self.\
get_optimal_sigma(self.voxel)
cutoff_squared = self.parameters.atomic_density_cutoff * \
self.parameters.atomic_density_cutoff
prefactor = 1.0 / (np.power(self.parameters.atomic_density_sigma *
np.sqrt(2*np.pi),3))
argumentfactor = 1.0 / (2.0 * self.parameters.atomic_density_sigma *
self.parameters.atomic_density_sigma)

# Create a list of all potentially relevant atoms.
all_atoms = self._setup_atom_list()

# I think this nested for-loop could probably be optimized if instead
# the density matrix is used on the entire grid. That would be VERY
# memory-intensive. Since the goal of such an optimization would be
# to use this implementation at potentially larger length-scales,
# one would have to investigate whether this is OK memory-wise.
# I haven't optimized it yet for the smaller scales since there
# the performance was already good enough.
for i in range(0, self.grid_dimensions[0]):
for j in range(0, self.grid_dimensions[1]):
for k in range(0, self.grid_dimensions[2]):
# Compute the grid.
gaussian_descriptors_np[i, j, k, 0:3] = \
self._grid_to_coord([i, j, k])

# Compute the Gaussian descriptors.
dm = np.squeeze(distance.cdist(
[gaussian_descriptors_np[i, j, k, 0:3]],
all_atoms))
dm = dm*dm
dm_cutoff = dm[np.argwhere(dm < cutoff_squared)]
gaussian_descriptors_np[i, j, k, 3] += \
np.sum(prefactor*np.exp(-dm_cutoff*argumentfactor))

if self.parameters.descriptors_contain_xyz:
self.fingerprint_length = 4
return gaussian_descriptors_np, np.prod(self.grid_dimensions)
else:
self.fingerprint_length = 1
return gaussian_descriptors_np[:, :, :, 3:], \
np.prod(self.grid_dimensions)
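The triple loop above evaluates, at every voxel, a sum of normalized Gaussians over all atoms within the cutoff. The same kernel can be sketched in a standalone, vectorized form over a flattened list of grid points (hypothetical function name, not the MALA API):

```python
import numpy as np
from scipy.spatial import distance


def gaussian_density(grid_points, atom_positions, sigma, cutoff):
    """Sum of normalized 3D Gaussians centered on atoms, per grid point.

    grid_points: (N, 3) array, atom_positions: (M, 3) array.
    Mirrors the prefactor/argumentfactor construction used above.
    """
    prefactor = 1.0 / (sigma * np.sqrt(2.0 * np.pi)) ** 3
    argumentfactor = 1.0 / (2.0 * sigma * sigma)
    # Squared distances between every grid point and every atom.
    dm = distance.cdist(grid_points, atom_positions, metric="sqeuclidean")
    # Atoms beyond the cutoff contribute nothing, as in the LAMMPS code.
    contributions = np.where(dm < cutoff * cutoff,
                             prefactor * np.exp(-dm * argumentfactor),
                             0.0)
    return contributions.sum(axis=1)


# One atom at the origin; the first grid point coincides with it,
# the second lies far outside the cutoff.
grid = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
atoms = np.array([[0.0, 0.0, 0.0]])
print(gaussian_density(grid, atoms, sigma=1.0, cutoff=4.0))
```

At zero distance the value reduces to the prefactor 1/(sigma*sqrt(2*pi))**3, which is a quick sanity check for the normalization used in the implementation above.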