Merge branch 'refs/heads/develop_lenz' into merge_kyles_active_learning
# Conflicts:
#	mala/common/parameters.py
#	mala/common/physical_data.py
#	mala/datahandling/data_handler.py
#	mala/datahandling/snapshot.py
#	mala/interfaces/ase_calculator.py
RandomDefaultUser committed Jan 6, 2025
2 parents a52a7f3 + 13146e1 commit fdb9e93
Showing 85 changed files with 3,797 additions and 2,200 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 1.2.1
+current_version = 1.3.0
commit = True
tag = True
sign_tags = True
11 changes: 9 additions & 2 deletions .github/workflows/cpu-tests.yml
@@ -170,20 +170,27 @@ jobs:
# install mala package
pip --no-cache-dir install -e .[opt,test] --no-build-isolation
- name: Check if Conda environment meets the specified requirements
shell: 'bash -c "docker exec -i mala-cpu bash < {0}"'
run: |
# export Conda environment _with_ mala package installed in it (and extra dependencies)
conda env export -n mala-cpu > env_after.yml
# This command is necessary because conda includes even editable
# packages in an export, at least in the versions we recently used.
# That of course leads to the diff failing, since MALA can never
# be there before it has been installed.
sed -i '/materials-learning-algorithms/d' ./env_after.yml
# if comparison fails, `install/mala_cpu_[base]_environment.yml` needs to be aligned with
# `requirements.txt` and/or extra dependencies are missing in the Docker Conda environment
if diff --brief env_before.yml env_after.yml
then
echo "Files env_before.yml and env_after.yml do not differ."
else
-diff --side-by-side --color-always env_before.yml env_after.yml
+diff --side-by-side env_before.yml env_after.yml
fi
- name: Download test data repository from RODARE
@@ -223,7 +230,7 @@ jobs:
- name: Test mala
shell: 'bash -c "docker exec -i mala-cpu bash < {0}"'
-run: MALA_DATA_REPO=$(pwd)/mala_data pytest -m "not examples" --disable-warnings
+run: MALA_DATA_REPO=$(pwd)/mala_data pytest --cov=mala --cov-fail-under=60 -m "not examples" --disable-warnings

retag-docker-image-cpu:
needs: [cpu-tests, build-docker-image-cpu]
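For illustration only (an editor sketch, not part of this commit): the environment check in the workflow above can be mimicked in plain Python with the standard library. The file names match those used in the workflow; everything else is an assumption.

.. code-block:: python

    import difflib

    def read_env(path, ignore="materials-learning-algorithms"):
        """Read an exported Conda environment file, dropping the editable MALA entry."""
        with open(path) as f:
            return [line for line in f if ignore not in line]

    before = read_env("env_before.yml")
    after = read_env("env_after.yml")

    if before == after:
        print("Files env_before.yml and env_after.yml do not differ.")
    else:
        # Show a unified diff of the two dependency lists.
        print("".join(difflib.unified_diff(
            before, after,
            fromfile="env_before.yml", tofile="env_after.yml")))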
6 changes: 3 additions & 3 deletions .github/workflows/gh-pages.yml
@@ -26,7 +26,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v5
with:
-python-version: '3.12'
+python-version: '3.10.4'

- name: Upgrade pip
run: python3 -m pip install --upgrade pip
@@ -36,7 +36,7 @@

- name: Check docstrings
# Ignoring the cached_properties because pydocstyle (sometimes?) treats them as functions.
-run: pydocstyle --convention=numpy --ignore-decorators=[cached_property,property] mala
+run: pydocstyle --convention=numpy mala

build-and-deploy-pages:
needs: test-docstrings
@@ -50,7 +50,7 @@
- name: Setup Python
uses: actions/setup-python@v5
with:
-python-version: '3.12'
+python-version: '3.10.4'

- name: Upgrade pip
run: python3 -m pip install --upgrade pip
6 changes: 3 additions & 3 deletions CITATION.cff
@@ -1,5 +1,5 @@
# YAML 1.2
-cff-version: 1.2.1
+cff-version: 1.3.0
message: "If you use this software, please cite it using these metadata."
authors:
- affiliation: "Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR)"
@@ -83,12 +83,12 @@ authors:
given-names: D. Jon


-date-released: 2024-02-01
+date-released: 2024-12-05
keywords:
- "machine-learning"
- "dft"
license: "BSD-3-Clause"
repository-code: "https://github.com/mala-project/mala"
title: MALA
doi: 10.5281/zenodo.5557254 # This DOI represents all versions, and will always resolve to the latest one.
-version: 1.2.1
+version: 1.3.0
2 changes: 1 addition & 1 deletion Copyright.txt
@@ -1,6 +1,6 @@
************************************************************************

-MALA v. 1.2.1
+MALA v. 1.3.0

Under the terms of Contract DE-NA0003525 with NTESS,
the U.S. Government retains certain rights in this software.
1 change: 1 addition & 0 deletions Dockerfile
@@ -20,6 +20,7 @@ RUN conda env create -f mala_${DEVICE}_environment.yml && rm -rf /opt/conda/pkgs
# Install optional MALA dependencies into Conda environment with pip
RUN /opt/conda/envs/mala-${DEVICE}/bin/pip install --no-input --no-cache-dir \
pytest \
+pytest-cov \
oapackage==2.6.8 \
pqkmeans

6 changes: 4 additions & 2 deletions docs/source/advanced_usage/predictions.rst
@@ -81,11 +81,13 @@
Gaussian representation of atomic positions. In this algorithm, most of the
computational overhead of the total energy calculation is offloaded to the
computation of this Gaussian representation. This calculation is realized via
LAMMPS and can therefore be GPU accelerated (parallelized) in the same fashion
-as the bispectrum descriptor calculation. Simply activate this option via
+as the bispectrum descriptor calculation. If a GPU is activated (and LAMMPS
+is available), this option will be used by default. It can also manually be
+activated via

.. code-block:: python
-    parameters.descriptors.use_atomic_density_energy_formula = True
+    parameters.use_atomic_density_formula = True
The Gaussian representation algorithm is described in
the publication `Predicting electronic structures at any length scale with machine learning <doi.org/10.1038/s41524-023-01070-z>`_.
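For context, a minimal usage sketch (an editor illustration, not part of this diff; ``parameters.use_gpu`` as the GPU switch is an assumption based on the MALA API):

.. code-block:: python

    import mala

    parameters = mala.Parameters()
    # With a GPU enabled and LAMMPS available, the atomic density
    # formula is used by default for total energy predictions.
    parameters.use_gpu = True
    # It can also be switched on explicitly:
    parameters.use_atomic_density_formula = True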
61 changes: 52 additions & 9 deletions docs/source/advanced_usage/trainingmodel.rst
@@ -194,22 +194,64 @@
keyword, you can fine-tune the number of new snapshots being created.
By default, the same number of snapshots as had been provided will be created
(if possible).

-Using tensorboard
-******************
+Logging metrics during training
+*******************************

-Training routines in MALA can be visualized via tensorboard, as also shown
-in the file ``advanced/ex03_tensor_board``. Simply enable tensorboard
-visualization prior to training via
+Training progress in MALA can be visualized via tensorboard or wandb, as also shown
+in the file ``advanced/ex03_tensor_board``. Simply select a logger prior to training as

.. code-block:: python

-    # 0: No visualizatuon, 1: loss and learning rate, 2: like 1,
-    # but additionally weights and biases are saved
-    parameters.running.logging = 1
+    parameters.running.logger = "tensorboard"
+    parameters.running.logging_dir = "mala_vis"
+
+or
+
+.. code-block:: python
+
+    import wandb
+    wandb.init(
+        project="mala_training",
+        entity="your_wandb_entity"
+    )
+    parameters.running.logger = "wandb"
+    parameters.running.logging_dir = "mala_vis"

where ``logging_dir`` specifies some directory in which to save the
-MALA logging data. Afterwards, you can run the training without any
+MALA logging data. You can also select which metrics to record via
+
+.. code-block:: python
+
+    parameters.validation_metrics = ["ldos", "dos", "density", "total_energy"]
+
+Full list of available metrics:
+
+- "ldos": MSE of the LDOS.
+- "band_energy": Band energy.
+- "band_energy_actual_fe": Band energy computed with ground truth Fermi energy.
+- "total_energy": Total energy.
+- "total_energy_actual_fe": Total energy computed with ground truth Fermi energy.
+- "fermi_energy": Fermi energy.
+- "density": Electron density.
+- "density_relative": Electron density (Mean Absolute Percentage Error).
+- "dos": Density of states.
+- "dos_relative": Density of states (Mean Absolute Percentage Error).
+
+To save time and resources you can specify the logging interval via
+
+.. code-block:: python
+
+    parameters.running.validate_every_n_epochs = 10
+
+If you want to monitor the degree to which the model overfits to the training data,
+you can use the option
+
+.. code-block:: python
+
+    parameters.running.validate_on_training_data = True
+
+MALA will evaluate the validation metrics on the training set as well as the validation set.
+
+Afterwards, you can run the training without any
other modifications. Once training is finished (or during training, in case
you want to use tensorboard to monitor progress), you can launch tensorboard
via
@@ -221,6 +263,7 @@
The full path for ``path_to_log_directory`` can be accessed via
``trainer.full_logging_path``.

+If you're using wandb, you can monitor the training progress on the wandb website.
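Taken together, a hedged configuration sketch (editor illustration, not part of this diff; it combines only the options documented above):

.. code-block:: python

    import mala

    parameters = mala.Parameters()

    # Log either to tensorboard or to wandb (here: tensorboard).
    parameters.running.logger = "tensorboard"
    parameters.running.logging_dir = "mala_vis"

    # Record a subset of the available validation metrics.
    parameters.validation_metrics = ["ldos", "band_energy", "total_energy"]

    # Validate only every 10th epoch to save time and resources.
    parameters.running.validate_every_n_epochs = 10

    # Also evaluate the metrics on the training set to monitor overfitting.
    parameters.running.validate_on_training_data = True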

Training in parallel
********************
15 changes: 9 additions & 6 deletions docs/source/basic_usage/trainingmodel.rst
@@ -28,7 +28,7 @@
options to train a simple network with example data, namely
parameters = mala.Parameters()
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.network.layer_activations = ["ReLU"]
@@ -43,15 +43,18 @@
sub-objects dealing with the individual aspects of the workflow. In the first
two lines, we specify which data scaling MALA should employ. Scaling data greatly
improves the performance of NN based ML models. Options are

-* ``None``: No normalization is applied.
+* ``None``: No scaling is applied.

-* ``standard``: Standardization (Scale to mean 0, standard deviation 1)
+* ``standard``: Standardization (Scale to mean 0, standard deviation 1) is
+  applied to the entire array.

-* ``normal``: Min-Max scaling (Scale to be in range 0...1)
+* ``minmax``: Min-Max scaling (Scale to be in range 0...1) is applied to the entire array.

-* ``feature-wise-standard``: Row Standardization (Scale to mean 0, standard deviation 1)
+* ``feature-wise-standard``: Standardization (Scale to mean 0, standard
+  deviation 1) is applied to each feature dimension individually.

-* ``feature-wise-normal``: Row Min-Max scaling (Scale to be in range 0...1)
+* ``feature-wise-minmax``: Min-Max scaling (Scale to be in range 0...1) is
+  applied to each feature dimension individually.

Here, we specify that MALA should standardize the input (=descriptors)
by feature (i.e., each entry of the vector separately on the grid) and
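The four scaling modes listed above can be illustrated with plain numpy (an editor sketch on toy data, independent of the MALA API):

.. code-block:: python

    import numpy as np

    # Toy data: 4 samples (rows) x 3 feature dimensions (columns).
    x = np.array([[1.0, 10.0, 100.0],
                  [2.0, 20.0, 200.0],
                  [3.0, 30.0, 300.0],
                  [4.0, 40.0, 400.0]])

    # "standard": one mean/std for the entire array.
    standard = (x - x.mean()) / x.std()

    # "minmax": one min/max for the entire array.
    minmax = (x - x.min()) / (x.max() - x.min())

    # "feature-wise-standard": mean/std per feature dimension (column).
    fw_standard = (x - x.mean(axis=0)) / x.std(axis=0)

    # "feature-wise-minmax": min/max per feature dimension (column).
    fw_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))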
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -72,7 +72,6 @@
"scipy",
"oapackage",
"matplotlib",
"horovod",
"lammps",
"total_energy",
"pqkmeans",
4 changes: 2 additions & 2 deletions docs/source/install/installing_lammps.rst
@@ -20,8 +20,8 @@
be used with MALA. For a full overview of how to build LAMMPS, please refer to
the `official instructions <https://docs.lammps.org/Build.html>`_.
The MALA team recommends to build LAMMPS with ``cmake``. To do so

-* Checkout https://github.com/mala-project/lammps/tree/mala
-* Make sure the ``mala`` tree is checked out locally via ``git branch``!
+* Checkout https://github.com/mala-project/lammps/tree/mala_v130
+* Make sure the ``mala_v130`` branch is checked out locally via ``git branch``!
* Inside the LAMMPS folder create a build folder (named, e.g., ``build``)
* In the ``build`` folder, configure your ``cmake`` build:
``cmake ../cmake -D OPTION1 -D OPTION2 ...``; Options for a typical LAMMPS
4 changes: 2 additions & 2 deletions docs/source/install/installing_mala.rst
@@ -4,8 +4,8 @@ Installing MALA
Prerequisites
**************

-MALA does not depend on a specific Python version. The most recent Python
-version it has been tested with successfully is Python ``3.10.4``.
+MALA supports any Python version starting from ``3.10.4``. No upper limit on
+Python versions is enforced. The most recent *tested* version is ``3.10.12``.

MALA requires ``torch`` in order to function. As the installation of torch
depends highly on the architecture you are using, ``torch`` will not
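As a quick sanity check (editor sketch, not part of this diff), the torch installation can be verified before installing MALA:

.. code-block:: python

    import torch

    # Confirm torch is importable and report whether CUDA is usable.
    print(torch.__version__)
    print(torch.cuda.is_available())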
2 changes: 1 addition & 1 deletion examples/advanced/ex01_checkpoint_training.py
@@ -21,7 +21,7 @@ def initial_setup():
parameters = mala.Parameters()
parameters.data.data_splitting_type = "by_snapshot"
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.network.layer_activations = ["ReLU"]
parameters.running.max_number_epochs = 9
parameters.running.mini_batch_size = 8
14 changes: 11 additions & 3 deletions examples/advanced/ex03_tensor_board.py
@@ -13,7 +13,7 @@

parameters = mala.Parameters()
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.targets.ldos_gridsize = 11
parameters.targets.ldos_gridspacing_ev = 2.5
parameters.targets.ldos_gridoffset_ev = -5
@@ -32,11 +32,19 @@

data_handler = mala.DataHandler(parameters)
data_handler.add_snapshot(
"Be_snapshot0.in.npy", data_path, "Be_snapshot0.out.npy", data_path, "tr",
"Be_snapshot0.in.npy",
data_path,
"Be_snapshot0.out.npy",
data_path,
"tr",
calculation_output_file=os.path.join(data_path, "Be_snapshot0.out"),
)
data_handler.add_snapshot(
"Be_snapshot1.in.npy", data_path, "Be_snapshot1.out.npy", data_path, "va",
"Be_snapshot1.in.npy",
data_path,
"Be_snapshot1.out.npy",
data_path,
"va",
calculation_output_file=os.path.join(data_path, "Be_snapshot1.out"),
)
data_handler.prepare_data()
@@ -17,7 +17,7 @@
def initial_setup():
parameters = mala.Parameters()
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.running.max_number_epochs = 10
parameters.running.mini_batch_size = 40
parameters.running.learning_rate = 0.00001
@@ -24,7 +24,7 @@
parameters = mala.Parameters()
# Specify the data scaling.
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.running.max_number_epochs = 5
parameters.running.mini_batch_size = 40
parameters.running.learning_rate = 0.00001
@@ -17,7 +17,7 @@ def optimize_hyperparameters(hyper_optimizer):

parameters = mala.Parameters()
parameters.data.input_rescaling_type = "feature-wise-standard"
parameters.data.output_rescaling_type = "normal"
parameters.data.output_rescaling_type = "minmax"
parameters.running.max_number_epochs = 10
parameters.running.mini_batch_size = 40
parameters.running.learning_rate = 0.00001