diff --git a/404.html b/404.html
index ccb9e55..101bb90 100644
--- a/404.html
+++ b/404.html
@@ -318,6 +318,26 @@
+    [navigation: link to the new "free energy calculations" page added]

diff --git a/api_example.png b/api_example.png
new file mode 100644
index 0000000..5eb2fca
Binary files /dev/null and b/api_example.png differ

diff --git a/configuration/index.html b/configuration/index.html
index 2fc3fd2..5d53c1b 100644
--- a/configuration/index.html
+++ b/configuration/index.html
@@ -9,7 +9,7 @@
@@ -323,6 +323,26 @@
+    [navigation: link to the new "free energy calculations" page added]
@@ -613,19 +633,17 @@

    1. ML potential training

    SLURM jobscripts, send them to the scheduler, and once the resources are allocated, start the calculation. For example, assume that the GPU partition on this cluster is named infinite_a100, and that it has 12 cores per GPU. Consider the following config:
    -container_runtime: apptainer   # or singularity; check HPC docs to see which one is available
    -container_uri: oras://ghcr.io/molmod/psiflow:main_cu118  # built from github main branch
    -ModelTraining:
    -  cores_per_worker: 12
    -  gpu: true
    -  slurm:
    -    partition: "infinite_a100"
    -    account: "112358"
    -    nodes_per_block: 1
    -    cores_per_node: 24
    -    max_blocks: 1
    -    walltime: "12:00:00"
    -    scheduler_options: "#SBATCH --gpus=2"
    +
    +ModelTraining:
    +  cores_per_worker: 12
    +  gpu: true
    +  slurm:
    +    partition: "infinite_a100"
    +    account: "112358"
    +    nodes_per_block: 1
    +    cores_per_node: 24
    +    max_blocks: 1
    +    walltime: "12:00:00"
    +    scheduler_options: "#SBATCH --gpus=2"
     
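    With this file saved as config.yaml, the workflow script is launched by passing the configuration as its argument (main.py is a placeholder name, as in the FAQ below):

        python main.py config.yaml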
    The top-level keyword ModelTraining indicates that we're defining the execution of model.train(). It has a number of special keywords:

    @@ -659,7 +677,7 @@

    1. ML potential training

    achieve optimal training performance. On some clusters, for example, it is necessary to tune the process/thread affinity a little bit:
    env_vars:
    -  OMP_PROC_BIND: spread
    +  OMP_PROC_BIND: "spread"
     
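    Additional variables can be set in the same way. A hedged sketch, assuming env_vars lives under the ModelTraining section shown earlier (OMP_PLACES is a standard OpenMP variable included purely as an illustration; whether it helps depends on the cluster):

        ModelTraining:
          cores_per_worker: 12
          gpu: true
          env_vars:
            OMP_PROC_BIND: "spread"
            OMP_PLACES: "threads"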

    2. molecular dynamics

diff --git a/data/index.html b/data/index.html
index c5fad19..4326097 100644
--- a/data/index.html
+++ b/data/index.html
@@ -390,6 +390,26 @@
+    [navigation: link to the new "free energy calculations" page added]

diff --git a/free_energy/index.html b/free_energy/index.html
new file mode 100644
index 0000000..3a1b10e
--- /dev/null
+++ b/free_energy/index.html
@@ -0,0 +1,445 @@
+    [standard page skeleton; <title>free energy calculations - psiflow</title>]

    free energy calculations


    TODO

\ No newline at end of file

diff --git a/hamiltonian/index.html b/hamiltonian/index.html
index d204ba2..f7fa306 100644
--- a/hamiltonian/index.html
+++ b/hamiltonian/index.html
@@ -330,6 +330,26 @@
+    [navigation: link to the new "free energy calculations" page added]

diff --git a/index.html b/index.html
index b08e1ff..5b3a03e 100644
--- a/index.html
+++ b/index.html
@@ -337,6 +337,26 @@
+    [navigation: link to the new "free energy calculations" page added]
@@ -385,42 +405,61 @@

    psiflow - scalable molecular simulation

    Users may define arbitrarily complex workflows and execute them automatically on local, HPC, and/or cloud infrastructure. To achieve this, psiflow is built using Parsl: a parallel execution library which manages job submission and workload distribution. As such, psiflow can orchestrate large molecular simulation pipelines on hundreds or even thousands of nodes.
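    In practice, every psiflow call returns a future rather than a value. A minimal local sketch (psiflow.load, psiflow.wait, and length().result() appear elsewhere in these docs; Dataset.load and the file name train.xyz are assumed for illustration):

        import psiflow
        from psiflow.data import Dataset

        psiflow.load()                    # set up execution; local if no config is given
        data = Dataset.load('train.xyz')  # schedules the read and returns immediately
        print(data.length().result())     # .result() blocks until the future resolves
        psiflow.wait()                    # wait for all outstanding tasks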

    +
    +    [image: overview.png]

    -

    Use the following one-liner to create a lightweight micromamba Python environment with all dependencies readily available: -

    curl -L molmod.github.io/psiflow/install.sh | bash
    -
    The environment can be activated by sourcing the activate.sh file, which will be created in the current working directory.
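    In a later session, re-activate the environment from the same directory (the file is generated by the install script shown further below in this diff):

        source activate.sh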

    -

    Next, create a config.yaml file which defines the compute resources. For SLURM-based HPC systems, psiflow can initialize your configuration automatically via the following command: -

    python -c 'import psiflow; psiflow.setup_slurm_config()'
    -
    Example configuration files for LUMI, MeluXina, or VSC can be found here. No additional software compilation is required, since all of the heavy lifting (CP2K/ORCA/GPAW, PyTorch model training, i-PI dynamics) is executed within preconfigured Apptainer/Singularity containers which are production-ready for most HPCs.

    -

    For a complete overview of all execution options, see the configuration page.

    -

    Examples

    -
      • Replica exchange molecular dynamics | alanine dipeptide: replica exchange molecular dynamics simulation of alanine dipeptide, using the MACE-MP0 universal potential. The inclusion of high-temperature replicas allows for fast conformational transitions and improves ergodicity.
      • Geometry optimizations | formic acid dimer: approximate transition state calculation for the proton exchange reaction in a formic acid dimer, using simple bias potentials and a few geometry optimizations.
      • Static and dynamic frequency analysis | dihydrogen: Hessian-based estimate of the H-H bond strength and corresponding IR absorption frequency, and a comparison with a dynamical estimate from an NVE simulation and Fourier analysis.
      • Bulk modulus calculation | iron: estimate of the bulk modulus of fcc iron using a series of NPT simulations at different pressures.
      • Solid-state phase stabilities | iron: estimating the relative stability of fcc and bcc iron with anharmonic corrections using thermodynamic integration (see e.g. Phys. Rev. B, 2018).
      • DFT singlepoints | water: analysis of the numerical noise in DFT energy and force evaluations using CP2K and the RPBE(D3) functional, for a collection of water molecules.
      • Path-integral molecular dynamics | water: demonstration of the impact of nuclear quantum effects on the variance of the O-H distance in liquid water. Path-integral molecular dynamics simulations with an increasing number of beads (1, 2, 4, 8, 16) approximate the proton delocalization, and lead to systematically larger variance in the O-H distance.
      • Machine learning potential training | water: simple training and validation script for MACE on a small dataset of water configurations.
    +

    FAQ

    +

    Where do I start?

    +

    Take a brief look at the examples or walk through the documentation to get an idea of psiflow's capabilities. Next, head over to the setup & configuration section of the docs to get started!

    +

    Is psiflow a workflow manager?

    +

    Absolutely not! Psiflow is a Python library which allows you to perform complex molecular simulations and scale them towards large numbers of compute nodes automatically. It does not have 'fixed' workflow recipes, nor does it require you to set up 'databases' or 'server daemons'. The only thing it does is expose a concise and powerful API to perform arbitrarily complex calculations in a highly efficient manner.

    +

    Is it compatible with my cluster?

    +

    Most likely, yes. Check which resource scheduling system your cluster uses (probably SLURM, PBSPro, or SGE). If you're not sure, ask your system administrators or open an issue.

    +

    Can I use VASP with it?

    +

    You cannot automate VASP calculations with it, but in 99% of cases there is either no need to use VASP, or it is easy to quickly perform the VASP part manually, outside of psiflow, and do everything else (data generation, ML potential training, sampling) with psiflow. Open an issue if you're not sure how to do this.

    +

    I would like to have feature X

    +

    Psiflow is continuously in development; if you're missing a feature, feel free to open an issue or pull request!

    +

    I have a bug. Where is my error message and how do I solve it?

    +

    Psiflow covers essentially all major aspects of computational molecular simulation (most notably the execution and parallelization), so there's bound to be a bug once in a while. Debugging can be challenging; we recommend working through the following steps in order:

    +
      +
      1. Check the stderr/stdout of the main Python process (i.e. the python main.py config.yaml one). See if there are any clues. If it has contents which you don't understand, open an issue. If there's seemingly nothing there, go to step 2.
      2. Check Parsl's log file. This can be found in the current working directory, under psiflow_internal/parsl.log. If it's a long file, search for any errors using Error or ERROR. If you find anything suspicious but do not know how to solve it, open an issue.
      3. Check the output files of individual ML training, QM singlepoint, or i-PI molecular dynamics runs. These can be found under psiflow_internal/000/task_logs/*. Again, if you find an error but do not know exactly why it happens or how to solve it, feel free to open an issue; most likely, it will be useful to other people as well.
      4. Check the actual jobscripts that were generated and submitted to the cluster. Quite often, there is a spelling mistake in e.g. the compute project you are using, or you are requesting a resource on a partition where it is not available. These jobscripts (and their output and error files) can be found under psiflow_internal/000/submit_scripts/.
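    For step 2, a quick way to scan Parsl's log from a POSIX shell (a convenience one-liner, not part of psiflow itself):

        grep -nE 'Error|ERROR' psiflow_internal/parsl.log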

    Where do these container images come from?

    +

    They were generated using Docker based on the recipes in this repository, and were then converted to .sif format using Apptainer.

    +

    Can I run psiflow locally for small runs or debug purposes?

    +

    Of course! If you do not provide a config.yaml, psiflow will just use your local workstation for its execution. See e.g. this or this config used for testing.
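    In other words, the same script runs unchanged in both modes; only the presence of the configuration file differs (main.py is a placeholder name):

        python main.py              # local execution on your workstation
        python main.py config.yaml  # execution on the resources defined in config.yaml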

    Citing psiflow

    Psiflow is developed at the Center for Molecular Modeling.
@@ -432,501 +471,6 @@

    Examples

    npj Computational Materials, 9, 19 (2023)

diff --git a/install.sh b/install.sh
new file mode 100644
index 0000000..9eecfb3
--- /dev/null
+++ b/install.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
+export MAMBA_ROOT_PREFIX=$(pwd)  # optional, defaults to ~/micromamba
+
+eval "$(./bin/micromamba shell hook -s posix)"
+micromamba activate
+micromamba create -n _psiflow_env -y python=3.10 pip ndcctools=7.11.1 -c conda-forge
+micromamba activate _psiflow_env
+pip install git+https://github.com/molmod/psiflow
+
+# create activate.sh
+echo 'ORIGDIR=$PWD' >>activate.sh  # prevent variable substitution
+echo "cd $(pwd)" >>activate.sh
+echo "export MAMBA_ROOT_PREFIX=$(pwd)" >>activate.sh
+echo 'eval "$(./bin/micromamba shell hook -s posix)"' >>activate.sh
+echo "micromamba activate _psiflow_env" >>activate.sh
+echo 'cd $ORIGDIR' >>activate.sh  # prevent variable substitution

diff --git a/installation/index.html b/installation/index.html
index c74e41e..b7120b4 100644
--- a/installation/index.html
+++ b/installation/index.html
@@ -323,6 +323,26 @@
+    [navigation: link to the new "free energy calculations" page added]

diff --git a/learning/index.html b/learning/index.html
index 4e52808..3021a17 100644
--- a/learning/index.html
+++ b/learning/index.html
@@ -12,7 +12,7 @@
@@ -71,11 +71,6 @@
-    [removed: "Skip to content" link]
@@ -315,17 +310,6 @@
@@ -336,95 +320,26 @@
+    [navigation: link to the new "free energy calculations" page added]
@@ -469,567 +384,46 @@

    online learning

    -

    The endgame of psiflow is to allow for the seamless development and scalable execution of online learning algorithms for interatomic potentials. The BaseLearning class provides an interface based on which such algorithms can be implemented, and it has the following characteristics:

    -
      -
    • learning.run(): performs the actual active learning using a BaseModel, a BaseReference, and a list of BaseWalker instances. Optionally, you can also specify an initial dataset which can be used to bootstrap the learning (see below).
    • -
    • an output folder: used for storing intermediate models, (labeled) datasets, walkers, and reported metrics.
    • -
    • state identifier: to facilitate logging and/or debugging of the active learning progress, each successfully labeled state is immediately given a unique identifier (an integer). This is necessary in order to keep track of which molecular dynamics log or DFT evaluation log belongs to which state, especially when the data is shuffled in each iteration. The identifier is stored in the info dict of each of the FlowAtoms instances, and is therefore also human-readable in the dataset XYZ files.
    • -
    • metrics: the Metrics helper class is used to compute and save various error metrics and other relevant diagnostics during online learning. Examples are per-element validation RMSEs or collective variables of the sampled data:
      dataset.log
      +------------+--------+--------+-------+----------+----------+----------+----------+-----------+
      -| identifier | e_rmse | f_rmse |    CV | f_rmse_H | f_rmse_C | f_rmse_N | f_rmse_I | f_rmse_Pb |
      -+------------+--------+--------+-------+----------+----------+----------+----------+-----------+
      -|          0 |   0.23 |  32.15 | -4.54 |    23.82 |    47.04 |    37.72 |    27.97 |     46.47 |
      -|          1 |   0.27 |  31.72 | -4.45 |    23.13 |    43.52 |    34.12 |    28.43 |     52.42 |
      -|          2 |   0.45 |  33.60 | -4.49 |    27.02 |    44.40 |    40.34 |    27.77 |     48.51 |
      -|          3 |   0.39 |  33.02 | -4.44 |    26.52 |    50.11 |    36.97 |    27.50 |     45.21 |
      -|          4 |   0.36 |  31.75 | -4.47 |    25.15 |    41.36 |    37.35 |    27.10 |     47.16 |
      -|          5 |   0.35 |  34.00 | -4.41 |    28.04 |    43.99 |    39.52 |    28.56 |     49.31 |
      -...
      -
      -or the (a posteriori) error of individual walkers and other relevant information: -
      walkers.log
      +--------------+---------+----------+--------+--------------+-------------+------------+-------+--------+-------------------------------------+
      -| walker_index | counter | is_reset | f_rmse | disagreement | temperature | identifier |    CV | e_rmse |                              stdout |
      -+--------------+---------+----------+--------+--------------+-------------+------------+-------+--------+-------------------------------------+
      -|            0 |    1000 |    False |  47.33 |         None |      135.79 |        150 | -4.61 |   4.04 | task_7028_molecular_dynamics_openmm |
      -|            1 |    1000 |    False |  50.69 |         None |      142.89 |        151 | -4.39 |   4.11 | task_7046_molecular_dynamics_openmm |
      -|            2 |    1000 |    False |  46.34 |         None |      140.72 |        152 | -4.61 |   4.07 | task_7064_molecular_dynamics_openmm |
      -|            3 |    1000 |    False |  43.71 |         None |      136.12 |        153 | -4.45 |   4.24 | task_7082_molecular_dynamics_openmm |
      -...
      -
    Although optional, it also provides a convenient Weights & Biases interface for easier navigation and interpretation of all of the metrics.
    • -
    • (optional) pretraining: pretraining is used to bootstrap active learning runs. In essence, it makes the model familiar with the chemical bonds in the system and ensures that it doesn't go too crazy during sampling in the first few iterations. During pretraining, a minimal set of configurations is generated by applying random perturbations to the atomic positions and/or unit cell vectors (typically about 0.05 A in magnitude). These configurations are then evaluated using the provided BaseReference instance, after which the obtained data is split into training and validation sets in order to pretrain the model. When learning.run() is called, it decides whether or not to perform pretraining based on whether the model has already been initialized and whether initial data is provided (i.e. there are four possible scenarios):
        -
      • initialized model, initial data: start with active learning and append generated data to the initial data;
      • uninitialized model, initial data: train the model on the initial data before starting the active learning;
      • uninitialized model, no initial data: execute pretraining using the atomic geometries stored in the walkers. The size of the pretraining dataset as well as the magnitude of the perturbations are specified as keyword arguments to the learning algorithm;
      • initialized model, no initial data: initialize an empty dataset and start with active learning.
      • -
      -
    • -
    -

    The following keyword arguments are shared between all BaseLearning subclasses:

    -
      -
    • path_output : pathlib.Path | str : defines the output directory in which all results are stored
    • -
    • train_valid_split : float = 0.9 : determines the fraction of all data that is used for training (typically 0.9)
    • -
    • pretraining_nstates : int = 50 : size of the pretraining dataset
    • -
    • pretraining_amplitude_pos : float = 0.05 : amplitude of the perturbations (in Angstrom) that is applied on the -positions in order to generate the pretraining dataset
    • -
    • pretraining_amplitude_box : float = 0.05 : amplitude of the perturbations (in Angstrom) that is applied on the -components of the strain tensor in order to generate the pretraining dataset. Only applicable for periodic systems.
    • -
    • metrics: Metrics | None = Metrics() : tracks and saves various metrics in the output folder; optionally logs -them to W&B.
    • -
    • atomic_energies: dict : dictionary of atomic energies which are to be inserted into the model. These can either be actual floats containing the energy in units of eV, or AppFuture instances which represent a float (as returned by reference.compute_atomic_energy()).
    • -
    • train_from_scratch: bool = True : whether to reinitialize and train models from scratch in each iteration, -or whether to start from the weights of the current model. Usually, it's better to retrain from scratch.
    • -
    • mix_training_validation: bool = True : whether or not to mix the training and validation sets in each iteration, so as to improve generalization performance. Usually a good idea.
    • -
    • identifier: int = 0 : state identifier to start from when labeling successfully evaluated states. This need only -be modified in more complex setups where multiple learning classes are combined.
    • -
    -

    Sequential Learning

    -

    Sequential learning is arguably the most straightforward approach to online learning for interatomic potentials. In each iteration, walkers are propagated in phase space using a certain model, their newly sampled geometries are quantum mechanically evaluated and added to the training and validation sets, and the model is finally retrained on all available data.

    -

    The following keyword arguments are specific to SequentialLearning:

    -
      -
    • niterations: int : the number of active learning iterations to perform. Each iteration starts with phase space sampling, followed by reference evaluation and model training.
    • -
    • temperature_ramp: Optional[tuple[float, float, float]] = None : (new in v2.0.0) whether to gradually increase the temperature of the walkers over a certain number of iterations. If not None, this keyword argument expects a tuple of floats: (initial T, final T, nsteps). If this is None, the temperature of the walkers is left as-is.
    • -
    • error_thresholds_for_reset: tuple[float, float] = (10, 200) : (new in v2.0.0) determines the (energy, force) error thresholds for resetting walkers, in meV/atom and meV/angstrom. After propagation and QM evaluation, the obtained energy and forces are compared with the energy and forces the model predicted during propagation. A high error implies that the walker was exploring phase space with a model that was very inaccurate, which makes it very likely that the sampled states are not physical. In that case, it makes sense to reset the walker to its initial configuration.
    • -
    -

    Psiflow implements this approach using a SequentialLearning class with the following run() function:

    -

    def run(
    -        self,
    -        model: BaseModel,
    -        reference: BaseReference,
    -        walkers: list[BaseWalker],
    -        initial_data: Optional[Dataset] = None,
    -        ) -> Dataset:
    -    data = self.initialize_run(
    -            model,
    -            reference,
    -            walkers,
    -            initial_data,
    -            )
    -    for i in range(self.niterations):
    -        if self.output_exists(str(i)):
    -            continue # skip iterations in case of restarted run
    -
    -        if i == 0: # set temperature of walkers
    -            self.update_walkers(walkers, initialize=True)
    -
    -        new_data, self.identifier = sample_with_model(
    -                model,
    -                reference,
    -                walkers,
    -                self.identifier,
    -                self.error_thresholds_for_reset,
    -                self.metrics,
    -                )
    -
    -        # if none of the walkers yielded a physically relevant structure
    -        # or none of the reference evaluations succeeded, then new_data will
    -        # essentially be empty at which point the run should be aborted
    -        assert new_data.length().result() > 0, 'no new states were generated!'
    -
    -        # otherwise, add the data and train model.
    -        data = data + new_data
    -        data_train, data_valid = data.split(self.train_valid_split)
    -        if self.train_from_scratch:
    -            logger.info('reinitializing scale/shift/avg_num_neighbors on data_train')
    -            model.reset()
    -            model.initialize(data_train)
    -        model.train(data_train, data_valid)
    -
    -        save_state(
    -                self.path_output,
    -                str(i),
    -                model=model,
    -                walkers=walkers,
    -                data_train=data_train,
    -                data_valid=data_valid,
    -                )
    -        if self.metrics is not None:
    -            self.metrics.save(self.path_output / str(i), model, data)
    -        psiflow.wait()
    -        self.update_walkers(walkers)
    -    return data
    -
    -It implements the following sequence of events:

    -
      -
    • self.initialize_run(...): this checks whether the model contains atomic energies, whether pretraining needs to be performed, etc.;
    • -
    • -

      for a specified number of iterations, the following sequence is repeated:

      -
        -
      1. -

        sample_with_model(...): this is a wrapper function that incorporates -most of the active learning logic;

        -
          -
        • walkers are propagated using the current model; -temperature and interatomic distance checks are applied in order -to avoid evaluating unphysical states and reset walkers when necessary;
        • -
    • the states that came through are evaluated using the provided reference, and the post hoc error between the model's predicted energy and forces and the actual QM energy and forces is computed. If this error is too large, it indicates that the model was sampling in a region of phase space it did not know well, and hence the walker should be reset in order to avoid explosions or unphysical geometries;
        • -
        • the Metrics instance logs metadata regarding walker propagations -(which might include average temperatures, sampled collective variable -ranges, post hoc walker errors, possible resets, etc) which is saved to -.csv files and potentially also logged to Weights & Biases;
        • -
        • all successfully evaluated atomic configurations are returned as new_data.
        • -
        -
      2. -
      3. -

        model.train() is called on the total dataset, which includes the newly sampled -states.

        -
      4. -
      5. All objects (walkers, datasets, model) are saved in the output folder after -training such that the run can be restarted from here. -If the temperature ramp is enabled, this is also the point where walker temperatures -are increased towards \(T_{\textsf{final}}\) in preparation for the next iteration.
      6. -
      -
    • -
    • -

      after completing all iterations, the function completes by returning the final dataset.

      -
    • -
    -

    Example 1: heterogeneous catalysis

    -

    Let us illustrate how one might set up a sequential learning run based on a real-world example from Nature Communications: a proton hopping reaction in a zeolite framework. This is an elementary reaction in which a hydrogen jumps between two of the oxygens in the first coordination sphere of an aluminum substitution (see Figure 1 in the paper). Because this is an activated event, gathering training data requires some form of enhanced sampling. In this example, we will use metadynamics.

    -
    -import statements and helper functions -
    import requests
    -import logging
    -from pathlib import Path
    -import numpy as np
    -
    -from ase.io import read
    -
    -import psiflow
    -from psiflow.learning import SequentialLearning, load_learning
    -from psiflow.models import MACEModel, MACEConfig
    -from psiflow.reference import CP2KReference
    -from psiflow.data import FlowAtoms, Dataset
    -from psiflow.walkers import BiasedDynamicWalker, PlumedBias
    -from psiflow.state import load_state
    -from psiflow.metrics import Metrics
    -
    -

    The get_bias() helper function defines the metadynamics bias settings -that are used during the phase space exploration by the walkers.

    -

    def get_bias():
    -    plumed_input = """
    -UNITS LENGTH=A ENERGY=kj/mol TIME=fs
    -
    -coord1: COORDINATION GROUPA=109 GROUPB=88 R_0=1.4
    -coord2: COORDINATION GROUPA=109 GROUPB=53 R_0=1.4
    -CV: MATHEVAL ARG=coord1,coord2 FUNC=x-y PERIODIC=NO
    -cv2: MATHEVAL ARG=coord1,coord2 FUNC=x+y PERIODIC=NO
    -lwall: LOWER_WALLS ARG=cv2 AT=0.65 KAPPA=5000.0
    -METAD ARG=CV SIGMA=0.2 HEIGHT=5 PACE=100
    -"""
    -    return PlumedBias(plumed_input)
    -
    -The get_reference() helper function defines a generic PBE-D3/TZVP reference level of theory. -Basis set, pseudopotentials, and D3 correction parameters are obtained from -the official CP2K repository and saved in the internal directory of -psiflow. The input file is assumed to be available locally.

    -
    def get_reference():
    -    with open(Path.cwd() / 'data' / 'cp2k_input.txt', 'r') as f:
    -        cp2k_input = f.read()
    -    reference = CP2KReference(cp2k_input=cp2k_input)
    -    basis     = requests.get('https://raw.githubusercontent.com/cp2k/cp2k/v2023.1/data/BASIS_MOLOPT_UZH').text
    -    dftd3     = requests.get('https://raw.githubusercontent.com/cp2k/cp2k/v2023.1/data/dftd3.dat').text
    -    potential = requests.get('https://raw.githubusercontent.com/cp2k/cp2k/v2023.1/data/POTENTIAL_UZH').text
    -    cp2k_data = {
    -            'basis_set': basis,
    -            'potential': potential,
    -            'dftd3': dftd3,
    -            }
    -    for key, value in cp2k_data.items():
    -        with open(psiflow.context().path / key, 'w') as f:
    -            f.write(value)
    -        reference.add_file(key, psiflow.context().path / key)
    -    return reference
    -
    -
    -

    zeolite_reaction.py
    def main(path_output):
    -    assert not path_output.exists()
    -    reference = get_reference()
    -
    -    atoms = FlowAtoms.from_atoms(read(Path.cwd() / 'data' / 'zeolite_proton.xyz'))
    -    atoms.canonical_orientation()   # transform into conventional lower-triangular box
    -
    -    config = MACEConfig()
    -    config.r_max = 6.0
    -    config.num_channels = 32
    -    config.max_L = 1
    -    model = MACEModel(config)
    -
    -    model.add_atomic_energy('H', reference.compute_atomic_energy('H', box_size=6))
    -    model.add_atomic_energy('O', reference.compute_atomic_energy('O', box_size=6))
    -    model.add_atomic_energy('Si', reference.compute_atomic_energy('Si', box_size=6))
    -    model.add_atomic_energy('Al', reference.compute_atomic_energy('Al', box_size=6))
    -
    -    # set learning parameters
    -    learning = SequentialLearning(
    -            path_output=path_output,
    -            niterations=10,
    -            train_valid_split=0.9,
    -            train_from_scratch=True,
    -            metrics=Metrics('zeolite_reaction', 'psiflow_examples'),
    -            error_thresholds_for_reset=(10, 200), # (meV/atom, meV/angstrom)
    -            temperature_ramp=(300, 1000, 5) # logarithmic increase from 300 K to 1000 K
    -            )
    -
    -    # construct walkers; biased MTD in this case
    -    bias    = get_bias()
    -    walkers = BiasedDynamicWalker.multiply(
    -            50,
    -            data_start=Dataset([atoms]),
    -            bias=bias,
    -            timestep=0.5,
    -            steps=5000,
    -            step=50,
    -            start=0,
    -            temperature=300,
    -            temperature_threshold=3, # reset if T > T_0 + 3 * sigma
    -            pressure=0,
    -            )
    -    data = learning.run(
    -            model=model,
    -            reference=reference,
    -            walkers=walkers,
    -            )
    -
    -
    -if __name__ == '__main__':
    -    psiflow.load()
    -    main(Path.cwd() / 'output')
    -    psiflow.wait()
    -
    -The script begins with a definition of the key components of the workflow:

    -
      -
    • a BaseReference instance to perform QM evaluations, in this case CP2K (see the helper function for more details);
    • -
    • a BaseModel instance to learn energies and forces, in this case MACE. The MACEConfig dataclass allows the user to -specify the precise hyperparameters;
    • -
    • the SequentialLearning instance, which defines the hyperparameters of the active learning workflow;
    • -
    • a PlumedBias which represents the metadynamics bias potential, and afterwards a set of 50 BiasedDynamicWalker instances which are going to perform the actual sampling. Each of the walkers has its own metadynamics potential, so as to maximally differentiate the sampling distributions.
    • -
    -

    Before the learning begins, the atomic energies of each of the elements in the system are computed by the reference and inserted into the model. Because no initial data is passed into learning.run() and the model is not yet initialized on existing data, pretraining will be performed based on random perturbations before the actual metadynamics exploration is started.

    -

    Incremental Learning

    -

    While metadynamics is an easy approach to accelerate the sampling of reactive events, it behaves rather chaotically and may generally undersample transition states while still oversampling free energy minima. A more controlled and uniform sampling of the entire reaction pathway is possible using moving harmonic restraints. At the start, the walkers begin their exploration at some initial value of the collective variable. By applying a moving harmonic restraint on the collective variable, we can force the system to gradually explore the transition path in a uniform and controlled manner. To keep this stable, the moving restraint does not immediately force the system to traverse the entire path; instead, it forces the walkers to explore a small fraction of the path in each iteration, until eventually the entire path is covered.

    -

    The IncrementalLearning class implements such a procedure based on the following additional keyword arguments:

    -
      -
    • cv_name : str : name of the collective variable in the plumed input file. In Example 1, this would simply be 'CV'.
    • -
    • cv_start : float : value of the collective variable from which the moving restraint should start. -This needs to be consistent with the starting geometry of each of the walkers.
    • -
    • cv_stop : float : final value of the collective variable towards which the moving restraint will move.
    • -
    • cv_delta : float : collective variable interval which will be learned in each iteration. For example, if the start value is 0, the stop value is 1, and the delta is 0.1, then each of the walkers will do the following (see also the sketch after this list):
        -
      • iteration 0: the moving restraint will force the walkers to move from 0 to 0.1. Most of the sampled geometries will -be centered around 0.1.
      • -
      • iteration 1: walkers that were reset in the previous iteration will repeat the same sampling (i.e. their bias will move from 0 to 0.1); walkers which were not reset will now experience a harmonic restraint moving from 0.1 to 0.2;
      • -
      • et cetera
      • -
      -
    • -
    -
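    The schedule implied by these arguments can be sketched in plain Python (illustrative arithmetic only, not psiflow API; walker resets are ignored):

        cv_start, cv_stop, cv_delta = 0.0, 1.0, 0.1

        niterations = round((cv_stop - cv_start) / cv_delta)
        for i in range(niterations):
            at0 = cv_start + i * cv_delta        # restraint center at the start of the propagation
            at1 = cv_start + (i + 1) * cv_delta  # restraint center at the end
            print(f'iteration {i}: moving restraint drags CV from {at0:.1f} to {at1:.1f}')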

    Example 2: solid-state phase transitions

    -

    As an example of incremental learning with moving restraints, we consider a system from the original -psiflow paper as published in npj Computational Materials.

    -
    -

    [image: mofs.png]

    -
    Two topical metal-organic frameworks. This example uses a moving restraint to force a transition between the closed and open pore phases of MIL-53(Al).
    -
    -
    -import statements and helper functions -

    import requests
    -import logging
    -from pathlib import Path
    -import numpy as np
    -
    -from ase.io import read
    -
    -import psiflow
    -from psiflow.learning import IncrementalLearning, load_learning
    -from psiflow.models import MACEModel, MACEConfig
    -from psiflow.reference import CP2KReference
    -from psiflow.data import FlowAtoms, Dataset
    -from psiflow.walkers import BiasedDynamicWalker, PlumedBias
    -from psiflow.state import load_state
    -from psiflow.metrics import Metrics
    -
    -
    -def get_bias():
    -    """Defines the metadynamics parameters based on a plumed input script"""
    -    plumed_input = """
    -UNITS LENGTH=A ENERGY=kj/mol TIME=fs
    -CV: VOLUME
    -MOVINGRESTRAINT ARG=CV STEP0=0 AT0=5250 KAPPA0=0.1 STEP1=5000 AT1=5000 KAPPA1=0.1
    -"""
    -    return PlumedBias(plumed_input)
    -
    When using incremental learning, the bias potential needs to be of the MOVINGRESTRAINT kind. While its centers will automatically be incremented by the algorithm from the initial to the final CV value, it is important to set STEP0=0 and STEP1=nsteps, where nsteps is the number of steps that the walker takes in each propagation (i.e. walker.steps).

    -
    -

    The main function looks more or less the same as the one from the previous example, -the only difference being the learning class and the contents of the PLUMED input file.

    -

    def main(path_output):
    -    assert not path_output.exists()
    -    reference = get_reference()
    -
    -    atoms = FlowAtoms.from_atoms(read(Path.cwd() / 'data' / 'mof.xyz'))
    -    atoms.canonical_orientation()   # transform into conventional lower-triangular box
    -
    -    config = MACEConfig()
    -    config.r_max = 6.0
    -    config.num_channels = 32
    -    config.max_L = 1
    -    model = MACEModel(config)
    -
    -    model.add_atomic_energy('H', reference.compute_atomic_energy('H', box_size=6))
    -    model.add_atomic_energy('O', reference.compute_atomic_energy('O', box_size=6))
    -    model.add_atomic_energy('C', reference.compute_atomic_energy('C', box_size=6))
    -    model.add_atomic_energy('Al', reference.compute_atomic_energy('Al', box_size=6))
    -
    -    # set learning parameters
    -    learning = IncrementalLearning(
    -            path_output=path_output,
    -            niterations=10,
    -            train_valid_split=0.9,
    -            train_from_scratch=True,
    -            metrics=Metrics('MOF_phase_transition', 'psiflow_examples'),
    -            error_thresholds_for_reset=(10, 200), # in meV/atom, meV/angstrom
    -            cv_name='CV',
    -            cv_start=5250,
    -            cv_stop=3000,
    -            cv_delta=-250,
    -            )
    -
    -    bias    = get_bias()
    -    walkers = BiasedDynamicWalker.multiply(
    -            50,
    -            data_start=Dataset([atoms]),
    -            bias=bias,
    -            timestep=0.5,
    -            steps=5000,
    -            step=50,
    -            start=0,
    -            temperature=300,
    -            temperature_threshold=3, # reset if T > T_0 + 3 * sigma
    -            pressure=0,
    -            )
    -    data = learning.run(
    -            model=model,
    -            reference=reference,
    -            walkers=walkers,
    -            )
    -
    All walkers start from the initial state as stored in the mof.xyz file in examples/data. This state has a unit cell volume of about 5250 cubic angstrom, and therefore we initialize the incremental learning run using cv_start=5250. This corresponds to the open pore in the figure. The closed pore state (which is the final state of the transition) corresponds to about 3000 cubic angstrom, i.e. cv_stop=3000.

    -

    Committee Learning

    -

    Ensemble-based active learning algorithms are quite commonly employed in the development of machine-learned interatomic potentials. A committee of models with different initializations is trained on the same data, and its disagreement on sampled geometries is used to create a subset of geometries for which the models' predictions are most divergent (and hence whose inclusion in the dataset would yield the largest benefit). In psiflow, these mechanics are implemented using a Committee, which wraps around a list of models and takes care of training and disagreement evaluation, and the CommitteeLearning subclass, which inherits from SequentialLearning and adds an additional keyword argument:

    +

    Psiflow allows for the seamless development and scalable execution of online learning algorithms for ML potentials. The Learning class provides an interface based on which such algorithms can be implemented. Learning objects keep track of the generated data and error metrics, optionally log to Weights & Biases, and provide basic restart functionality in case something goes wrong. They are instantiated using the following arguments:

      -
    • nstates_per_iteration: int : number of sampled states that will be selected by the committee for QM evaluation. This should always be significantly smaller than the number of walkers for committee learning to yield any benefit.
    • +
      • reference (type Reference): the Reference instance which will be used to evaluate ground-truth energy/force labels for each of the generated samples.
      • path_output (type str | Path): the location of a folder in which intermediate models, datasets, walker states, and restart files can be saved.
      • train_valid_split (type float): fraction of the generated data which should be used for the training set (as opposed to validation).
      • error_thresholds_for_reset (type list[Optional[float]]): during online learning, it is not uncommon for walkers to explore unphysical regions of phase space due to irregularities in the intermediate potential, excessive temperatures/pressures, etc. In those cases, it is beneficial to reset walkers to their starting configurations, which are known to be physically sound. The decision to reset a walker is made every time the 'exact' energy and forces have been computed for a sampled state: if the discrepancy between the walker's model (i.e. the previous model) and the QM-evaluated energy or forces exceeds the corresponding threshold, the walker is reset. This argument expects a list of length two (threshold on the energy error, threshold on the force error), with optional None values if no reset is desired. For example, [None, 0.1] resets a walker whenever the force RMSE exceeds 100 meV/A, and ignores any energy discrepancy.
      • error_thresholds_for_discard (type list[Optional[float]]): states which are entirely unphysical do not contribute to the accuracy of the model, and sometimes even hinder proper training. If these error thresholds are exceeded, the state is discarded and the walker is reset.
      • wandb_group (type str): if specified, the computed dataset metrics will be logged to Weights & Biases in the corresponding group of runs, for easy visual analysis.
      • wandb_project (type str): if specified, the computed dataset metrics will be logged to Weights & Biases in the corresponding project, for easy visual analysis.
      • initial_data (type Dataset): existing, labeled data from which the learning can be bootstrapped. Note that all states in this dataset must be labeled, and that this is only sensible if the labeling agrees with the given Reference instance (same level of theory, same basis set, grid settings, ...).
    -
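    A minimal instantiation sketch based on the arguments listed above (the import path and all argument values are assumed for illustration; reference stands for a previously constructed Reference instance):

        from pathlib import Path

        from psiflow.learning import Learning

        learning = Learning(
            reference,                                 # previously constructed Reference
            path_output=Path.cwd() / 'output',
            train_valid_split=0.9,
            error_thresholds_for_reset=[None, 0.1],    # reset when force RMSE exceeds 100 meV/A
            error_thresholds_for_discard=[None, 0.3],  # illustrative threshold
            wandb_group='my_system',                   # hypothetical group name
            wandb_project='psiflow_examples',
        )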

    It differs from sequential learning in the sense that each training step actually trains all members of the committee (typically 2 to 8), all of which are initialized with different seeds (but are otherwise identical). Retraining from scratch in each iteration is now a strict requirement, as it greatly increases the quality of the obtained disagreements. Walkers are propagated in phase space using the first member of the committee.

    -

    Example 3: ion migration

    -

    In the following example, we first perform two iterations of regular, sequential learning, and then proceed by creating a committee of four members and doing committee-based learning for another five iterations.

    -

    def main(path_output):
    -    assert not path_output.exists()
    -    reference = get_reference()     # CP2K; PBE-D3(BJ); TZVP
    -    bias      = get_bias()          # simple MTD bias on unit cell volume
    -    atoms = FlowAtoms.from_atoms(read(Path.cwd() / 'data' / 'perovskite_defect.xyz'))
    -    atoms.canonical_orientation()   # transform into conventional lower-triangular box
    -
    -    config = MACEConfig()
    -    config.r_max = 7.0
    -    config.num_channels = 16
    -    config.max_L = 1
    -    config.batch_size = 4
    -    config.patience = 10
    -    config.energy_weight = 100
    -    model = MACEModel(config)
    -
    -    model.add_atomic_energy('H', reference.compute_atomic_energy('H', box_size=6))
    -    model.add_atomic_energy('C', reference.compute_atomic_energy('C', box_size=6))
    -    model.add_atomic_energy('N', reference.compute_atomic_energy('N', box_size=6))
    -    model.add_atomic_energy('I', reference.compute_atomic_energy('I', box_size=6))
    -    model.add_atomic_energy('Pb', reference.compute_atomic_energy('Pb', box_size=6))
    -
    -    walkers = BiasedDynamicWalker.multiply(
    -            100,
    -            data_start=Dataset([atoms]),
    -            bias=bias,
    -            timestep=0.5,
    -            steps=1000,
    -            step=50,
    -            start=0,
    -            temperature=100,
    -            temperature_threshold=3, # reset if T > T_0 + 3 * sigma
    -            pressure=0,
    -            )
    -    metrics = Metrics('perovskite_defect', 'psiflow_examples')
    -
    -    learning = SequentialLearning(
    -            path_output=path_output / 'learn_sequential',
    -            niterations=2,
    -            train_valid_split=0.9,
    -            metrics=metrics,
    -            error_thresholds_for_reset=(10, 100), # in meV/atom, meV/angstrom
    -            temperature_ramp=(200, 400, 2),
    -            )
    -    data = learning.run(
    -            model=model,
    -            reference=reference,
    -            walkers=walkers,
    -            )
    -
    -    # continue with committee learning
    -    learning = CommitteeLearning(
    -            path_output=path_output / 'learn_committee',
    -            niterations=5,
    -            train_valid_split=0.9,
    -            metrics=metrics,
    -            error_thresholds_for_reset=(10, 100), # in meV/atom, meV/angstrom
    -            temperature_ramp=(600, 1200, 3),
    -            nstates_per_iteration=50,
    -            )
    -    model.reset()
    -    committee = Committee([model.copy() for i in range(4)])
    -    data = learning.run(
    -            committee=committee,
    -            reference=reference,
    -            walkers=walkers,
    -            initial_data=data,
    -            )
    -
    -The full example is available here.

    diff --git a/models/index.html b/models/index.html index a36d873..7d0b7d5 100644 --- a/models/index.html +++ b/models/index.html @@ -330,6 +330,26 @@ +
  • + + + + + free energy calculations + + + + +
  • + + + + + + + + +
  • @@ -409,7 +429,10 @@

    ML potentials

    rmse = compute_rmse(forces_pred, forces_target)  # this is a Future!
    print('forces RMSE: {} eV/A'.format(rmse.result()))
    Note that model.save() saves both a .yaml file with all hyperparameters and the actual .pth model, which is needed to reconstruct the corresponding PyTorch module (possibly outside of psiflow if needed). As such, it expects a directory as its argument (which may either already exist or will be created).
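    A short usage sketch (the directory name is arbitrary):

        from pathlib import Path

        model.save(Path.cwd() / 'trained_model')  # writes the hyperparameter .yaml and the .pth module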

    In many cases, it is recommended to provide these models with some estimate of the absolute energy of an isolated atom for the specific level of theory and basis set considered (and this for each element). Instead of having the model learn the absolute total energy of the system, we first subtract these atomic energies in order

diff --git a/mofs.png b/mofs.png
deleted file mode 100644
index 05296ac..0000000
Binary files a/mofs.png and /dev/null differ

diff --git a/overview.png b/overview.png
new file mode 100644
index 0000000..78eb5e0
Binary files /dev/null and b/overview.png differ

diff --git a/parslfest_thumbnail.png b/parslfest_thumbnail.png
deleted file mode 100644
index 2b38d27..0000000
Binary files a/parslfest_thumbnail.png and /dev/null differ

diff --git a/reference/index.html b/reference/index.html
index e0824fd..e622f36 100644
--- a/reference/index.html
+++ b/reference/index.html
@@ -394,6 +394,26 @@
+    [navigation: link to the new "free energy calculations" page added]

diff --git a/sampling/index.html b/sampling/index.html
index 67823d4..d9e5907 100644
--- a/sampling/index.html
+++ b/sampling/index.html
@@ -441,6 +441,26 @@
+    [navigation: link to the new "free energy calculations" page added]
@@ -495,13 +515,13 @@

    the Walker class

    during simulations), the temperature and pressure, or a timestep.

    from psiflow.sampling import Walker
     from psiflow.geometry import Geometry
    -from psiflow.hamiltonians import get_mace_mp0
    +from psiflow.hamiltonians import MACEHamiltonian
     
     
     start = Geometry.load("start.xyz")
     walker = Walker(
         start,
    -    hamiltonian=get_mace_mp0(),
    +    hamiltonian=MACEHamiltonian.mace_mp0(),
         temperature=300.0,
         pressure=None,  # NVT simulation
         timestep=0.5,   # in femtoseconds, the default value
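    By analogy, an NPT walker would pass a finite pressure instead of None (hedged sketch; pressure=0 mirrors the walker examples elsewhere in these docs):

        walker_npt = Walker(
            start,
            hamiltonian=MACEHamiltonian.mace_mp0(),
            temperature=300.0,
            pressure=0.0,   # a value here enables NPT; None means NVT
            timestep=0.5,
        )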
    diff --git a/sitemap.xml.gz b/sitemap.xml.gz
    index 4cf972e..ccf1f47 100644
    Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ