Commit 80aab0d: more docs

svandenhaute committed Jul 24, 2024 (1 parent: 7b7b04f)

Showing 4 changed files with 123 additions and 64 deletions.
59 changes: 35 additions & 24 deletions docs/data.md
@@ -72,34 +72,40 @@ simulation engines (i-PI, GROMACS, OpenMM, to name a few) require that the start
configuration of the system is given in its canonical orientation.


## representing an atomic structure **from the future**
Consider the following
example: imagine that we have a trajectory of atomic geometries, and we wish to
minimize each of their potential energies and inspect the result (for example to find the
global minimum).
In pseudo-code, this would look something like this:

!!! note "pseudocode"
    ```py
    minima = []  # list which tracks the optimized geometry for each snapshot in the trajectory
    for state in trajectory:
        local_minimum = geometry_optimization(state)
        minima.append(local_minimum)

    global_minimum = find_lowest(*minima)
    ```

In "normal", _synchronous_ execution, when entering the first iteration of the loop, Python would
start executing the first geometry optimization right away and *wait* for it complete, before
moving on to the next iteration. This makes little sense, since we know in advance
that each of the optimizations is in fact independent. As such, the loop can in fact be completed
much quicker if we would simply execute each optimization in parallel (as opposed to
serial).
that each of the optimizations is independent. As such, the loop can in fact be completed
much quicker if we would simply execute each optimization in parallel.
This is referred to as _asynchronous_ execution because the execution of any optimization
will now be performed independent from another.
The intended way to achieve this in Python is by using the built-in
[_concurrent_](https://docs.python.org/3/library/concurrent.futures.html) library,
and it provides the foundation of psiflow's efficiency and scalability.
Aside from _asynchronous_ execution, we also want _remote_ execution.
Geometry optimizations, molecular dynamics simulations, ML training, quantum chemistry
calculations, ... are rarely ever performed on a local workstation.
Instead, they should ideally be forwarded to e.g. a SLURM/PBS(Pro)/SGE scheduler or an
AWS/GCP instance.
To achieve this, psiflow is built with
[Parsl](https://github.com/parsl/parsl): a DoE-funded Python package which
extends the native `concurrent.futures` library with
the ability to offload execution to remote compute resources.
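
To make this concrete, the following is a minimal, self-contained sketch using only the
built-in `concurrent.futures` module (no psiflow or Parsl involved); the
`expensive_calculation` function is purely a stand-in for something like a geometry optimization:

```py
from concurrent.futures import ProcessPoolExecutor


def expensive_calculation(x):  # stand-in for e.g. a geometry optimization
    return x ** 2


if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(expensive_calculation, x) for x in range(8)]  # submit() returns immediately
        results = [f.result() for f in futures]  # .result() blocks until each calculation has finished
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```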

Parsl (and `concurrent.futures`) is founded on two concepts: apps and futures. In their simplest
@@ -126,6 +132,8 @@
print(sum_future.result())  # now compute the actual result; this will print
```
The return value of Parsl apps is not the actual result (in this case, an integer), but
an AppFuture that will store the result of the function evaluation after it has completed.
By explicitly calling `.result()` on the future, we block the main program until the sum has
actually been computed, and retrieve its value.
For more information, check out the [Parsl documentation](https://parsl.readthedocs.io/en/stable/).

In our geometry optimization example from before, we would implement the function
@@ -135,17 +143,20 @@
Importantly, when organized in this way, Python will go through the loop almost
instantaneously, observe that we want to perform a number of `geometry_optimization`
calculations, and offload those calculations separately, independently, and immediately to whatever compute resource
it has available. As such, all optimizations will effectively run in parallel:
!!! note "pseudocode"
    ```py
    minima = []  # list which tracks the **futures** of the optimizations
    for state in trajectory:
        local_minimum = geometry_optimization(state)  # not an actual geometry, but a *future*
        minima.append(local_minimum)

    global_minimum = find_lowest(*minima)  # not an actual geometry, but a *future*

    # all optimizations are running at this point in the code

    global_minimum.result()  # will wait until all optimizations are actually completed
    ```
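
For illustration only, a minimal, runnable Parsl version of this pattern could look as follows;
this is not psiflow's actual API, and the toy `optimize` function as well as the local thread
executor are purely illustrative stand-ins:

```py
import numpy as np
import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors.threads import ThreadPoolExecutor

# local threads for illustration; on a cluster, Parsl would forward apps to e.g. a SLURM queue
parsl.load(Config(executors=[ThreadPoolExecutor(max_threads=4)]))


@python_app
def optimize(positions):  # toy stand-in for a geometry optimization
    import numpy as np
    return 0.9 * np.asarray(positions)


trajectory = [np.random.uniform(size=(3, 3)) for _ in range(10)]
futures = [optimize(state) for state in trajectory]  # loop completes instantly; apps run in parallel
minima = [f.result() for f in futures]               # block until every optimization has finished
```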


## representing multiple structures
@@ -206,5 +217,5 @@
print(energies.result().shape)  # energies is an AppFuture, so we need to call .result()

forces = train.get('forces') # get the forces of all geometries in the training set
print(forces.result().shape) # forces is an AppFuture, so we need to call .result()
# (n, natoms, 3)
```
114 changes: 83 additions & 31 deletions docs/hamiltonian.md
@@ -1,62 +1,114 @@
In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical
particles that are subject to *effective* interactions -- these are the result of the quantum
mechanical behavior of the electrons. These interactions determine the interatomic forces
which are used in a dynamic simulation to propagate the atomic positions from one timestep
to the next.
Traditionally, dynamic simulations required an explicit quantum mechanical evaluation of these
effective forces (e.g. using DFT(B)).
Recently, it has become clear that it is much more efficient to perform such simulations
using a machine-learned representation of the interaction energy, i.e. an ML potential.
The development and application of ML potentials throughout large simulation workflows is in
fact one of the core applications of psiflow.

The `Hamiltonian` class is used to represent any type of interaction potential.
Examples are pre-trained, 'universal' models (e.g. [MACE-MP0](https://arxiv.org/abs/2401.00096)),
ML potentials trained within psiflow (see [ML potentials](model.md)), or a quadratic
(hessian-based) approximation to a local energy minimum, to name a few.
In addition, various sampling schemes employ bias potentials which are superimposed on the
QM-based Born-Oppenheimer surface in order to drive the system
along specific reaction coordinates (e.g. metadynamics, umbrella sampling).
Such bias potentials are also instances of a `Hamiltonian`.

By far the simplest hamiltonian is the Einstein crystal, which binds atoms to a certain
reference position using harmonic springs with a single, fixed force constant.
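
Concretely (this is the standard textbook definition, not a psiflow-specific convention), its
energy is a sum of independent harmonic terms over all atoms, with a single force constant k
and per-atom reference positions:

$$
E = \frac{k}{2} \sum_{i=1}^{N} \left| \mathbf{r}_i - \mathbf{r}_i^0 \right|^2
$$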

```py
from psiflow.geometry import Geometry
from psiflow.hamiltonians import EinsteinCrystal


# isolated H2 molecule
geometry = Geometry.from_string('''
2

H 0.0 0.0 0.0
H 0.0 0.0 0.8
''')

einstein = EinsteinCrystal(geometry, force_constant=0.1)  # force constant in eV/A**2

```
As mentioned earlier, the key feature of hamiltonians is that they represent an interaction energy between atoms,
i.e. they output an energy (and its gradients) when given a geometry as input.
Because hamiltonians might require specialized resources for their evaluation (e.g. an ML
potential which gets executed on a GPU), evaluation of a hamiltonian does not necessarily
happen instantly (e.g. if a GPU node is not immediately available). Similar to how
`Dataset` instances return futures of a `Geometry` when a particular index is queried,
hamiltonians return a future when asked to evaluate the energy/forces/stress of a
particular `Geometry`:

```py
energy = einstein.compute(geometry, 'energy')  # AppFuture of an energy (np.ndarray with shape (1,))
print(energy.result())  # wait for the result to complete, and print it (in eV)


data = Dataset.load('snapshots.xyz')  # N snapshots
energy, forces, stress = einstein.compute(data)  # returns energy and gradients for each snapshot in data

assert energy.result().shape == (N,)  # one energy per snapshot
assert forces.result().shape == (N, max_natoms, 3)  # forces for each snapshot, with padded natoms
assert stress.result().shape == (N, 3, 3)  # stress; filled with NaNs if not applicable
```
A particularly important hamiltonian is MACE, one of the most ubiquitous ML potentials.
Reasonably accurate pretrained models exist which can be used for exploratory purposes,
and these are readily available in psiflow:

```py
import numpy as np

from psiflow.hamiltonians import MACEHamiltonian


mace = MACEHamiltonian.mace_mp0()  # downloads MACE-MP0 from github
forces = mace.compute(geometry, 'forces')  # evaluates the MACE potential on the geometry

forces = forces.result()  # wait for evaluation to complete and get actual value

assert np.sum(np.dot(forces[0], forces[1])) < 0  # forces in H2 always point opposite of each other
assert np.allclose(np.sum(forces, axis=0), 0.0)  # no net force on the molecule --> forces sum to [0, 0, 0]
```
A unique feature of psiflow `Hamiltonian` instances is the ability to create a new
hamiltonian from a linear combination of two or more existing hamiltonians.
Let us consider the particular example of [umbrella
sampling](https://wires.onlinelibrary.wiley.com/doi/10.1002/wcms.66) using
[PLUMED](https://www.plumed.org/):

```py
from psiflow.hamiltonians import PlumedHamiltonian


# harmonic restraint (umbrella) on the H-H distance, defined via a regular PLUMED input string;
# the exact constructor arguments of PlumedHamiltonian may differ slightly
plumed_str = """
d: DISTANCE ATOMS=1,2
RESTRAINT ARG=d KAPPA=10 AT=1.2
"""
bias = PlumedHamiltonian(plumed_str)
```
As alluded to earlier, hamiltonians can be combined in arbitrary ways to create new hamiltonians.
Psiflow supports a concise syntax for basic arithmetic operations on hamiltonians, such as
multiplication by a scalar or addition of two hamiltonians. For example, the linear combination

$$
H = \alpha H_0 + (1 - \alpha) H_1
$$

is very straightforward to express in code.
This allows for a very natural definition of many algorithms in computational statistical physics
(e.g. Hamiltonian replica exchange, thermodynamic integration, biased dynamics):

```py
data = Dataset.load('train.xyz')

mix = 0.5 * einstein + 0.5 * mace  # MixtureHamiltonian with E = 0.5 * E_einstein + 0.5 * E_mace
energies_mix = mix.compute(data, 'energy')
```
4 changes: 2 additions & 2 deletions docs/index.md
@@ -9,10 +9,10 @@ hide:
Psiflow is a scalable molecular simulation engine for chemistry and materials science applications.
It supports:

- **quantum mechanical calculations** at various levels of theory (GGA and hybrid DFT, post-HF methods such as MP2 or RPA, and even coupled cluster; using CP2K | GPAW | ORCA)

- **trainable interaction potentials** as well as easy-to-use universal potentials, e.g. [MACE-MP0](https://arxiv.org/abs/2401.00096)
- a wide range of **sampling algorithms**: NVE | NVT | NPT, path-integral molecular dynamics, alchemical replica exchange, metadynamics, phonon-based sampling, ... (thanks to [i-PI](https://ipi-code.org/))

Users may define arbitrarily complex workflows and execute them **automatically** on local, HPC, and/or cloud infrastructure.
To achieve this, psiflow is built using [Parsl](https://parsl-project.org/): a parallel execution library which manages job submission and workload distribution.
10 changes: 3 additions & 7 deletions mkdocs.yml
@@ -65,11 +65,7 @@ markdown_extensions:
# - javascripts/mathjax.js
# - https://polyfill.io/v3/polyfill.min.js?features=es6
# - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js

#
extra_javascript:
- javascripts/katex.js
- https://unpkg.com/katex@0/dist/katex.min.js
- https://unpkg.com/katex@0/dist/contrib/auto-render.min.js

extra_css:
- https://unpkg.com/katex@0/dist/katex.min.css
- javascripts/mathjax.js
- https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
