This repository contains the code and input files to reproduce the results of the paper "Learning Collective Variables with Synthetic Data Augmentation through Physics-inspired Geodesic Interpolation" (Yang et al., 2024).
We tested the code with Python 3.10 and the packages listed in requirements.txt.
For example, you can create a conda environment and install the required packages as follows (assuming CUDA 11.8):
conda create -n geodesic-cv python=3.10
conda activate geodesic-cv
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install -e .
To deploy the learned CV in MD simulations, you need to build the PLUMED package with the pytorch and drr modules, and then build the GROMACS package with the PLUMED patch.
We tested our code with PLUMED 2.9.0 built against libtorch 2.0.1, and GROMACS 2023.
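As a rough sketch of the build (the paths, configure flags, and module-list syntax below are assumptions; consult the PLUMED and GROMACS installation guides for your system):
# Sketch only: build PLUMED 2.9.0 with the pytorch and drr modules enabled.
# LIBTORCH is assumed to point to an unpacked libtorch 2.0.1 distribution.
cd /path/to/plumed-2.9.0
./configure --enable-libtorch --enable-modules=pytorch+drr \
    CPPFLAGS="-I$LIBTORCH/include -I$LIBTORCH/include/torch/csrc/api/include" \
    LDFLAGS="-L$LIBTORCH/lib -Wl,-rpath,$LIBTORCH/lib"
make -j 8 && make install
# Patch the GROMACS 2023 sources with PLUMED, then configure and build GROMACS as usual.
cd /path/to/gromacs-2023
plumed patch -p -e gromacs-2023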
All the commands assume that you are in the root directory of the repository.
Your GROMACS binary might be different from gmx_mpi, and you might have to adjust the mdrun options according to your hardware.
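For example, on a single GPU node the mdrun calls below might be extended with options like these (illustrative only):
# Illustrative hardware options: 8 OpenMP threads per rank, nonbonded work offloaded to the GPU, pinned threads.
gmx_mpi mdrun -deffnm nvt -plumed plumed.dat -ntomp 8 -nb gpu -pin on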
cd simulations/unbiased/unfolded
gmx_mpi mdrun -deffnm nvt -nsteps 25000000 -plumed plumed.dat
gmx_mpi trjconv -f nvt.xtc -pbc nojump -o trajout.xtc
cd ../folded
gmx_mpi mdrun -deffnm nvt -nsteps 25000000 -plumed plumed.dat
gmx_mpi trjconv -f nvt.xtc -pbc nojump -o trajout.xtc
python scripts/interpolate.py \
--xtc-unfolded simulations/unbiased/unfolded/trajout.xtc \
--xtc-folded simulations/unbiased/folded/trajout.xtc \
--num-interp 5000 \
--save-path simulations/interpolation
Please refer to the notebook train_cv.ipynb for the CV model training.
We provide an example of a single run using the TDA CV. In our paper, we used all combinations of CVs and tpr files (a sketch for looping over them follows the example below).
# Create a simulation directory
mkdir -p simulations/enhanced/TDA/nvt_0; cd simulations/enhanced/TDA/nvt_0
# Create symbolic links to the input files
ln -s ../../tpr_files/nvt_0.tpr nvt.tpr
ln -s ../../plumed_files/plumed_TDA.dat plumed.dat
ln -s ../../plumed_files/TDA.pt .
# Run the simulation
gmx_mpi mdrun -deffnm nvt -nsteps 500000000 -plumed plumed.dat
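To cover all combinations, a loop along the following lines could be used (a sketch only: the CV names and tpr indices are placeholders, so adjust them to the files actually present in simulations/enhanced/tpr_files and simulations/enhanced/plumed_files):
# Sketch only: loop over CV models and tpr replicas (placeholders below).
cd simulations/enhanced
for cv in TDA; do                # add your other CV names here
  for i in 0; do                 # add the other nvt_*.tpr indices here
    mkdir -p $cv/nvt_$i
    cd $cv/nvt_$i
    ln -s ../../tpr_files/nvt_$i.tpr nvt.tpr
    ln -s ../../plumed_files/plumed_$cv.dat plumed.dat
    ln -s ../../plumed_files/$cv.pt .
    # In practice, submit each run as a separate job instead of running them sequentially.
    gmx_mpi mdrun -deffnm nvt -nsteps 500000000 -plumed plumed.dat
    cd ../..
  done
done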
Again, we provide a single example, for the TDA run above.
The metadynamics grid range and sigma values for each CV are taken from Table S1 in the SI.
Note that we print to the COLVAR file every 1000 steps (2 ps), so --skip-steps of 50000 corresponds to 100 ns.
python scripts/compute_pmf.py \
--colvar-file simulations/enhanced/TDA/nvt_0/COLVAR \
--cv-thresh -8.5 8.5 \
--sigma 0.20 \
--skip-steps 50000 \
--save-path simulations/enhanced/TDA/nvt_0
This script will generate two files in the --save-path directory:
- Delta_Fs.log contains the time (in ns) and the delta F value (in kJ/mol) at each time point.
- pmf.log contains the CV grid and the PMF value (in kJ/mol) at each grid point.
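For a quick convergence check, you can, for instance, look at the tail of Delta_Fs.log from the example run above:
# Last few entries: time (ns) and delta F (kJ/mol).
tail -n 5 simulations/enhanced/TDA/nvt_0/Delta_Fs.log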
- The geodesic interpolation module is taken from the original implementation.
- The CV models are implemented using the mlcolvar package.
@article{yang2024learning,
  title={Learning Collective Variables with Synthetic Data Augmentation through Physics-Inspired Geodesic Interpolation},
  author={Soojung Yang and Juno Nam and Johannes C. B. Dietschreit and Rafael G{\'o}mez-Bombarelli},
  journal={Journal of Chemical Theory and Computation},
  volume={20},
  number={15},
  pages={6559--6568},
  year={2024},
  doi={10.1021/acs.jctc.4c00435},
  url={https://doi.org/10.1021/acs.jctc.4c00435}
}