This repository contains the code for the paper titled: Inverse mapping of quantum properties to structures for chemical space of small organic molecules
This repository provides the code to reproduce the main results from the paper. The code is organized into various scripts and notebooks. The variable reproduce_paper is used in multiple scripts to automatically locate the data_paper folder.
Training the model with the train.py script usually takes around 3 hours, depending on your GPU. The notebook reproducing the main results should run in a few minutes, excluding the computation of RMSDs for the test set, which can take longer depending on the test set size.
The main packages needed to run the scripts and notebooks are listed below, together with the versions we tested:
- ase 3.22.0
- matplotlib 3.5.0
- numpy 1.21.4
- pandarallel 1.6.4
- pandas 1.3.4
- pyarrow 7.0.0
- pytorch-lightning 1.5.10
- rmsd 1.4
- scipy 1.7.3
- torch 1.12.1
- tqdm 4.62.3
- openbabel 3.1.1
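To compare an existing environment against the versions listed above, a small check along these lines may help. It is only a sketch and not part of the repository: the function name check_versions is hypothetical, and it assumes that each package is installed under the distribution name shown in the list. Packages installed through conda (openbabel, for example) may not always be visible to importlib.metadata.

```python
from importlib import metadata

# Versions copied from the list in this README (the tested configuration).
TESTED_VERSIONS = {
    "ase": "3.22.0",
    "matplotlib": "3.5.0",
    "numpy": "1.21.4",
    "pandarallel": "1.6.4",
    "pandas": "1.3.4",
    "pyarrow": "7.0.0",
    "pytorch-lightning": "1.5.10",
    "rmsd": "1.4",
    "scipy": "1.7.3",
    "torch": "1.12.1",
    "tqdm": "4.62.3",
    "openbabel": "3.1.1",
}


def check_versions(expected=TESTED_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package metadata cannot be found.
    Hypothetical helper: it is not provided by this repository.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu113" on torch wheels.
        base = found.split("+")[0] if found is not None else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions().items():
        print(f"{name}: tested with {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the code will fail; the list only records the versions that were tested.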
Much of the code can be run without openbabel, dftb+, or a machine learning force field. These packages are needed, however, to add hydrogens and relax geometries. Any force field that can be used within the ase framework will work; the one used in this work is SpookyNet. For openbabel, we recommend installing it in a conda environment.
The installation of the main packages should take a few minutes on standard hardware.
The data used for training and testing in the paper can be downloaded here (zip folder). The relevant data is located in the data_paper directory. To use the data, place the data_paper folder in the same directory as the notebooks and scripts. New data can be prepared using the initialize_data.py script, which needs to be modified as required.
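The folder placement described above, combined with the reproduce_paper variable, could be checked with a sketch like the following. This is an assumption about how such a lookup might work, not the code used in the repository's scripts: the helper resolve_data_dir and its custom_dir parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path


def resolve_data_dir(reproduce_paper, script_dir, custom_dir=None):
    """Return the data directory, failing early if it is missing.

    Hypothetical helper, not part of the repository. With
    reproduce_paper=True the data_paper folder is expected next to the
    scripts and notebooks; otherwise a user-supplied folder is used.
    """
    if reproduce_paper:
        data_dir = Path(script_dir) / "data_paper"
    elif custom_dir is not None:
        data_dir = Path(custom_dir)
    else:
        raise ValueError("custom_dir is required when reproduce_paper is False")

    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a data folder at {data_dir}; download the data and "
            "place it next to the notebooks and scripts."
        )
    return data_dir
```

From a script, this could be called as resolve_data_dir(True, Path(__file__).parent); from a notebook, Path.cwd() is a reasonable stand-in for the script directory.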
The model architectures are defined in the models_old.py script (alternatively, models.py for testing alternatives). The PyTorch Lightning model definition is provided in the Model.py file. Pre-trained models are available in the models_saved folder.
- testing.ipynb: Reproduces the main results of the paper.
- mol_gen_test.ipynb: Demonstrates targeted molecule generation using functions from molecular_generation_utils.py.
Other scripts serve as utilities for various applications. The interpolation procedure used in the paper is implemented in interpolator.py. The NEB calculations are in the notebook NEB_interp.ipynb; change the SpookyNet checkpoint (or force field) to the one you want to use.
While the code could be better organized and structured, the current organization serves the purpose of scientifically presenting and prototyping the novel methodology outlined in the paper.
- Notebooks (*.ipynb): Licensed under the GNU General Public License, Version 2 (GPL-2.0). See the LICENSE_GPL file for more details.
- All Other Files: Licensed under the MIT License. See the LICENSE file for more details.