Hierarchical graph variational autoencoders for molecular representation learning

noncomputable/hierarchical-graph-net

Hierarchical Graph Autoencoders

About

This repository contains an implementation of a hierarchical graph autoencoder for molecular representation learning, written in PyTorch and Deep Graph Library (DGL). I have written a detailed tutorial explaining the architecture and design decisions, which can be found here: https://noncomputable.github.io/molecules

Requirements

python==3.9
rdkit==2021.09.2
dgl==0.8
PyTorch==1.10.0
networkx==2.6.3

Structure

The high-level structure of this repository is as follows:

  • core - Contains all models, scripts, and utilities
    • dataset.py - Defines a DGL Dataset for sets of molecules
    • postprocess.py - Defines functions for processing model outputs as RDKit molecules
    • preprocess.py - Defines functions for processing raw data into DGL hierarchical graphs
    • train.py - Defines functions for training and validating the models
    • models - Contains model definitions and utils
      • autoencoder.py - Defines the high-level structure of a variational autoencoder for hierarchical graphs
      • decoder.py - Defines a model that maps embeddings to hierarchical graphs
      • encoder.py - Defines a model that maps hierarchical graphs to embeddings
      • message_passing.py - Defines graph message-passing operations and models used throughout the autoencoder
      • predictors.py - Defines models for predicting node types and attachments between nodes
      • utils.py - Defines functions commonly used across models (e.g. for merging hierarchical graphs and instantiating new ones)
  • data - Contains both raw and processed molecule data used throughout the pipeline
    • zinc - Molecule data extracted from the ZINC compound database
      • raw - Contains lists of SMILES strings sampled from ZINC
        • mols.txt - Contains a list of SMILES strings to be processed and used for training, testing, and validation
      • processed - Contains outputs of preprocessing
  • notebook.ipynb - Annotated notebook illustrating dataset construction, training, and testing with this project
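
The raw file data/zinc/raw/mols.txt is described above as a single list of SMILES strings that feeds training, testing, and validation. As a minimal sketch of how such a list might be loaded and partitioned, the snippet below uses only the Python standard library. The function names (read_smiles, split_smiles), the one-SMILES-per-line file layout, the 80/10/10 split fractions, and the fixed seed are all assumptions made for illustration. They are not taken from this repository's preprocess.py or dataset.py, which may handle splitting differently.

```python
import random
from pathlib import Path
from typing import List, Tuple


def read_smiles(path: Path) -> List[str]:
    """Read one SMILES string per line, skipping blank lines and duplicates.

    Assumes a plain-text file with one SMILES string per line (hypothetical
    layout; check the actual mols.txt before relying on this).
    """
    seen = set()
    smiles = []
    for line in path.read_text().splitlines():
        s = line.strip()
        if s and s not in seen:
            seen.add(s)
            smiles.append(s)
    return smiles


def split_smiles(
    smiles: List[str],
    train_frac: float = 0.8,  # assumed fraction, not from the repository
    val_frac: float = 0.1,    # assumed fraction, not from the repository
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle deterministically and split into train, validation, and test lists."""
    shuffled = list(smiles)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and val
    return train, val, test
```

A call such as split_smiles(read_smiles(Path("data/zinc/raw/mols.txt"))) would then yield three disjoint lists. Seeding a local random.Random instance keeps the split reproducible across runs without touching the global random state.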
