ToxicGeometricDL

A repo of the most seminal applications of geometric deep learning in toxicity prediction tasks.

The most comprehensive, professionally curated resource on geometric deep learning applied to toxicity prediction tasks, including the best tutorials, videos, books, papers, articles, courses, websites, conferences, and open-source libraries. Since predictive toxicology is a niche field, many papers come from neighboring fields such as drug discovery or general deep learning and are simply applied to toxicology datasets.

The papers are listed in chronological order, noting the advancements along the way.

Disclaimer: All the images are sourced from the resources I linked.

Table of Contents

Papers

| Paper | Author | Year | Github | Comments | Datasets |
|---|---|---|---|---|---|
| **General interest** | | | | | |
| Gated Graph Sequence Neural Networks | Li et al. | 2015 | Github | LSTM on graphs | Graph algorithm tasks |
| Semi-Supervised Classification with Graph Convolutional Networks | Kipf and Welling | 2016 | Github | The most influential GCN 🔥🔥🔥 | Citation datasets |
| Graph Attention Networks | Velickovic et al. | 2017 | Github | Introduced attention to GNNs, though not implemented inductively 🔥🔥🔥 | Citation and PPI |
| Inductive Representation Learning on Large Graphs | Hamilton et al. | 2017 | Github | The inductive variant of Kipf's GCN, along with different aggregator functions | Citation and PPI |
| Geom-GCN: Geometric Graph Convolutional Networks | Pei et al. | 2020 | Github | Transductive model including geometric info | Citation networks |
| **Pooling** | | | | | |
| Hierarchical Graph Representation Learning with Differentiable Pooling | Ying et al. | 2018 | Github | Hierarchical pooling performs better than global mean, sum, and sort pooling | ENZYMES, PROTEINS, REDDIT, COLLAB |
| **Molecular property/activity/toxicity prediction** | | | | | |
| Convolutional Networks on Graphs for Learning Molecular Fingerprints | Duvenaud et al. | 2015 | Github | Aligned the notion of graph embeddings with molecular fingerprints | Solubility, Drug efficacy |
| Molecular Graph Convolutions: Moving Beyond Fingerprints | Kearnes et al. | 2016 | DeepChem | Introduced edge features; used weave convolutions and observed that complex atom/bond featurizations do not enhance the model | PCBA, MUV, Tox21 |
| Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction | Coley et al. | 2017 | Github | Atom features in graphs like ECFP; bond features are not updated. Not an improvement over the Tox21 Challenge winner | Solubility, Tox21 |
| Neural Message Passing for Quantum Chemistry | Gilmer et al. | 2017 | Github | Introduced the concept of message passing networks. Also tried to encode spatial information about the graph via distance bins, super nodes, and virtual edges, and approximated model ensembling with multiple towers 🔥🔥🔥 | QM9 |
| Learning Graph-Level Representation for Drug Discovery | Li et al. | 2017 | Github | Introduced a dummy super node connected to all other nodes to learn global features | Tox21 (0.76 scaffold), ToxCast, HIV, PCBA, MUV |
| PotentialNet for Molecular Property Prediction | Feinberg et al. | 2018 | DGL-LifeSci | Used an "agglomerative" split based on the pairwise similarity of every ligand-protein pair; unknown split on Tox21 | PDBBind, QM8, Tox21 (0.856) |
| Adaptive Graph Convolutional Neural Networks | Li et al. | 2018 | Github | A successful spectral graph model | Tox21, ClinTox, ToxCast |
| Graph Classification Using Structural Attention | Lee et al. | 2018 | Github | Improved attention | HIV, NCI |
| Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction | Liu et al. | 2019 | None | Predicts ADME properties with a multitask GCN; surpassed a known tool, Cubist, by far | ADME |
| Analyzing Learned Molecular Representations for Property Prediction | Yang et al. | 2019 | Chemprop | Directed edges reduce the noise; they also added global features computed with RDKit. SOTA results on Tox21 but an excellent GitHub resource 🔥🔥🔥🔥🔥🔥🔥🔥🔥 | Most molecular datasets |
| Strategies for Pre-training Graph Neural Networks | Hu et al. | 2019 | DGL-LifeSci | Combining node-wise (context prediction and attribute masking) and graph-wise supervised pretraining does not cause negative transfer 🔥🔥🔥 | ChEMBL, ZINC, Tox21, ToxCast, etc. |
| Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism | Xiong et al. | 2020 | DeepChem | AttentiveFP was a significant improvement over previous models. They added chirality to the atom features and stereochemistry to the bond features, and used a GRU as a readout. Check the aromaticity pretraining task | Most molecular datasets |
| Multi-View Graph Neural Networks for Molecular Property Prediction | Ma et al. | 2020 | None | The graph is seen in two ways, edge-central and node-central, and cross-dependent message passing enhances the model further. Check for interpretability (Tox21 scaffold 0.836) 🔥🔥🔥 | Most molecular datasets |
| Communicative Representation Learning on Attributed Molecular Graphs | Song et al. | 2021 | Github | D-MPNN but with a communicative function to boost the edge messages 🔥🔥🔥 | Same as Gilmer / D-MPNN |
| **Graph contrastive learning** | | | | | |
| Knowledge graph-enhanced molecular contrastive learning with functional prompt | Y. Fang et al. | 2023 | | Knowledge-graph-enhanced pretraining; SOTA but requires an expensive KG 🔥🔥🔥 | Most molecular datasets (Tox21 = 0.837) |
| MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph | Sun et al. | 2021 | | Local-level contrastive learning via bioisostere substitution, and global-level learning by maximizing the similarity between ECFPs and graph embeddings. NEGATIVE TRANSFER on tox datasets | BACE, BBBP, Tox21, ToxCast |
| Molecular contrastive learning of representations via graph neural networks | Wang et al. | 2022 | | Best contrastive approach so far, augmenting by subgraph removal | Most molecular datasets (Tox21 = 0.8) |
| **Pretraining-Transfer learning** | | | | | |
| Geometry-enhanced molecular representation learning for property prediction | Fang et al. | 2022 | Github | Pretraining on geometry information improves downstream task performance 🔥🔥🔥 | Most molecular datasets |
| Pre-training Molecular Graph Representation with 3D Geometry | Liu et al. | 2022 | | Contrasts 2D with 3D or generates 3D from 2D; NOT IMPROVED | Most molecular datasets |
| **Multi-modal** | | | | | |
| Dual-view Molecule Pre-training | Zhu et al. | 2021 | | SMILES transformer and GNN node masking as pretraining, with a dual-view consistency loss | PubChem 10M (less than 0.8 on Tox21) |
| Molecule Property Prediction Based on Spatial Graph Embedding | Wang et al. | 2018 | Github | 1D convolutions on each atom's features with skip connections + fingerprints | ESOL, Lipophilicity, PDBBind |
| **Data-centric AI** | | | | | |
| ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction | Hao et al. | 2020 | Github | Active learning surpassed InfoGraph, Mean Teacher, and self-supervised learning, but is still expensive | QM9, OPV |
| Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network | Chen et al. | 2021 | Github | Mean Teacher, mediocre results | Tox21, QM9, ZINC |
| Low Data Drug Discovery with One-Shot Learning | Altae-Tran et al. | 2017 | DeepChem | One-shot learning in graph classification tasks paired with LSTM updates | Tox21 (0.827), SIDER, MUV |
| **Graph generation / de novo molecule design** | | | | | |
| Junction Tree Variational Autoencoder for Molecular Graph Generation | Jin et al. | 2018 | Github | JT-VAE generates molecules by first creating scaffolds | ZINC |
| MolGAN: An implicit generative model for small molecular graphs | De Cao and Kipf | 2018 | Github | GANs and RL combined gave valid and novel but not unique molecules | QM9 |
| Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation | You et al. | 2018 | Github | GCPN uses adversarial training and RL; better than JT-VAE but not compared to MolGAN | ZINC |
| MoFlow: An Invertible Flow Model for Generating Molecular Graphs | Zang et al. | 2020 | Github | Best compared to any other graph generation approach | QM9, ZINC |
| GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation | Shi et al. | 2019 | Github | One of the first flow-based approaches | ZINC |
| **Variational graph autoencoders** | | | | | |
| Variational Graph Auto-Encoders | Kipf and Welling | 2016 | Github | The first GVAE, on link prediction tasks | Citation |
| Constrained Graph Variational Autoencoders for Molecule Design | Liu et al. | 2018 | Github | Novel, unique, and valid molecules 🔥🔥🔥 | |
| Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders | Ma et al. | 2018 | None | Mediocre results in graph generation | QM9, ZINC |
| **Graph U-Net** | | | | | |
| Graph U-Nets | Gao and Ji | 2019 | None | | |
| **Graph Transformers** | | | | | |
| Self-Supervised Graph Transformer on Large-Scale Molecular Data | Rong et al. | 2020 | Github | Dynamic MPNN: the number of hops is random | ChEMBL, ZINC |
| Graph Transformer Networks | Yun et al. | 2020 | | An improvement over GAT for learning better node-level representations | IMDB and citation datasets |
| **Graph explainability** | | | | | |
| GNNExplainer: Generating Explanations for Graph Neural Networks | Ying et al. | 2019 | | A model-agnostic, single-instance, post-hoc explanation method that extracts subgraphs | MUTAG, REDDIT |
| Reinforced Causal Explainer for Graph Neural Networks | Wang et al. | 2022 | | Frames the explanation task as a sequential decision process | MUTAG, REDDIT, Genome |
| **Reviews** | | | | | |
| Graph convolutional networks: a comprehensive review | Zhang et al. | 2019 | None | | |
| How Powerful are Graph Neural Networks? | Xu et al. | 2019 | None | | |
| Does GNN Pretraining Help Molecular Representation? | Sun et al. | 2022 | | MUST READ 🔥🔥🔥 Ablation studies | |
| Graph convolutional networks for computational drug development and discovery | Sun et al. | 2020 | None | | |
| Graph neural networks: A review of methods and applications | Zhou et al. | 2020 | None | | |
| A compact review of molecular property prediction with graph neural networks | Wieder et al. | 2020 | None | 🔥🔥🔥 | |
| Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models | Jiang et al. | 2021 | None | 🔥🔥 | |

Tutorials

Articles

Repositories

Videos

- An Introduction to Graph Neural Networks: Models and Applications by Miltos Allamanis (Microsoft Research) 🔥🔥🔥

Tools

  • DGL-LifeSci: A Python package for applying graph neural networks to various tasks in chemistry and biology. 🔥🔥🔥
  • PyTorch Geometric: PyG is a library to easily train graph neural networks (GNNs) for a wide range of applications involving structured data.
  • Dive Into Graphs: DIG provides a unified testbed for higher-level, research-oriented graph deep learning tasks, such as graph generation, self-supervised learning, explainability, and 3D graphs.
  • RDKit: A cheminformatics library for generating/calculating molecular descriptors and fingerprints and for handling molecules.
  • MoleculeNet: A library for benchmarking ML models across different molecular tasks.
  • DeepChem: A toolkit that includes many different models and datasets, with relevant tutorials for a gentle introduction to molecular ML.

What is toxicology?

Toxicology is the study of the adverse effects of chemicals or physical agents on living organisms.

Toxic effects can be grouped by the level at which damage occurs: the cell (cytotoxicity), an organ (hepatotoxicity), or the whole system (mutagenicity, genotoxicity). Any given chemical has to undergo a rigorous, expensive, and time-consuming toxicity assessment. The field of computational toxicology tries to alleviate this burden by building QSAR (quantitative structure-activity relationship) models that associate a structure with a specific toxic effect.

For many years, this was the task of experienced chemists who knew which fragments of a molecule are potentially toxic and built models based on these so-called structural alerts: if such a substructure was identified within a molecule, that molecule had a higher probability of being toxic. Over the last two decades there has been a lot of progress, as we found ways to represent a molecule in a machine-readable form; among the most popular are SMILES strings, molecular descriptors, and molecular fingerprints. For the computer, though, toxicity is just a dataset in which each chemical carries a label of 1 or 0, encoding toxic or non-toxic. In the last fifteen years, several datasets with such mappings, based on lab experiments, have been developed; MoleculeNet gathered them as a benchmark for ML models.

Based on these representations and coupled with ML and DL models, we achieved great results. In the last few years, graph neural networks have come to dominate the research interest of the field. Since molecules can intuitively be described as graphs, graphs became a natural new input for our models; that lit the spark for the development of this field and is the reason for this repo.

What is a graph?

A graph G = (V, E) is a set of nodes (vertices) V and the edges E between them. Molecules can be intuitively seen as graphs where the nodes are the atoms and the edges are the bonds between them.

alt text

How do we use graphs as input, though?

The graph can be represented essentially by three matrices (a minimal RDKit sketch of building them follows the figure below):

  • The adjacency matrix, which shows how the nodes (atoms) are connected
  • The node feature matrix, which encodes information about every node (atom)
  • The edge feature matrix, which encodes information about every edge (bond)

alt text
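As a toy illustration (not part of this repo's code), the snippet below builds the three matrices for ethanol with RDKit and NumPy; the SMILES string and the particular atom/bond features chosen here are arbitrary examples, not a prescribed featurization.

```python
import numpy as np
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol: 3 heavy atoms, 2 bonds

# Adjacency matrix: entry (i, j) is 1 if atoms i and j share a bond.
A = Chem.GetAdjacencyMatrix(mol)

# Node feature matrix: one row per atom (here just atomic number and degree).
X = np.array([[atom.GetAtomicNum(), atom.GetDegree()] for atom in mol.GetAtoms()])

# Edge feature matrix: one row per bond (here just the bond order).
E = np.array([[bond.GetBondTypeAsDouble()] for bond in mol.GetBonds()])

print(A.shape, X.shape, E.shape)  # (3, 3) (3, 2) (2, 1)
```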

Graph Neural Networks

GNNs are a type of neural network that operates on these graphs.

There are two ways to develop GNNs, spectrally and spatially.

Both try to generalize the mathematical concept of convolution to graphs. The spectral methods stick to the strict mathematical notion and resort to the frequency domain (Laplacian eigenvectors). Being computationally expensive and not applicable to inductive scenarios, they eventually died out. The spatial methods are the ones now known as graph convolutions, and they are the ones we are going to analyse further. If you still want a basic understanding of spectral methods, you can consult the links below.

Oops, I mentioned inductive without even explaining it. The image speaks for itself.

Inductive learning: This type of learning is like the usual supervised setting: the model has not seen the nodes/graphs it will later classify. This applies to graph-classification tasks, which are our main interest for molecular property prediction.

Transductive learning: In transductive learning, the model has already seen the nodes (without their labels and/or some of their features) and gets an understanding of how they are connected within the graph. This is useful mainly for node-classification tasks.

alt text

Graph Convolutions

Normal Convolutions

A typical feed-forward network does a forward pass with the following equation:

  • Y = σ(W*X + β),

where σ is a non-linear function (ReLU, tanh), W is the weight associated with each feature, X the features, and β the bias.

In convolutional neural networks, the input is usually an image (i.e., a tensor of shape height × width × channels).

  • An RGB image has three channels, whereas a greyscale image has only one.

  • In CNNs, W is called a filter or kernel and is usually a small matrix (2×2, 3×3, etc.) that is passed unchanged across the image to extract features (patterns) from every part of it.

  • This is called weight sharing. It is done because a pattern is interesting wherever it appears in the image (translational invariance); a short convolution sketch follows the figure below.

alt text
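As a rough sketch of weight sharing (assuming PyTorch; the image and layer sizes are made up), the same 3×3 kernels are slid over the whole image:

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 32, 32)   # one RGB image: 3 channels, 32x32 pixels
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

features = conv(image)              # the same 8 kernels are applied at every position
print(features.shape)               # torch.Size([1, 8, 32, 32])
```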

The question became: how can we generalize convolutions to graphs?

There are some significant differences between images and graphs.

  • Images live in a Euclidean space and thus have a notion of locality: pixels that are close to each other are much more strongly related than distant ones. Graphs, on the other hand, do not, as information about the distance between nodes is not encoded.
  • Pixels follow a fixed order, while graph nodes do not. So, locality in graphs is achieved through neighborhoods. We also adopt weight sharing from normal convolutions.

Invariance

Order invariance is achieved by applying functions that are themselves order invariant. A permutation matrix P is a matrix that only changes the row order of another matrix. So, for every P, the following equation should be obeyed:

  • f(PX)=f(X)

Equivariance

But if we want node-level information, an invariant function does not suffice. Instead, we need a permutation-equivariant function, which preserves the node order of its input in its output and obeys the following equation:

  • f(PX)=Pf(X)

We can think of these functions f as transforming the features xi of a node into a latent vector hi:

hi = f(xi)

Stacking these for all nodes results in H = f(X). A short NumPy check of both properties follows.
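This is a quick sanity check (illustrative only, with random toy data): summing node features is permutation-invariant, while applying the same row-wise map to every node is permutation-equivariant.

```python
import numpy as np

X = np.random.rand(4, 3)                       # 4 nodes with 3 features each
P = np.eye(4)[[2, 0, 3, 1]]                    # a permutation matrix

W = np.random.rand(3, 3)                       # shared weights for the row-wise map
f_invariant = lambda X: X.sum(axis=0)          # graph-level readout
f_equivariant = lambda X: np.tanh(X @ W)       # same function applied to every node

print(np.allclose(f_invariant(P @ X), f_invariant(X)))          # True: f(PX) = f(X)
print(np.allclose(f_equivariant(P @ X), P @ f_equivariant(X)))  # True: f(PX) = P f(X)
```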

How can we use these latent vectors?

alt text

But hold on...

How do we incorporate the adjacency matrix A into this equation?

A simple update rule:

Hk+1 = σ(A Hk Wk), where A is the adjacency matrix, k indexes the iteration, and we dropped β for simplicity.

Hopefully the similarities with the classical equation are obvious.

Node-wise, the equation is written:

hi = Σj (W hj),

where j runs over the neighbors of node i.

Let's see it in practice:

alt text

Considering this adjacency matrix, when we update the state of node v1 we take its neighbors' states into account. That, however, would be wrong, as we would be entirely dropping the previous state of v1 itself. So, we correct the adjacency matrix A by adding the identity matrix, creating the matrix Ã. This adds 1s along the diagonal, making each node a neighbor of itself, i.e., we add self-loops.

alt text

Each node's latent vector is a sum of its neighbors' vectors. So, if the degree of a node (the degree is the number of neighbors a node has) is very high, the scale of its latent vector will be entirely different, and we will face vanishing or exploding gradients.

  • So, we should normalize based on the degree of the node. First, we calculate the degree matrix D by summing the adjacency matrix Ã row-wise.

alt text

Then we invert it, and the equation takes the form:

alt text

Hk+1 = σ(D-1 Ã Hk Wk)

WE DID IT! We now have the basic equation upon which we can build the different variants of graph convolutions.

This equation essentially describes a simple averaging of the neighbors' vectors. This update of the node states happens for k steps; on each step, or neighborhood hop, you aggregate the vectors of the neighbors. Once we have the latent vector of each node after k steps, we can use them for node classification or, in our case, aggregate them into a single embedding for every graph. A bare-bones NumPy sketch of one such step is shown below.
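This is a minimal illustration on a made-up 3-node graph with random features and weights: add self-loops, average over neighbors with D⁻¹Ã, apply the weights and a ReLU, then read out a graph-level embedding by averaging the node vectors.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)           # toy 3-node graph
H = np.random.rand(3, 4)                         # 3 nodes, 4 features each
W = np.random.rand(4, 8)                         # "learnable" weights (random here)

A_tilde = A + np.eye(3)                          # add self-loops
D_inv = np.diag(1.0 / A_tilde.sum(axis=1))       # inverse degree matrix of A_tilde

H_next = np.maximum(0, D_inv @ A_tilde @ H @ W)  # Hk+1 = ReLU(D^-1 Ã Hk Wk)

graph_embedding = H_next.mean(axis=0)            # simple mean readout over the nodes
print(H_next.shape, graph_embedding.shape)       # (3, 8) (8,)
```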

GCN Variants

  • GCN

The best-known variant of graph convolutions was introduced by Kipf & Welling in 2017. They use a renormalization trick that goes beyond a mere average of the neighbors: each message is normalized by 1/√(di · dj).

Hk+1 = σ(D-1/2 Ã D-1/2 Hk Wk)

From now on, we'll refer to it as the GCN; a short PyTorch Geometric sketch follows.
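In practice you rarely write this layer by hand; PyTorch Geometric's GCNConv applies the symmetric normalization internally. A hypothetical toy usage (random node features, hand-written edge list):

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

x = torch.randn(3, 4)                          # 3 nodes, 4 features
edge_index = torch.tensor([[0, 1, 0, 2],       # edges as directed index pairs
                           [1, 0, 2, 0]])      # (both directions listed)
batch = torch.zeros(3, dtype=torch.long)       # all three nodes belong to graph 0

conv = GCNConv(in_channels=4, out_channels=8)
h = torch.relu(conv(x, edge_index))            # one GCN layer
graph_embedding = global_mean_pool(h, batch)   # (1, 8) graph-level vector
```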

  • GAT(Graph Attention Networks)

Petar Velickovic had another idea: instead of giving an equal weight to every neighbor, each neighbor's weight is learned explicitly, a concept called attention. So, the node-wise equation now becomes:

hi = σ(Σj aij W hj) alt text

The aij come from applying a softmax to eij = a(hi, hj), the non-normalized attention coefficients computed across pairs of connected nodes.

Influenced by the results of Vaswani et al., they included multi-head attention, which is essentially K replicates of the attention mechanism whose outputs are then concatenated or averaged. The following figure from the paper makes it abundantly clear, and a small PyTorch Geometric sketch follows it. alt text
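A sketch of a multi-head GAT layer with PyTorch Geometric's GATConv (toy data, not the paper's code): with four heads and concatenation, the output dimension is 4 × 8 = 32.

```python
import torch
from torch_geometric.nn import GATConv

x = torch.randn(3, 4)                          # 3 nodes, 4 features
edge_index = torch.tensor([[0, 1, 0, 2],
                           [1, 0, 2, 0]])

gat = GATConv(in_channels=4, out_channels=8, heads=4, concat=True)
h = gat(x, edge_index)                         # attention coefficients a_ij are learned
print(h.shape)                                 # torch.Size([3, 32])
```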

Message Passing Neural Nets

The term message passing arose in 2017 and is a really intuitive way to view graph neural nets. The two main points revolve around the two functions at work in a GNN:

  • The Update function, q
  • The Aggregate function, U

From this YouTube video, we can sum them up with the figure below.

alt text

Essentially, the previous-step vector of the node in focus is combined with those of its neighbors and the connecting edges. The resulting vectors are passed through the update function and aggregated by the function U; finally, they go through a non-linearity to obtain the new, updated representation. A minimal custom message-passing layer is sketched below.
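The sketch below is an assumed, minimal example using PyTorch Geometric's MessagePassing base class (not any specific paper's model); it makes the split explicit: message collects each neighbor's vector, the chosen aggregation sums them, and the update step combines the result with the node's previous state.

```python
import torch
from torch_geometric.nn import MessagePassing

class SimpleMPNN(MessagePassing):
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr="sum")                      # the aggregate function
        self.update_mlp = torch.nn.Linear(2 * in_dim, out_dim)

    def message(self, x_j):
        return x_j                                        # message from each neighbor j

    def forward(self, x, edge_index):
        m = self.propagate(edge_index, x=x)               # aggregated neighbor messages
        h = torch.cat([x, m], dim=-1)                     # combine with previous state
        return torch.relu(self.update_mlp(h))             # update + non-linearity

layer = SimpleMPNN(in_dim=4, out_dim=8)
h = layer(torch.randn(3, 4), torch.tensor([[0, 1, 0, 2], [1, 0, 2, 0]]))
print(h.shape)                                            # torch.Size([3, 8])
```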

The previously described GCN and GAT, following a similar formalism, can be described by the following figures. alt text

This article includes an interactive session to play around with graphs and the most essential GNN variants.
