A repo of the most seminal applications of geometric deep learning to toxicity prediction tasks.
The most comprehensive, professionally curated resource on geometric deep learning applied to toxicity prediction tasks, including the best tutorials, videos, books, papers, articles, courses, websites, conferences and open-source libraries. Since predictive toxicology is a niche field, many papers come from neighboring fields such as drug discovery or plain deep learning and are simply applied to toxicology datasets.
The papers are listed in chronological order, noting the advancements along the way.
Disclaimer: All the images are sourced from the resources I linked.
- Tutorials
- Papers
- Articles/Blogs
- Repositories
- Videos
- Tools
- What is a graph?
- Graph Neural Networks
- Graph Convolutions
Paper | Author | Year | Github | Comments | Datasets |
---|---|---|---|---|---|
General interest | |||||
Gated Graph Sequence Neural Networks | Li et al. | 2015 | Github | Gated recurrent (GRU-style) updates on graphs | Graph algorithm tasks |
Semi-Supervised Classification with Graph Convolutional Networks | Kipf and Welling | 2016 | Github | The most influential GCN🔥🔥🔥 | Citation datasets |
Graph Attention Networks | Velickovic et al. | 2017 | Github | Introduced attention to GNNs. Not implemented inductively though.🔥🔥🔥 | Citation and PPI |
Inductive Representation Learning on Large Graphs | Hamilton et al. | 2017 | Github | The inductive variant of Kipf's GCN along with different aggregator functions | Citation and PPI |
Geom-GCN: Geometric Graph Convolutional Networks | Pei et al. | 2020 | Github | Transductive model including geometric info | Citation networks |
Pooling? | |||||
Hierarchical Graph Representation Learning with Differentiable Pooling | Ying et al. | 2018 | Github | Hierarchical pooling performs better than global mean/sum pooling and SortPooling | ENZYMES, PROTEINS, REDDIT, COLLAB |
Molecular property/activity/toxicity prediction | |||||
Convolutional Networks on Graphs for Learning Molecular Fingerprints | Duvenaud et al. | 2015 | Github | Aligned the notion of graph embedding to molecular fingerprints | Solubility, Drug efficacy |
Molecular Graph Convolutions: Moving Beyond Fingerprints | Kearnes et al. | 2016 | DeepChem | Introduced edge features. Used weave convolutions and noticed that complex atom/bond featurizations do not enhance the model | PCBA,MUV,Tox21 |
Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction | Coley et al. | 2017 | Github | Atom features in the graph resemble ECFP; bond features are not updated. Not an improvement over the Tox21 Challenge winner | Solubility, Tox21 |
Neural Message Passing for Quantum Chemistry | Gilmer et al. | 2017 | Github | Introduced the concept of Message passing networks. Also, tried to encode spatial info about the graph by distance bins, super nodes and virtual edges and resembled model ensembling by the concept of multiple towers🔥🔥🔥 | QM9 |
Learning Graph-Level Representation for Drug Discovery | Li et al. | 2017 | Github | Introduced a dummy super node connected to all the other nodes to learn global features | Tox21 (0.76 scaffold), ToxCast, HIV, PCBA, MUV |
PotentialNet for Molecular Property Prediction | Feinberg et al. | 2018 | DGL-lifesci | Used another type of split, called agglomerative, based on the pairwise similarity of every ligand-protein pair; unknown split on Tox21 | PDBBind, QM8, Tox21 (0.856) |
Adaptive Graph Convolutional Neural Networks | Li et al. | 2018 | Github | A successful spectral graph model | Tox21, ClinTox, ToxCast |
Graph classification using structural attention | Lee et al. | 2018 | Github | Improved attention | HIV,NCI |
Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction | Liu et al. | 2019 | None | Predicting ADME properties with a multitask GCN. Surpassed Cubist, a well-known tool, by far | ADME |
Analyzing Learned Molecular Representations for Property Prediction | Yang et al. | 2019 | Chemprop | The directed edges reduce noise. They also added global features computed with RDKit. SOTA results on Tox21, but above all an excellent GitHub resource | Most molecular datasets🔥🔥🔥🔥🔥🔥🔥🔥🔥 |
Strategies for Pre-training Graph Neural Networks | Hu et al. | 2019 | DGL-lifesci | Combining node-wise (context prediction and attribute masking) and graph-wise supervised pretraining does not cause negative transfer🔥🔥🔥 | ChEMBL, ZINC, Tox21, ToxCast etc. |
Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism | Xiong et al. | 2020 | DeepChem | AttentiveFP was a significant improvement over previous models. They added chirality to the atom features and stereochemistry to the bond features. Also, they used a GRU as the readout. Check the aromaticity pretraining task | Most molecular datasets |
Multi-View Graph Neural Networks for Molecular Property Prediction | Ma et al. | 2020 | None | The graph is seen in two ways, edge-central and node-central, and cross-dependent message passing further enhances the model. Check for interpretability (Tox21 scaffold 0.836)🔥🔥🔥 | Most molecular datasets |
Communicative Representation Learning on Attributed Molecular Graphs | Song et al. | 2021 | Github | D-MPNN but with a communicative function to boost the edge messages🔥🔥🔥 | Same as Gilmer DMPNN |
Graph Contrastive learning | |||||
Knowledge graph-enhanced molecular contrastive learning with functional prompt | Y. Fang et al. | 2023 | | KNOWLEDGE GRAPH enhanced pretraining; SOTA but the KG is expensive🔥🔥🔥 | Most molecular datasets (Tox21=0.837) |
MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph | Sun et al. | 2021 | | Local-level contrastive learning by bioisostere substitution, and global-level maximization of the similarity between ECFPs and graph embeddings. NEGATIVE TRANSFER on tox datasets | BACE, BBBP, Tox21, ToxCast |
Molecular contrastive learning of representations via graph neural networks | Wang et al. | 2022 | | Best contrastive approach so far, augmenting by subgraph removal | Most molecular datasets (Tox21=0.8) |
Pretraining-Transfer Learning | |||||
Geometry-enhanced molecular representation learning for property prediction | Fang et al. | 2022 | Github | Pre-training on geometry information improves downstream task performance🔥🔥🔥 | Most molecular datasets |
Pre-training Molecular Graph Representation with 3D Geometry | Liu et al. | 2022 | | Contrast 2D with 3D or generate 3D from 2D; NOT IMPROVED | Most molecular datasets |
Multi-modal | |||||
Dual-view Molecule Pre-training | Zhu et al. | 2021 | | SMILES transformer and GNN node masking as pre-training; dual-view consistency loss. Less than 0.8 on Tox21 | PubChem 10M |
Molecule Property Prediction Based on Spatial Graph Embedding | Wang et al. | 2018 | Github | 1D-convolutions on each atom's features using skip connections + fingerprints | ESOL,lipophilicity,PDBBind |
Data-centric AI | |||||
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction | Hao et al. | 2020 | Github | Active learning surpassed InfoGraph, Mean Teacher and self-supervised learning, but is still expensive | QM9, OPV |
Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network | Chen et al. | 2021 | Github | Mean-Teacher mediocre results | Tox21,QM9,ZINC |
Low Data Drug Discovery with One-Shot Learning | Altae-Tran et al. | 2017 | DeepChem | One-shot learning in graph classification tasks paired with LSTM updates | Tox21(0.827),SIDER,MUV |
Graph Generation/ De novo molecule design | |||||
Junction Tree Variational Autoencoder for Molecular Graph Generation | Jin et al. | 2018 | Github | JTVAE generating molecules by first creating scaffolds | ZINC |
MolGAN: An implicit generative model for small molecular graphs | De Cao and Kipf | 2018 | Github | GANs and RL combined gave valid, novel but not unique molecules | QM9 |
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation | You et al. | 2018 | Github | GCPN uses adversarial training and RL; better than JT-VAE but not compared to MolGAN | ZINC |
MoFlow: An Invertible Flow Model for Generating Molecular Graphs | Zang et al. | 2020 | Github | Best results compared to other graph-generation models | QM9, ZINC |
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation | Shi et al. | 2019 | Github | One of the first flow-based approaches | ZINC |
Variational Graph Autoencoders | |||||
Variational Graph Auto-Encoders | Kipf and Welling | 2016 | Github | The first GVAE on link prediction tasks | Citation |
Constrained Graph Variational Autoencoders for Molecule Design | Liu et al. | 2018 | Github | Novel unique and valid molecules🔥🔥🔥 | |
Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders | Ma et al. | 2018 | None | Mediocre results in graph generation | QM9,ZINC |
Graph Unet | |||||
Graph U-Nets | Gao and Li | 2018 | none | com | |
Graph Transformers | |||||
Self-Supervised Graph Transformer on Large-Scale Molecular Data | Rong et al. | 2020 | Github | Dynamic MPNN: the number of hops is random | ChEMBL, ZINC |
Graph Transformer Networks | Yun et al. | 2019 | | An improvement over GAT to learn better node-level representations | IMDB and citation datasets |
Graph Explainability | |||||
GNNExplainer: Generating Explanations for Graph Neural Networks | Ying et al. | 2019 | | A model-agnostic, single-instance, post-hoc explanation method that extracts subgraphs | MUTAG, REDDIT |
Reinforced Causal Explainer for Graph Neural Networks | Wang et al. | 2022 | | Frames the explanation task as a sequential decision process | MUTAG, REDDIT, Genome |
Reviews | |||||
Graph convolutional networks: a comprehensive review | Zhang et al. | 2019 | none | com | |
How Powerful are Graph Neural Networks? | Xu et al. | 2019 | none | com | |
Does GNN Pretraining Help Molecular Representation? | Sun et al. | 2022 | | MUST READ: ablation studies🔥🔥🔥 | |
Graph convolutional networks for computational drug development and discovery | Sun et al. | 2020 | none | com | |
Graph neural networks: A review of methods and applications | Zhou et al. | 2020 | none | com | |
A compact review of molecular property prediction with graph neural networks | Wieder et al. | 2020 | none | 🔥🔥🔥 | |
Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models | Jiang et al. | 2021 | none | 🔥🔥 |
- Understanding GNNs 🔥 🔥 🔥
- Introduction to Graph Neural Networks
- Graph Convolutional Networks
- An Introduction to Graph Neural Networks: Models and Applications by Miltos Allamanis (Microsoft Research)🔥🔥🔥
- Intro to graph neural networks (ML Tech Talks) by Petar Velickovic (DeepMind)🔥🔥🔥
- Understanding Graph Neural Networks by DeepFindr
- The AI EPiphany by Gordic Aleksa
- DGL-lifesci: DGL-LifeSci is a python package for applying graph neural networks to various tasks in chemistry and biology.🔥🔥🔥
- Pytorch-Geometric: PyG is a library to easily train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.
- Dive Into Graphs: DIG provides a unified testbed for higher-level, research-oriented graph deep learning tasks, such as graph generation, self-supervised learning, explainability, and 3D graphs.
- RDKit: A cheminformatics library for generating/calculating molecular descriptors and fingerprints and handling molecules
- MoleculeNet: A library for benchmarking ML models across different molecular tasks
- DeepChem: A toolkit which includes a lot of different models and datasets with relevant tutorials for a gentle introduction to molecular ML (see the sketch below).
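To make the tools above concrete, here is a minimal sketch of loading the Tox21 benchmark through DeepChem/MoleculeNet and training a plain graph-convolution baseline. The hyperparameters are illustrative only, and the exact function names may differ slightly between DeepChem versions.

```python
import numpy as np
import deepchem as dc

# Load Tox21 with a graph featurization suitable for graph-convolution models.
tasks, (train, valid, test), transformers = dc.molnet.load_tox21(featurizer="GraphConv")

# A plain multitask graph-convolution classifier; hyperparameters are illustrative.
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode="classification")
model.fit(train, nb_epoch=10)

# Tox21 results are usually reported as mean ROC-AUC over the 12 tasks.
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print(model.evaluate(valid, [metric], transformers))
```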
Toxicology is the study of the adverse effects of chemicals or physical agents on living organisms.
Toxic effects can be grouped by the scale of the damage they cause: to the cell (cytotoxicity), to an organ (e.g. hepatotoxicity) or to the whole organism (mutagenicity, genotoxicity). Any given chemical has to undergo a rigorous, expensive and time-consuming toxicity assessment. The field of computational toxicology tries to alleviate this burden by building QSAR (quantitative structure-activity relationship) models that associate a structure with a specific toxic effect. For many years that was the task of experienced chemists who knew which fragments of a molecule are potentially toxic and built models based on these so-called structural alerts: if such a substructure was identified as part of a molecule, that molecule had a higher probability of being toxic.

Over the last two decades there has been a lot of progress in representing molecules in a machine-readable way. Among the most popular representations are SMILES strings, molecular descriptors and molecular fingerprints. For the computer, toxicity is just a dataset in which each chemical carries a label of 1 or 0, encoding toxic or non-toxic. Over the last fifteen years, several datasets with such mappings, based on lab experiments, have been developed; MoleculeNet gathered them as a benchmark suite for ML models. Based on these representations and coupled with ML and DL models, we achieved great results. In the last few years, graph neural networks have come to dominate the research interest of the field: molecules can be intuitively described as graphs, and that insight lit the spark for the development of this field and is the reason for this repo.
A graph G(V, E) is a set of nodes (vertices) V and edges E between them. Molecules can be intuitively seen as graphs where the nodes are the atoms and the edges are the bonds between them.
How do we use graphs as input, though?
The graph can be represented essentially by three matrices (a small code sketch follows the list):
- The adjacency matrix, which shows how the nodes (atoms) are connected
- The node features matrix, which encodes information about every node (atom)
- The edge features matrix, which encodes information about every edge (bond)
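As a concrete illustration, here is a minimal sketch (assuming RDKit is installed) of turning a SMILES string into these three matrices. The atom and bond features chosen here are toy examples, not the featurization of any particular paper.

```python
import numpy as np
from rdkit import Chem

def molecule_to_matrices(smiles: str):
    """Convert a SMILES string into adjacency, node-feature and edge-feature matrices."""
    mol = Chem.MolFromSmiles(smiles)
    n = mol.GetNumAtoms()

    # Adjacency matrix: 1 where two atoms share a bond.
    adjacency = np.zeros((n, n), dtype=np.float32)
    # Node features: here just atomic number and degree, as a toy example.
    node_features = np.array(
        [[atom.GetAtomicNum(), atom.GetDegree()] for atom in mol.GetAtoms()],
        dtype=np.float32,
    )
    # Edge features: one row per bond, here just the bond order.
    edge_features = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        adjacency[i, j] = adjacency[j, i] = 1.0
        edge_features.append([bond.GetBondTypeAsDouble()])
    return adjacency, node_features, np.array(edge_features, dtype=np.float32)

A, X, E = molecule_to_matrices("CCO")  # ethanol: 3 heavy atoms, 2 bonds
print(A.shape, X.shape, E.shape)       # (3, 3) (3, 2) (2, 1)
```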
GNNs are a type of neural network that operates on these graphs.
There are two ways to develop GNNs: spectrally and spatially.
Both try to generalize the mathematical concept of convolution to graphs. The spectral methods stick to the strict mathematical notion and resort to the frequency domain (Laplacian eigenvectors). Being computationally expensive and not applicable to inductive scenarios, they eventually died out. The spatial methods are the ones now known as graph convolutions, and they are the ones we are going to analyse further. If you still want a basic understanding of spectral methods, you can consult the links below.
Oops, I mentioned inductive without even explaining it. The image speaks for itself.
Inductive learning: This type of learning is like the usual supervised learning: the model has not seen the nodes/graphs that it will later classify. This applies to graph-classification tasks, which are our main interest for molecular property prediction.
Transductive learning: In transductive learning, the model has seen the nodes (without their labels and/or some of their features) and gets an understanding of how they are connected within the graph. This is useful mainly for node-classification tasks.
Normal Convolutions
A typical feed-forward network does a forward pass with the following equation:
- Y = σ(W*X + β),
where σ is a non-linear function (ReLU, tanh), W is the weight associated with each feature, X is the feature vector and β is the bias.
In convolutional neural networks, the input is usually an image (i.e. a tensor of shape height × width × channels).
- An RGB image has three channels, whereas a greyscale image has only one.
- In CNNs, W is called a filter or kernel and is usually a small matrix (2x2, 3x3 etc.) that is passed across the image to extract features (patterns) from every part of it.
- That is called weight sharing. It is done because a pattern is interesting wherever it appears in the image (translational invariance).
The question became: how can we generalize convolutions to graphs?
There are some significant differences between images and graphs.
- Images live in a Euclidean space and thus have a notion of locality: pixels that are close to each other are much more strongly related than distant ones. Graphs, on the other hand, do not, as information about the distance between nodes is not encoded.
- Pixels follow an order while graph nodes do not. So, locality in graphs is achieved through neighborhoods. We also adopt weight sharing from normal convolutions.
Invariance
Order invariance is achieved by applying functions that are order invariant. A permutation matrix P is a matrix that only changes the order of the rows of another matrix. So, for every P, the following equation should hold:
- f(PX) = f(X)
Equivariance
But if we want information at the node level, an invariant function does not suffice. Instead, we need a permutation-equivariant function, one that does not mix up the node order and satisfies the following equation:
- f(PX) = Pf(X)
We can think of these functions f as transforming the features x_i of a node into a latent vector h_i:
h_i = f(x_i)
Stacking these results in H = f(X).
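A tiny NumPy check of the two properties, using a sum over nodes as an invariant function and a node-wise transformation as an equivariant one (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 nodes, 3 features each
P = np.eye(4)[[2, 0, 3, 1]]          # a permutation matrix reordering the nodes

# Invariant function: summing over nodes ignores their order, so f(PX) = f(X).
f_inv = lambda X: X.sum(axis=0)
assert np.allclose(f_inv(P @ X), f_inv(X))

# Equivariant function: a node-wise transform commutes with the permutation, f(PX) = P f(X).
W = rng.normal(size=(3, 3))
f_eq = lambda X: np.tanh(X @ W)
assert np.allclose(f_eq(P @ X), P @ f_eq(X))
```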
How can we use these latent vectors?
But hold on...
How do we incorporate the adjacency matrix A into this equation?
A simple update rule:
H^(k+1) = σ(A W H^(k)), where A is the adjacency matrix, k is the number of iterations, and we dropped β for simplicity.
Hopefully the similarities with the classical equation are obvious.
Node-wise, the equation is written:
h_i = Σ_j (W h_j),
where j runs over every neighbor of node i.
Let's see it in practice:
Considering this adjacency matrix, when we update the state of node v1 we take into account the states of its neighbors. That alone, however, would be wrong, as we would entirely drop the previous state of node v1 itself. So, we need to correct the adjacency matrix A by adding the identity matrix, creating the matrix Ã. That adds 1s across the diagonal, making each node a neighbor of itself, i.e. we add self-loops.
Each latent vector of a node is a sum of the vectors of its neighbors. So, if the degree of a node (the degree is the number of neighbors a node has) is very high, the scale of the latent vector will be entirely different and we will face vanishing or exploding gradients.
- So, we should normalize based on the degree of each node. First, we calculate the degree matrix D by summing the adjacency matrix Ã row-wise.
Then we invert it, and the equation takes the form:
H^(k+1) = σ(Ã D^(-1) W^(k) H^(k))
WE DID IT! We now have the first equation upon which we can build our different variants of graph convolutions.
This equation essentially describes a simple averaging of the neighbors' vectors. This update of the node states happens for k steps; on each step, or neighborhood hop, you aggregate the vectors of the neighbors. Once we have the latent vectors for each node after k steps, we can use them for node classification or, in our case, aggregate them to reach a single embedding for every graph. A small code sketch of this update follows.
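Here is a minimal NumPy sketch of this averaging update. Note that the code stores node vectors as rows (H is n × d), so the product is written D^(-1) Ã H W rather than the Ã D^(-1) W H form above; the operation is the same.

```python
import numpy as np

def gcn_mean_update(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One mean-aggregation graph-convolution step.

    A: (n, n) adjacency matrix, H: (n, d) node features (one row per node),
    W: (d, d_out) weight matrix.
    """
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))  # inverse degree matrix
    return np.maximum(0.0, D_inv @ A_tilde @ H @ W)  # ReLU non-linearity

# Toy graph: 3 nodes in a path 0-1-2, 2 features per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.random.default_rng(0).normal(size=(2, 4))
print(gcn_mean_update(A, H, W).shape)  # (3, 4)
```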
- GCN
The best-known variant of graph convolutions was introduced by Kipf & Welling in 2017. They use a renormalization trick that is more than a mere average of the neighbors: they normalize by 1/√(d_i · d_j).
H^(k+1) = σ(D^(-1/2) Ã D^(-1/2) W^(k) H^(k))
From now on, we'll refer to it as the GCN.
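Since PyTorch Geometric's GCNConv layer implements exactly this symmetric normalization (including the self-loops), a toy usage sketch looks like this; shapes and values are arbitrary examples.

```python
import torch
from torch_geometric.nn import GCNConv

# Same 3-node path graph as above, expressed as a PyG edge_index (both directions).
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

conv = GCNConv(in_channels=2, out_channels=4)  # applies D^(-1/2) Ã D^(-1/2) internally
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 4])
```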
- GAT(Graph Attention Networks)
Petar Velickovic had another idea: instead of giving an equal weight to every neighbor, the weight of each neighbor is learned explicitly, a concept called attention. So the node-wise equation now becomes:
h_i^(k+1) = σ( Σ_j a_ij W h_j^(k) )
The coefficients a_ij come from applying a softmax over e_ij = a(h_i, h_j), the non-normalized attention coefficients computed across pairs of neighboring nodes.
Influenced by the results of Vaswani et al., they included a multi-head attention mechanism, which is essentially K replicates of the attention computation that are then concatenated or averaged. The following figure from the paper makes it abundantly clear.
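A corresponding toy sketch with PyTorch Geometric's GATConv, using four attention heads whose outputs are concatenated (values are arbitrary examples):

```python
import torch
from torch_geometric.nn import GATConv

edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# 4 attention heads whose outputs are concatenated -> 4 * 8 = 32 output features.
conv = GATConv(in_channels=2, out_channels=8, heads=4, concat=True)
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 32])
```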
The term message passing arose in 2017 and is a really intuitive way to see graph neural nets. The two main points revolve around the two functions that operate in a GNN:
- The Update function, q
- The Aggregate function, U
From this YouTube video we can sum them up with the figure.
Essentially, we concatenate the previous-step vector of the node in focus with the edge features and its neighbors' vectors. The resulting vectors are passed through the update function q and then aggregated by the function U. Finally, the result is passed through a non-linear function to get the new, updated representation.
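Most GNN libraries expose exactly these hooks. Below is a minimal sketch of a custom layer using PyTorch Geometric's MessagePassing base class (an illustrative toy layer, not a specific published model). Note that PyG's naming differs slightly from the figure: the per-neighbor transform is called message, the aggregation is chosen via aggr, and update is applied after aggregation.

```python
import torch
from torch_geometric.nn import MessagePassing

class SimpleMPNNLayer(MessagePassing):
    """Bare-bones message passing: message = W h_j, aggregation = sum, then a ReLU."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__(aggr="add")          # the aggregation (here a sum over neighbors)
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_j):                   # message built from each neighbor j
        return self.lin(x_j)

    def update(self, aggr_out):               # applied after aggregation
        return torch.relu(aggr_out)

edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 2)
print(SimpleMPNNLayer(2, 4)(x, edge_index).shape)  # torch.Size([3, 4])
```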
The previously described GCN and GAT can be described with a similar formalism, as shown in the following figures.
This article includes an interactive session to play around with graphs and the most essential GNN variants.