Visualizing 3D fragments of proteins in a scatterplot using Foldseek's 3Di and Transformers.

xnought/protein-scatter

protein-scatter

A map of proteins for exploration and discovery, designed specifically for exploring many local parts of proteins all at once. It is accompanied by a paper that explains everything in depth.

Screen.Recording.2024-03-11.at.3.33.39.PM.mov

This code uses Foldseek's 3Di representation instead of amino acids to train a sequence model. The embeddings from the sequence model are then fed into UMAP for a global visualization.
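As a rough illustration of the first step, the sketch below turns a 3Di string into integer tokens and cuts it into overlapping local fragments before any model sees it. It is not the repository's code: the function names `encode_3di` and `fragments`, the ID assignment, and the window `size` and `stride` are hypothetical choices made for this example. The only fact relied on is that Foldseek writes its 20 3Di states with the same 20 letters used for amino acids.

```python
# Minimal sketch, assuming a sliding-window view of "local parts" of a protein.
# All names and parameters here are illustrative, not taken from this repository.

# The 20 letters Foldseek uses to write 3Di states.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Arbitrary ID assignment (alphabetical); a trained model may use a different one.
TOKEN_TO_ID = {letter: index for index, letter in enumerate(ALPHABET)}


def encode_3di(seq):
    """Map a 3Di string to a list of integer token IDs."""
    seq = seq.upper()  # upper-cased for convenience in this sketch
    unknown = set(seq) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unexpected 3Di symbols: {sorted(unknown)}")
    return [TOKEN_TO_ID[letter] for letter in seq]


def fragments(seq, size, stride):
    """Return overlapping windows of length `size`, stepping by `stride`."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]


if __name__ == "__main__":
    example = "DVVLVVCC"  # made-up 3Di string, for illustration only
    for piece in fragments(example, size=4, stride=2):
        print(piece, encode_3di(piece))
```

Each fragment's token IDs would then go through the sequence model to produce one embedding per fragment, and those embeddings are what get projected with UMAP.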

What makes this system different? Here I explicitly model each protein as the interactions of its internal 3D structure. I then compare across many different proteins for a global visualization.

Models and Datasets

If you want to reproduce these results, check the training code in the training/ directory.

Note that the UMAP transformation was done in Python notebooks, not in the Python code.

The weights are saved as checkpoint-large-3.pt in this Google Drive, alongside additional training data.

Code References

See the paper, protein-scatter.pdf, for further references beyond code.