Feature representations for new Proteins in DiG #184

sai-advaith · 2024-04-23T10:46:23Z

Hi,

This is regarding protein generation in DiG.

I wanted to know how you obtained the features in the protein pickle files. According to Appendix B.1 of the paper, the single and pair representations are simply the outputs of a pre-trained AlphaFold Evoformer model, given the corresponding protein's FASTA sequence and MSAs.

I set up OpenFold on our systems and saved the Evoformer representations for the corresponding protein in a pickle file, using the single and pair keys of the output dictionary in this link. To get the MSAs for the FASTA sequence, I queried the ColabFold server.
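
For readers reproducing this step, here is a minimal sketch of storing the two Evoformer outputs in a pickle and reading them back. It assumes NumPy arrays keyed `single` ([N_res, C_s]) and `pair` ([N_res, N_res, C_z]), with AlphaFold2's default widths C_s = 384 and C_z = 128. The helper names, toy sizes, and dictionary layout are hypothetical and are not necessarily the layout of DiG's own pickle files. If the outputs are PyTorch tensors, convert them first with `.detach().cpu().numpy()`.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy sizes; 384 and 128 are AlphaFold2's default single/pair channel widths.
N_RES, C_S, C_Z = 10, 384, 128

def save_representations(path, single, pair):
    """Hypothetical helper: store Evoformer outputs under 'single' and 'pair' keys."""
    assert single.shape[0] == pair.shape[0] == pair.shape[1], "residue counts must match"
    with open(path, "wb") as f:
        pickle.dump({"single": single, "pair": pair}, f)

def load_representations(path):
    """Hypothetical helper: read back the two arrays saved above."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["single"], data["pair"]

# Random placeholders stand in for real Evoformer outputs.
rng = np.random.default_rng(0)
single = rng.standard_normal((N_RES, C_S)).astype(np.float32)
pair = rng.standard_normal((N_RES, N_RES, C_Z)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "protein_repr.pkl")
save_representations(path, single, pair)
s, p = load_representations(path)
print(s.shape, p.shape)  # (10, 384) (10, 10, 128)
```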

Unfortunately, the representations I obtained from OpenFold's Evoformer were quite different from those in the dataset's pickle file.

Can you please let me know the exact method you used to obtain the single and pair representations for each protein's FASTA sequence?


zhengsx · 2024-05-27T05:13:57Z

Please use AlphaFold's representations.

LifeWorks · 2024-07-30T00:32:20Z

@sai-advaith Hi, I assume you downloaded the datasets and checkpoints successfully; the token expired in May because of Microsoft policy. Would you mind sharing what you downloaded? Thanks very much!

amelie-iska · 2024-08-02T00:29:50Z

Same!!! @sai-advaith, please share!!! Or @LifeWorks, do you have it?

sai-advaith · 2024-08-02T08:48:21Z

I wrote a script (based on AlphaFlow) to extract Evoformer representations. This code will help you get the single and pair representations you'll need to run Graphormer.

https://github.com/sai-advaith/evoformer_representation

Is this what you wanted, @LifeWorks @amelie-iska? (Feel free to star it if it's relevant, and let me know if you have any trouble running it.)

LifeWorks · 2024-08-02T17:52:49Z

> I wrote a script (based on AlphaFlow) to extract Evoformer representations. This code will help you get the single and pair representations you'll need to run Graphormer.
>
> https://github.com/sai-advaith/evoformer_representation
>
> Is this what you wanted, @LifeWorks @amelie-iska? (Feel free to star it if it's relevant, and let me know if you have any trouble running it.)

Thanks for the prompt reply.

I wanted to get the checkpoints and dataset that DiG uses to predict distributions: https://github.com/microsoft/Graphormer/blob/main/distributional_graphormer/README.md
In DiG's README, they provide a SAS token to download DiG's trained model, but the token has expired and the authors have not posted any new share links yet.

Did you happen to download these datasets and checkpoints before the token expired? If so, would you mind resharing them through a Google share link or something similar?

https://github.com/microsoft/Graphormer/tree/main/distributional_graphormer/protein#trained-parameters

Thanks very much!

amelie-iska · 2024-08-02T18:12:46Z

@LifeWorks and @sai-advaith, if either of you has the datasets and checkpoints, please let me know. I think @sai-advaith has a very useful repo, but it's unclear to me at the moment whether it is enough for running DiG. I think we need the dataset too, no? And the checkpoint isn't available now either? 😢 Let me know if either of you has time to discuss how to get DiG running. I had it running a couple of months ago, before they took down the datasets and checkpoints.

sai-advaith · 2024-08-02T19:30:25Z

The dataset consisted of protein FASTA sequences (which you can get online) and Evoformer representations (from the repo I shared).
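
Since each dataset entry pairs a sequence with its representations, one cheap sanity check is that both arrays have one row (and, for the pair array, one column) per residue of the FASTA sequence. The sketch below uses hypothetical helpers that are not part of DiG or the linked repo, shown on a made-up 16-residue toy sequence with AlphaFold2's default channel widths assumed.

```python
import numpy as np

def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records

def check_lengths(sequence, single, pair):
    """True if the representations have one row (and column) per residue."""
    n = len(sequence)
    return single.shape[0] == n and pair.shape[:2] == (n, n)

# Toy, made-up sequence (not a real DiG entry).
fasta_text = ">toy_chain\nMKTAYIAK\nQRQISFVK\n"
seq = read_fasta(fasta_text)["toy_chain"]
single = np.zeros((len(seq), 384), dtype=np.float32)
pair = np.zeros((len(seq), len(seq), 128), dtype=np.float32)
print(len(seq), check_lengths(seq, single, pair))  # 16 True
```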

I will get back to you regarding the model weights.

LifeWorks · 2024-08-02T19:31:03Z

> The dataset consisted of protein fasta sequence (which you can get online) and evoformer representation (from the repo I shared).
>
> I will get back to you regarding the model weights.

I see. Thanks very much! I'm looking forward to the model weights!

amelie-iska · 2024-08-02T19:48:00Z

Thanks so much @sai-advaith and @LifeWorks! I really appreciate the help getting the weights (and the excellent repo for getting the single and pair representations from EvoFormer)! I'd like the protein-only weights, but also the protein-ligand weights if you have them or if either of you is able to get them. Please let me know how you would like to share the weights too.

pujaltes · 2024-08-09T18:36:18Z

The model weights and data are still private. Would anyone (@sai-advaith, @LifeWorks, @amelie-iska) kindly be able to share them with us?

amelie-iska · 2024-08-15T00:18:30Z

I wish I had them @pujaltes. If you get them, please let me know. I still don't have them.

jeevster · 2024-10-18T20:37:30Z

Hi @sai-advaith, thanks for creating this useful repo! As a sanity check, I tried generating the Evoformer representations for one of the proteins (6lu7) whose representations the authors had already shared in this repo. I found that the representations produced by OpenFold are slightly different from those provided (for example, for the single representations, the cosine similarity averaged across residues is around 0.995, and the ratio of the norms is on average 1.03). Did you find that these differences were minor enough to still yield good samples for the proteins you tried?
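
The two metrics mentioned above can be computed along the channel axis roughly as follows. This is a sketch under assumptions: cosine similarity and norm ratio are computed per residue and then averaged, and the ratio is taken as ours / reference (the comment does not say which direction was used). The hypothetical `compare_representations` function also accepts pair representations, where each (i, j) entry is treated as one vector.

```python
import numpy as np

def compare_representations(ours, reference, eps=1e-8):
    """Mean cosine similarity and mean norm ratio along the channel (last) axis.

    Works for single [N_res, C] and pair [N_res, N_res, C] arrays alike:
    every axis except the last indexes a residue (or residue pair).
    """
    ours = np.asarray(ours, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    assert ours.shape == reference.shape, "shapes must match"
    norm_ours = np.linalg.norm(ours, axis=-1)
    norm_ref = np.linalg.norm(reference, axis=-1)
    # Per-residue cosine similarity, guarded against zero-norm vectors.
    cos = np.sum(ours * reference, axis=-1) / np.maximum(norm_ours * norm_ref, eps)
    # Per-residue norm ratio, ours / reference (assumed direction).
    ratio = norm_ours / np.maximum(norm_ref, eps)
    return float(cos.mean()), float(ratio.mean())

# Illustration: a uniformly rescaled copy keeps cosine 1 but shifts the norm ratio.
rng = np.random.default_rng(0)
reference = rng.standard_normal((50, 384))
ours = 1.03 * reference
cos_mean, ratio_mean = compare_representations(ours, reference)
print(round(cos_mean, 4), round(ratio_mean, 4))  # 1.0 1.03
```

Note that a pure rescaling leaves the cosine at 1.0, so a cosine of about 0.995 together with a norm ratio of about 1.03 indicates both a small directional difference and a small scale difference.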

sai-advaith mentioned this issue on Apr 23, 2024 in: Lack of diversity in the 1ake example prediction by DiG #183 (Open)
