[WIP] Restructure and rewrite of structure and function predictions #1003

sacdallago · 2020-03-16T08:54:33Z

Please: consider this a draft to see the direction I'm taking. Help would be very welcome!

While writing, I noticed that the restructuring that was needed was more extensive than I had originally intended and stated in #1000.

I'm happy with the general introduction ("Protein structure and function predictions")
I'm happy with "Secondary structure"
"Contact and distance maps" also seems quite alright to me.
"3D structure from sequence alone" is intended as a short section mentioning that most methods use contact/distance maps to fold proteins, but some newer methods (see in manuscript) try to go directly from sequence to structure (in an end-to-end fashion). I'm not well read enough in this to write it with confidence yet, so I'm asking a colleague who is.
"Quaternary structure and protein-protein interactions" I haven't really touched yet, but here too I might ask two colleagues who work on exactly this to take a look.
I would like to add another section on function prediction where I'd mention subcellular localization & GO annotation prediction (these could actually be 2 or more sections, but I want to keep it easy for now).

REF #1000

sacdallago · 2020-03-16T08:54:58Z

Ping @j3xugit

AppVeyorBot · 2020-03-16T08:57:46Z

AppVeyor build 1.0.74 for commit c2e0e86 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

cgreene · 2020-03-16T12:15:21Z

Hi @sacdallago : I see that you've eliminated the one-sentence-per-line style. This makes it very hard to comment on individual sentences and to track changes in that way. Can you add the line breaks after each sentence back to your PR?

sacdallago · 2020-03-16T20:20:09Z

Oh, I must have missed that in the instructions! Sorry @cgreene :) Will amend now...

AppVeyorBot · 2020-03-16T20:50:55Z

AppVeyor build 1.0.76 for commit 90f941f by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

AppVeyorBot · 2020-03-19T10:37:45Z

AppVeyor build 1.0.77 for commit a233b25 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

sacdallago · 2020-03-30T07:47:04Z

Just bumping this up again :) ( @j3xugit )

j3xugit · 2020-03-31T04:24:10Z

Will work on it very soon.

-- Full Professor, Toyota Technological Institute at Chicago URL: https://ttic.uchicago.edu/~jinbo/

j3xugit · 2020-03-31T16:16:54Z

The new version reads well, but I do want to do some minor revisions and possibly add a few more references. Do I have the write permission now? How can I do the revision?

cgreene · 2020-03-31T16:42:08Z

@j3xugit great question! the GitHub suggest interface was designed for exactly this!

Mouse over the line you want to change and click the plus sign.

Click the "suggest" button:

Change the content within the backticks to what you want it to say:

If you want to change one and only one line you can do "single comment". Otherwise, to batch them up make all the suggestions you want to make and then select "review changes" and "comment" and submit that.

Thanks!

j3xugit · 2020-04-01T04:00:51Z

content/04.study.md

+
+The improvement obtained by these methods may be mainly due to the ability of convolutional neural fields to capture long-range information.
+Top methods still heavily rely on the creation of profiles from Multiple Sequence Alignments (MSAs).
+Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8].


Suggested change

Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8].

Models relying on LMs have yet to reach the accuracy of evolutionary based methods even for proteins for which good MSAs cannot be built, but these LM methods in general execute at a fraction of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8].

@j3xugit

I think your addition has not yet been experimentally validated (even for proteins for which good MSAs cannot be built), and is, in large part, a repetition of the first part of the sentence (Models relying on LMs have yet to reach the accuracy of evolutionary based methods), which is the best (to my knowledge) one can say about the comparison in accuracy between evo vs. LMs.

My original intent (but are able to deliver results for proteins for which MSAs can't be computed) was meant in the sense that evo models (at least the ones I know) actually fail to produce results if MSAs can't be built.

If we can't agree, I'm happy to leave out this second part of the sentence. I would also rewrite the second part into its own sentence and make it clearer. Here is my suggestion:

Suggested change

Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8].

Models relying on LMs have yet to reach the accuracy of evolutionary based methods.

On the upside, LMs require a fraction of the resources for inference compared to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8].

It is incorrect to say that evo models fail to produce results if MSAs can't be built. In fact, my deep model (and other similar ones) works well on some proteins for which MSAs cannot be built. The Baker group has also shown that trRosetta, the deep model developed by his group (which is similar to mine), works well on a good portion of human-designed proteins although MSAs cannot be built for them.

Are there specific references we should have in mind to support the discussion in these sentences? That might help make sure all of us are looking at the same models and results.

Right, I might have been too black and white here :) sorry about that!

My belief is that if a model bases predictions only on PSSMs (or evo couplings) as inputs, but for a certain dark protein there simply isn't an MSA to start with, those predictions can't be trusted (I picture this as having an input matrix with all zeroes, just to give an idea). It's more of a "conceptual" point than the reality of things.
Models out there today do all sorts of things, including combining sequence-based features (often learned, e.g. via CNNs) with MSA-extracted features, so mine would be a simplification.
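
To make the empty-MSA picture concrete, here is a minimal Python sketch. It assumes a plain column-frequency profile, not the PSSM computation of any tool discussed in this thread, and the names `frequency_profile` and `informative_columns` are hypothetical. With only the query sequence in the alignment, every column collapses to a one-hot encoding of the query residue, so the profile carries nothing beyond the sequence itself.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa):
    """Per-column amino-acid frequencies for equal-length aligned sequences.

    Gap characters and any symbol outside AMINO_ACIDS are ignored.
    This is an illustrative stand-in for a PSSM, not a real tool's method.
    """
    if not msa:
        raise ValueError("need at least the query sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must have equal length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        # A column with no valid residues becomes an all-zero row.
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def informative_columns(profile):
    """Count columns that show variation, i.e. are neither empty nor one-hot."""
    return sum(1 for row in profile if sum(row) > 0 and max(row) < 1.0)


query_only = frequency_profile(["ACD"])                # no homologs found
with_homologs = frequency_profile(["ACD", "AGD", "ACE"])
```

Comparing `informative_columns(query_only)` with `informative_columns(with_homologs)` shows how many columns gain evolutionary signal once homologs are available.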

I'm happy either way :) The more fundamental point of this sentence, for me, is that we still don't have a clear understanding of how well LMs work on those proteins for which MSA-based methods also perform arguably well, and how much coverage we are buying ourselves by using these models instead (or combinations of the two). But results for that will be out in Dec with CASP, I feel.

@sacdallago can you please edit these sentences to take into account @j3xugit's comment and the trRosetta results for de novo proteins? Or @j3xugit could suggest new text for @sacdallago to review.

https://www.pnas.org/content/117/3/1496 (doi:10.1073/pnas.1914677117) could also be a relevant reference to add.

If it's ok, I'll edit this the week of Aug 31st (I'm currently on holidays and it's hard to get into the writing headspace -- especially without a laptop & context overview)

Sending a brief reminder about these edits @sacdallago. It shouldn't require anything too extensive.

@agitter

Thanks very much for the reminder -- unfortunately I had to attach 1 week leave of absence to my holidays due to some unfortunate family circumstances. AKA: sorry for the delay

I don't quite know how to integrate the trRosetta paper in exactly this section. trRosetta doesn't use language models to extract additional features; it still relies exclusively on MSAs (from my understanding). ~~I did add it to an earlier sentence where it appeared to make more sense.~~ Additionally, this section is about secondary structure, while trRosetta is more about 3D structure prediction. I'll see where I can put it.

I updated some other sections, which in the meantime have seen some new pre-prints and work.

I'll update the PR in about 10 min with the latest changes

content/04.study.md

sacdallago · 2020-04-01T10:57:50Z

Thanks for the comments @j3xugit and explanation @cgreene ; I'll make time on the weekend to go over the changes and integrate them :)

AppVeyorBot · 2020-04-14T15:24:15Z

AppVeyor build 1.0.81 for commit 4ff4b51 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

AppVeyorBot · 2020-04-14T15:28:01Z

AppVeyor build 1.0.82 for commit 204c23a by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

AppVeyorBot · 2020-04-14T15:32:05Z

AppVeyor build 1.0.83 for commit a75f45e by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

AppVeyorBot · 2020-04-14T15:36:08Z

AppVeyor build 1.0.84 for commit 45d6ed9 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

sacdallago

@j3xugit Thanks for your help :) I accepted most suggestions but have one open discussion item :)

sacdallago · 2020-04-14T15:46:27Z

Sorry about the delay & for committing the suggestions individually; I figured out only midway through that there was in fact a way of committing multiple changes via the "files" tab!

Thanks for the suggestions, anyway! :)

AppVeyorBot · 2020-04-14T15:49:47Z

AppVeyor build 1.0.85 for commit c77cc95 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

sacdallago · 2020-05-27T09:52:16Z

A ping @j3xugit

agitter · 2020-08-09T20:21:56Z

@sacdallago it looks like these edits already went through one round of review and almost everything has been addressed. Is there only one point of discussion to resolve before this is ready to merge?

I'll do a light review for style and copy editing after the scientific questions are all resolved.

j3xugit · 2020-08-09T20:35:44Z

Is there anything that needs my attention?

agitter · 2020-08-09T20:40:47Z

@j3xugit I believe this comment above (#1003 (comment)) may be waiting for your feedback. @sacdallago proposed a change to line 321 to see if you agree with that rephrasing.

j3xugit · 2020-08-10T14:05:54Z

Please see the section "Assessing the Ideality of de Novo Protein Designs" in https://www.pnas.org/content/117/3/1496.

On Mon, Aug 10, 2020 at 8:08 AM Anthony Gitter ***@***.***> wrote:

In content/04.study.md <#1003 (comment)>:

 In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 accuracy by using a deep supervised and convolutional generative stochastic network ***@***.***:1403.1347].
 In 2016 Wang et al. developed a DeepCNF model that improved Q3 and Q8 accuracy as well as prediction of solvent accessibility and disorder regions ***@***.***:10.1038/srep18962; @doi:10.1007/978-3-319-46227-1_1].
-DeepCNF achieved a higher Q3 accuracy than the standard maintained by PSIPRED for more than 10 years.
-This improvement may be mainly due to the ability of convolutional neural fields to capture long-range sequential information, which is important for beta strand prediction.
-Nevertheless, the improvements in secondary structure prediction from DeepCNF are unlikely to result in a commensurate improvement in tertiary structure prediction since secondary structure mainly reflects coarse-grained local conformation of a protein structure.
+Following these successes, recently proposed methods achieve significantly better results than PSIPRED using Deep Learning architectures, with significantly better DL methods today being NetSurfP-2.0 ***@***.***:10.1002/prot.25674] and Porter 5 ***@***.***:10.1038/s41598-019-48786-x].
+
+The improvement obtained by these methods may be mainly due to the ability of convolutional neural fields to capture long-range information.
+Top methods still heavily rely on the creation of profiles from Multiple Sequence Alignments (MSAs).
+Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches ***@***.***:10.1186/s12859-019-3220-8].

Are there specific references we should have in mind to support the discussion in these sentences? That might help make sure all of us are looking at the same models and results.


sacdallago · 2020-08-11T08:49:12Z

@agitter

I think this pass is fine by me. Thanks for pinging and looking into this ;)

sacdallago · 2020-09-09T14:54:47Z

Important realization: the 3D prediction section is still just in draft. I have no extensive expertise in this, but I'm happy to contribute what I can in the following months.

If it's up to me: I will make this a quite short paragraph expanding on the bullet points which I laid out in this section.

AppVeyorBot · 2020-09-09T15:01:53Z

AppVeyor build 1.0.103 for commit 980f5d1 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at:

Restructure and re-write of structure and function predictions

c2e0e86

One sentence per line + fix AlphaFold citation

90f941f

Update @arXiv reference, missing '@'

a233b25

j3xugit reviewed Apr 1, 2020

sacdallago and others added 3 commits April 14, 2020 17:20

Update content/04.study.md

4ff4b51

Co-Authored-By: j3xugit <[email protected]>

update references content/04.study.md

204c23a

Co-Authored-By: j3xugit <[email protected]>

Wording update content/04.study.md

a75f45e

Co-Authored-By: j3xugit <[email protected]>

Apply suggestions from code review

45d6ed9

Co-Authored-By: j3xugit <[email protected]>

sacdallago commented Apr 14, 2020

Remove sentence content/04.study.md

c77cc95

Co-Authored-By: j3xugit <[email protected]>

september update: add new refs, correct some language, remove faulty …

980f5d1

…text

agitter mentioned this pull request Dec 9, 2020

Alpha Fold 2 #1025

Open

agitter mentioned this pull request Jan 17, 2021

feature and labels #1026

Closed
