[WIP] Restructure and rewrite of structure and function predictions #1003
base: master
Ping @j3xugit |
AppVeyor build 1.0.74 for commit c2e0e86 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
Hi @sacdallago : I see that you've eliminated the one-sentence-per-line style. This makes it very hard to comment on individual sentences and to track changes in that way. Can you add the line breaks after each sentence back to your PR? |
Oh, I must have missed that in the instructions! Sorry @cgreene :) Will amend now... |
AppVeyor build 1.0.76 for commit 90f941f by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
AppVeyor build 1.0.77 for commit a233b25 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
Just bumping this up again :) ( @j3xugit ) |
Will work on it very soon.
…On Mon, Mar 30, 2020 at 2:47 AM Christian Dallago ***@***.***> wrote:
Just bumping this up again :) ( @j3xugit <https://github.com/j3xugit> )
--
Full Professor, Toyota Technological Institute at Chicago
URL: https://ttic.uchicago.edu/~jinbo/
|
The new version reads well, but I do want to do some minor revisions and possibly add a few more references. Do I have the write permission now? How can I do the revision? |
@j3xugit great question! the GitHub suggest interface was designed for exactly this! Mouse over the line you want to change and click the plus sign. Change the content within the backticks to what you want it to say: If you want to change one and only one line you can do "single comment". Otherwise, to batch them up make all the suggestions you want to make and then select "review changes" and "comment" and submit that. Thanks! |
content/04.study.md
Outdated
The improvement obtained by these methods may be mainly due to the ability of convolutional neural fields to capture long-range information. | ||
Top methods still heavily rely on the creation of profiles from Multiple Sequence Alignments (MSAs). | ||
Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8]. |
Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8]. | |
Models relying on LMs have yet to reach the accuracy of evolutionary based methods even for proteins for which good MSAs cannot be built, but these LM methods in general execute at a fraction of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8]. |
I think your addition has not yet been experimentally validated ("even for proteins for which good MSAs cannot be built"), and is, in large part, a repetition of the first part of the sentence ("Models relying on LMs have yet to reach the accuracy of evolutionary based methods"), which is the best (to my knowledge) one can say about the comparison in accuracy between evo vs. LMs.
My original intent ("but are able to deliver results for proteins for which MSAs can't be computed") was meant in the sense that evo models (at least the ones I know) actually fail to produce results if MSAs can't be built.
If we can't agree, I'm happy to leave out this second part of the sentence. I would also rewrite the second part into its own sentence and make it clearer; here is my suggestion:
Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8]. | |
Models relying on LMs have yet to reach the accuracy of evolutionary based methods. | |
On the upside, LMs require a fraction of the resources for inference compared to evolutionary based approaches [@doi:10.1186/s12859-019-3220-8]. |
It is incorrect to say that evo models fail to produce results if MSAs can't be built. In fact, my deep model (and other similar ones) works well on some proteins for which MSAs cannot be built. The Baker group has also shown that trRosetta, the deep model developed by that group (which is similar to mine), works well on a good portion of human-designed proteins although MSAs cannot be built for them.
Are there specific references we should have in mind to support the discussion in these sentences? That might help make sure all of us are looking at the same models and results.
Right, I might have been too black and white here :) sorry about that!
My belief is that if a model bases predictions on only PSSMs as inputs (or evo couplings), but for a certain dark protein there simply isn't an MSA to start with, those predictions can't be trusted (I picture this as having an input matrix with all zeroes, just to give an idea). It's more of a "conceptual" point than the reality of things.
Models out there today do all sorts of things, including combining sequence-based features (often learned, e.g. via CNNs) with MSA-extracted features, so mine would be a simplification.
I'm happy either way :) The more fundamental point of this sentence, for me, is that we still don't have a clear understanding of how well LMs work on those proteins for which MSA-based methods also perform arguably well, and how much coverage we are buying ourselves by using these models instead (or combinations of the two). But results for that will be out in Dec with CASP, I feel.
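A minimal sketch of the "conceptual" point above, assuming a plain per-column frequency profile as a stand-in for a PSSM. The helper names (`column_frequencies`, `informative_columns`) and the toy sequences are hypothetical and are not taken from any of the methods discussed in this thread. It only illustrates that when the "MSA" contains nothing but the query, the profile degenerates to a one-hot encoding of the query and carries no evolutionary signal; it says nothing about how well any real predictor performs in that regime.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(msa):
    """Return per-column amino-acid frequencies for equal-length aligned sequences."""
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gaps and non-standard characters are ignored in this toy version.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def informative_columns(profile):
    """Count columns in which more than one residue type is observed."""
    return sum(
        1 for col in profile if sum(1 for freq in col.values() if freq > 0) > 1
    )


# Toy alignments (made up for illustration only).
deep_msa = ["MKTAY", "MRTAY", "MKSAF", "MKTGY"]
single_sequence = ["MKTAY"]  # "dark" protein: no homologs found

deep_profile = column_frequencies(deep_msa)
orphan_profile = column_frequencies(single_sequence)

print(informative_columns(deep_profile))
print(informative_columns(orphan_profile))
```

With the single-sequence input, every column has exactly one non-zero entry, so an MSA-only model receives no more information than the raw sequence itself.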
@sacdallago can you please edit these sentences to take into account @j3xugit's comment and the trRosetta results for de novo proteins? Or @j3xugit could suggest new text for @sacdallago to review.
https://www.pnas.org/content/117/3/1496 (doi:10.1073/pnas.1914677117) could also be a relevant reference to add.
If it's ok, I'll edit this the week of Aug 31st (I'm currently on holidays and it's hard to get into the writing headspace -- especially without a laptop & context overview)
Sending a brief reminder about these edits @sacdallago. It shouldn't require anything too extensive.
- Thanks very much for the reminder -- unfortunately I had to attach a 1-week leave of absence to my holidays due to some unfortunate family circumstances. AKA: sorry for the delay
- I don't quite know how to integrate the TrRosetta paper in exactly this section. TrRosetta doesn't use language models to extract additional features; it still relies exclusively on MSAs (from my understanding). I did add it to an earlier sentence where it appeared to make more sense. Additionally, this section is about secondary structure, while TrRosetta is more about 3D structure prediction. I'll see where I can put it.
- I updated some other sections, which in the meantime have seen some new pre-prints and work.
I'll update the PR in about 10 min with the latest changes
Co-Authored-By: j3xugit <[email protected]>
Co-Authored-By: j3xugit <[email protected]>
Co-Authored-By: j3xugit <[email protected]>
AppVeyor build 1.0.81 for commit 4ff4b51 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
Co-Authored-By: j3xugit <[email protected]>
AppVeyor build 1.0.82 for commit 204c23a by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
AppVeyor build 1.0.83 for commit a75f45e by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
AppVeyor build 1.0.84 for commit 45d6ed9 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
@j3xugit Thanks for your help :) I accepted most suggestions but have one open discussion item :)
Co-Authored-By: j3xugit <[email protected]>
Sorry about the delay & for committing the suggestions individually; I figured out only midway through that there was in fact a way of committing multiple changes via the "files" tab! Thanks for the suggestions, anyway! :) |
AppVeyor build 1.0.85 for commit c77cc95 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
A ping @j3xugit |
@sacdallago it looks like these edits already went through one round of review and almost everything has been addressed. Is there only one point of discussion to resolve before this is ready to merge? I'll do a light review for style and copy editing after the scientific questions are all resolved. |
Is there anything that needs my attention ?
…On Sun, Aug 9, 2020 at 3:22 PM Anthony Gitter ***@***.***> wrote:
@sacdallago <https://github.com/sacdallago> it looks like these edits
already went through one round of review and almost everything has been
addressed. Is there only one point of discussion to resolve before this is
ready to merge?
I'll do a light review for style and copy editing after the scientific
questions are all resolved.
|
@j3xugit I believe this comment above (#1003 (comment)) may be waiting for your feedback. @sacdallago proposed a change to line 321 to see if you agree with that rephrasing. |
Please see section "Assessing the Ideality of de Novo Protein Designs" in
https://www.pnas.org/content/117/3/1496 .
…On Mon, Aug 10, 2020 at 8:08 AM Anthony Gitter ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In content/04.study.md
<#1003 (comment)>
:
> In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 accuracy by using a deep supervised and convolutional generative stochastic network ***@***.***:1403.1347].
In 2016 Wang et al. developed a DeepCNF model that improved Q3 and Q8 accuracy as well as prediction of solvent accessibility and disorder regions ***@***.***:10.1038/srep18962; @doi:10.1007/978-3-319-46227-1_1].
-DeepCNF achieved a higher Q3 accuracy than the standard maintained by PSIPRED for more than 10 years.
-This improvement may be mainly due to the ability of convolutional neural fields to capture long-range sequential information, which is important for beta strand prediction.
-Nevertheless, the improvements in secondary structure prediction from DeepCNF are unlikely to result in a commensurate improvement in tertiary structure prediction since secondary structure mainly reflects coarse-grained local conformation of a protein structure.
+Following these successes, recently proposed methods achieve significantly better results than PSIPRED using Deep Learning architectures, with significantly better DL methods today being NetSurfP-2.0 ***@***.***:10.1002/prot.25674] and Porter 5 ***@***.***:10.1038/s41598-019-48786-x].
+
+The improvement obtained by these methods may be mainly due to the ability of convolutional neural fields to capture long-range information.
+Top methods still heavily rely on the creation of profiles from Multiple Sequence Alignments (MSAs).
+Models relying on LMs have yet to reach the accuracy of evolutionary based methods, but are able to deliver results for proteins for which MSAs can't be computed and in general execute at a fracion of the time with respect to evolutionary based approaches ***@***.***:10.1186/s12859-019-3220-8].
Are there specific references we should have in mind to support the
discussion in these sentences? That might help make sure all of us are
looking at the same models and results.
|
I think this pass is fine by me. Thanks for pinging and looking into this ;) |
Important realization: the 3D prediction section is still just in draft. I have no extensive expertise in this, but I'm happy to contribute what I can in the following months. If it's up to me: I will make this a fairly short paragraph expanding on the bullet points which I laid out in this section. |
AppVeyor build 1.0.103 for commit 980f5d1 by @sacdallago is now complete. The rendered manuscript from this build is temporarily available for download at: |
Please: consider this a draft to see the direction I'm taking. Help would be very welcome!
While writing, I noticed that the restructuring that was needed was more extensive than I had originally intended & stated in #1000
REF #1000