forked from luponzo86/rhapsody-website
faqs.php
<!DOCTYPE html>
<html lang="en">
<head>
<?php
readfile("./html/header.html");
include 'src/php/utils.php';
?>
</head>
<body>
<?php
$currentPage = 'FAQs';
include './html/navbar.php';
?>
<div class="jumbotron">
<div class="container text-center">
<h2>FAQs</h2>
</div>
</div>
<div class="container">
<div class="form-row">
<div class="col-md"></div>
<div class="col-md-6">
<ol>
<li id="FAQ-general">
<h5>What is Rhapsody?</h5>
<p><small> Rhapsody is a machine learning tool for predicting
the impact of amino acid substitutions in proteins.
It consists of a random forest classifier trained
not only on traditional conservation properties, but also
on structural and <i>dynamical</i> properties
of the mutation site, localized on the protein's PDB structure,
and <i>coevolution</i> properties, extracted from Pfam sequence
alignments.
</small></p>
</li>
<li id="FAQ-SAV">
<h5>What kind of variants can Rhapsody analyze?</h5>
<p><small> Rhapsody can provide predictions for Single Amino acid
Variants (SAVs) in <i>human</i> proteins for which PDB structures
are available.
</small></p>
</li>
<li id="FAQ-whyhuman">
<h5>Why only <i>human</i> SAVs?</h5>
<p><small> Because Rhapsody derives sequence conservation properties
from <a href="http://genetics.bwh.harvard.edu/pph2/" target="_blank">
PolyPhen-2</a>, which is designed to work only for human SAVs.
</small></p>
</li>
<li id="FAQ-formats">
<h5>What are the accepted input formats?</h5>
<p><small> Rhapsody only accepts SAVs in Uniprot coordinates,
in the format:<br>
<code>
&lt;Uniprot ID&gt; &lt;position&gt; &lt;wild-type aa&gt;
&lt;mutated aa&gt;</code> .<br>
For instance, mutation Q99R in human protein
<a href="https://www.uniprot.org/uniprot/P01112" target="_blank">
GTPase HRas</a> can be queried by submitting the input string
<code>P01112 99 Q R</code> or
<code>RASH_HUMAN 99 Q R</code> .<br>
We provide a <a href="query_Uniprot.php">Uniprot search tool</a>
to help with the identification of a sequence's unique accession
number. When running an <a href="sat_mutagen.php">
<i>in silico</i> saturation mutagenesis</a> analysis, only the
Uniprot sequence identifier (plus, optionally, a specific position)
should be provided.
</small></p>
</li>
<li id="FAQ-sm">
<h5>What does "<i>in silico</i> saturation mutagenesis" mean?</h5>
<p><small> A complete scan of all 19 possible amino acid
substitutions at every position in a protein sequence. The
result is a "saturation mutagenesis table" (see
<a href="./results.php?id=example-sm">example</a>) that not only
contains predictions for individual mutations, but also provides
an overview of which regions of the sequence are predicted to
be more (or less) sensitive to mutations.
</small></p>
</li>
<li id="FAQ-bq">
<h5>What is a "batch query"?</h5>
<p><small> A batch query lets you submit a list of individual
variants from one or more protein sequences. The list must
contain one variant per line, in
<a href="#FAQ-formats">Uniprot coordinates</a>.
</small></p>
</li>
<li id="FAQ-noPDB">
<h5>What if there is no PDB structure for a given protein?</h5>
<p><small> Normally, when queried with a sequence, Rhapsody searches the
<a href="https://www.rcsb.org/" target="_blank">Protein Data Bank</a>
for the "best" (i.e. the largest) structure available. If a
structure is not found, the user can manually provide a custom
protein structure, by either indicating a PDB code (for instance,
of a homologous protein from another organism) or uploading a file
in PDB format (e.g. downloaded from the
<a href="https://swissmodel.expasy.org/repository" target="_blank">
SWISS-MODEL repository</a> of homology models, see
<a href="https://nbviewer.jupyter.org/github/prody/rhapsody-tutorials/blob/master/tutorials/6-Application_to_ROMK/ROMK_variants_analysis.ipynb">
ROMK tutorial</a> for an example). This option can also be used
to run predictions on a particular protein structure or conformation
(see <a href="https://nbviewer.jupyter.org/github/prody/rhapsody-tutorials/blob/master/tutorials/4-Application_to_HRAS/RAS_variants_analysis.ipynb">
HRAS tutorial</a> for an example).
Please note that Rhapsody will automatically align the Uniprot sequence to
the PDB sequence and compute predictions only for matching amino acids: if
the two sequences are too dissimilar, the resulting predictions might be too
sparse.
</small></p>
</li>
<li id="FAQ-environment">
<h5>What does it mean to include "environmental effects"?</h5>
<p><small>When computing structural and dynamical features from a PDB
structure, by default Rhapsody will only consider a single chain (the one with
the highest sequence similarity to the given Uniprot sequence) and ignore
other chains that might be present in the PDB file. Sometimes, for instance
in the case of multimers or other complexes, the presence of other chains
should not be ignored and those properties should be computed for the entire
complex. This is done by using a variant of the Elastic Network Model called
"environmental ANM" (more precisely, a "sliced" model, see <a href="about.php">
main publication</a> and <a href="https://nbviewer.jupyter.org/github/prody/rhapsody-tutorials/blob/master/tutorials/6-Application_to_ROMK/ROMK_variants_analysis.ipynb">
ROMK tutorial</a>). In short, environmental effects should be
included if the chain of interest is part of a "stable" complex (e.g. a
multimer) and its dynamical properties are therefore influenced
by the presence of other chains. On the other hand, please be aware that
computing predictions on large complexes will take significantly longer.
</small></p>
</li>
<li id="FAQ-legend">
<h5>What is the difference between "full" and "reduced" classifiers?</h5>
<p><small>Both "full" and "reduced" classifiers are trained on
sequence-, structure- and dynamics-based features. The main
difference is that the "full" classifier also includes
<i>coevolutionary</i> properties computed on Pfam multiple
sequence alignments. If part of a sequence is not covered by a
Pfam domain, predictions from the "reduced" classifier are returned
instead.
</small></p>
</li>
<li id="FAQ-EVmutation">
<h5>What is the "full+EVmutation" classifier?</h5>
<p><small>The "full+EVmutation" classifier adds to its feature set
the "epistatic statistical energy difference of
mutant", computed by <a href="https://marks.hms.harvard.edu/evmutation/"
target="_blank">EVmutation</a> and based on coevolution analysis of
multiple sequence alignments. Although it has been shown to slightly improve
the accuracy of predictions (see <a href="about.php">Rhapsody paper</a>),
this additional feature is excluded by default, so that Rhapsody
predictions remain independent of those computed by EVmutation.
EVmutation predictions alone are always displayed in the final
results along with those from Rhapsody and
<a href="http://genetics.bwh.harvard.edu/pph2/" target="_blank">
PolyPhen-2</a>.
</small></p>
</li>
<li id="FAQ-output">
<h5>What is displayed in the output files?</h5>
<p><small><ol type="a">
<li><b>Rhapsody predictions (simple view):</b> contains
"combined" predictions from "full" and "reduced" Rhapsody
classifiers. The latter returns "backup" predictions whenever
the primary classifier cannot be applied for lack of Pfam
domains, which are needed to compute coevolutionary features.
<ul>
<li>Column <code>training info</code> indicates whether a variant
was never seen by the classifier (<code>new</code>), so that its
prediction can be considered genuine, or was included in the training
dataset (<code>known_del</code> or <code>known_neu</code>),
so that its prediction cannot be considered unbiased.</li>
<li>Column <code>score</code> contains the output from the random
forest classifier, a real number between 0 and 1.</li>
<li>Column <code>prob.</code> contains a "pathogenicity probability"
calculated by applying a non-linear monotonic transformation to
the random forest score that eliminates the effect of an
imbalanced training dataset (where <code>deleterious</code> labels usually
dominate). After this operation, the threshold between <code>neutral</code>
and <code>deleterious</code> predictions can be set at 0.5.</li>
<li>Column <code>class</code> provides a final classification of
variants into <code>neutral</code> and <code>deleterious</code>.</li>
<li>The last columns on the right contain predicted scores and
classes from
<a href="http://genetics.bwh.harvard.edu/pph2/" target="_blank">
PolyPhen-2</a> and
<a href="https://marks.hms.harvard.edu/evmutation/" target="_blank">
EVmutation</a>.</li>
</ul></li>
<li><b>Rhapsody predictions (detailed view):</b> contains predicted
scores, probabilities and classes from both the "full"
(<code>main</code>) and "reduced" (<code>aux.</code>) classifiers,
as explained above. A left arrow between the two sets of
columns indicates that "reduced" predictions replace missing
"full" predictions in the "combined" results mentioned above.
</li>
<li><b>PolyPhen-2 output:</b> contains the output from the
<a href="http://genetics.bwh.harvard.edu/pph2/" target="_blank">
PolyPhen-2</a> web tool.
</li>
<li><b>PDB mapping:</b> contains the mapping of variants from
Uniprot coordinates to PDB structures, if possible. The column
on the left contains the input Uniprot coordinates,
while the second one provides the most up-to-date sequence
IDs, as retrieved from Uniprot.
</li>
<li><b>computed features:</b> lists the values of each feature for
all input variants.
</li>
<li><b>log file:</b> reports the detailed log of the submitted job.
</li>
</ol></small></p>
</li>
</ol>
</div>
<div class="col-md"></div>
</div>
</div>
<?php readfile("./html/footer.html"); ?>
<?php readfile("./html/js_src.html"); ?>
</body>
</html>