Not suitable test data #67

Open
LucasSchmidt97 opened this issue Nov 20, 2017 · 5 comments

Comments

@LucasSchmidt97
Collaborator

LucasSchmidt97 commented Nov 20, 2017

At first I assumed that this case was a bug in the code, but, as seen in issue #66, it wasn't.

Problem:
The test data below, taken from the deisotoper of the protViz package, is not suitable for our deisotoper.

Example 1:

mZ = { 726.068, 726.337, 726.589, 726.842, 727.343, 727.846, 728.346, 728.846, 729.348, 730.248, 730.336, 730.581, 730.836 }
Intensity = { 6.77850e+03, 2.81688e+04, 6.66884e+04, 1.22032e+07, 9.90405e+06, 4.61409e+06, 1.50973e+06, 3.33996e+05, 5.09421e+04, 1.15869e+03, 2.14788e+03, 5.37853e+03, 5.79094e+02 }

Example 2:

mZ = { 642.572, 643.054, 643.569, 644.062, 644.557 }
Intensity = { 17000, 25000, 12000, 9000, 4000 }

Explanation:
The test data contains only a part of a mass spectrum. Our scoring algorithm uses all peaks of the whole spectrum (a full spectrum has, for example, 100 to 400 peaks), not only a tiny part of it. When the score is calculated from the given peaks alone, all scores are 0, because the score is calculated by comparing the differences and sums between peaks with the masses of amino acids and other atomic masses. Therefore there cannot be a specific best path: all paths have the same score, and because these cases contain only one isotopic set, the single best path is chosen randomly among them. This is why the output differs from run to run.
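A toy version of the difference-based part of the score shows why a small excerpt scores 0. This is a hypothetical simplification, not the deisotoper's scoring function: `toy_diff_score` and `RESIDUE_MASSES` are illustrative names, the mass table is a small subset of amino-acid residue masses, charge 1 is assumed, and sums and other atomic masses are ignored. Because all peaks of example 2 lie within 2 m/z of each other, no difference can match a residue mass.

```python
from itertools import combinations

# Illustrative subset of monoisotopic amino-acid residue masses (Da).
# Hypothetical table for this sketch, not the deisotoper's mass list.
RESIDUE_MASSES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}


def toy_diff_score(mz, tolerance=0.03):
    """Count peak pairs whose m/z difference matches a residue mass.

    Hypothetical simplification: assumes charge 1 and ignores the sums
    and other atomic masses that the real score also compares against.
    """
    score = 0
    for low, high in combinations(sorted(mz), 2):
        diff = high - low
        if any(abs(diff - m) <= tolerance for m in RESIDUE_MASSES.values()):
            score += 1
    return score


example2 = [642.572, 643.054, 643.569, 644.062, 644.557]
print(toy_diff_score(example2))  # 0: all pairwise differences are below 2

# One extra peak exactly one glycine residue above 642.572 gives one match.
print(toy_diff_score(example2 + [642.572 + 57.02146]))  # 1
```

With more peaks from a full spectrum, such matches become possible, which is what lets one path score higher than the others.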

Discussion:
We should discuss better test data, or embedding the test data into a specific spectrum so that the algorithm has peaks it can rely on.
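Embedding the test peaks into a real spectrum amounts to merging two peak lists by m/z. A minimal sketch, assuming peaks are `(mz, intensity)` tuples; `embed_peaks` and the background peaks are hypothetical and not part of the deisotoper or protViz:

```python
def embed_peaks(spectrum, test_peaks):
    """Merge test peaks into a spectrum and return the peaks sorted by m/z.

    Both arguments are lists of (mz, intensity) tuples. A test peak with
    exactly the same m/z as a spectrum peak replaces that peak.
    """
    merged = dict(spectrum)
    merged.update(test_peaks)
    return sorted(merged.items())


# Hypothetical background peaks standing in for a real spectrum.
background = [(500.25, 1200.0), (700.31, 800.0), (801.40, 950.0)]
example2 = list(zip([642.572, 643.054, 643.569, 644.062, 644.557],
                    [17000, 25000, 12000, 9000, 4000]))

for mz, intensity in embed_peaks(background, example2):
    print(mz, intensity)
```

Letting a test peak overwrite an identical m/z is one design choice; keeping both peaks would require a list-based merge instead of a dict.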

@LucasSchmidt97
Collaborator Author

LucasSchmidt97 commented Nov 20, 2017

When example 2 is embedded into the first spectrum of TP_HeLa_200ng_filtered_pd21, this is the annotated spectrum:

 IsotopicCluster Peak Charge       mZ  Intensity
              0   57      2  642.5720  17000.00
              0   58      2  643.0540  25000.00
              1   57      2  642.5720  17000.00
              1   58      2  643.0540  25000.00
              1   59      2  643.5690  12000.00
              2   57      1  642.5720  17000.00
              2   59      1  643.5690  12000.00
              3   57      1  642.5720  17000.00
              3   59      1  643.5690  12000.00
              3   61      1  644.5570   4000.00
              4   58      2  643.0540  25000.00
              4   59      2  643.5690  12000.00
              5   58      2  643.0540  25000.00
              5   59      2  643.5690  12000.00
              5   60      2  644.0620   9000.00
              6   58      1  643.0540  25000.00
              6   60      1  644.0620   9000.00
              7   59      2  643.5690  12000.00
              7   60      2  644.0620   9000.00
              8   59      2  643.5690  12000.00
              8   60      2  644.0620   9000.00
              8   61      2  644.5570   4000.00
              9   59      1  643.5690  12000.00
              9   61      1  644.5570   4000.00
             10   60      2  644.0620   9000.00
             10   61      2  644.5570   4000.00

@LucasSchmidt97
Collaborator Author

LucasSchmidt97 commented Nov 20, 2017

And the DOT graph looks like this:
[Figure: DOT graph of example 2]

@LucasSchmidt97
Collaborator Author

LucasSchmidt97 commented Nov 20, 2017

Input part:

mZ = { 642.572, 643.054, 643.569, 644.062, 644.557 }
Intensity = { 17000, 25000, 12000, 9000, 4000 }

Output part:

mZ = { 642.572, 643.054, 643.569, 644.062 }
Intensity = { 54000, 46000, 25000, 13000 }

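Therefore, isotopic clusters 1, 5, 8 and 10 were aggregated: each output intensity is the sum of one cluster's peak intensities, e.g. 54000 = 17000 + 25000 + 12000 (cluster 1) and 25000 = 12000 + 9000 + 4000 (cluster 8).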
The best path goes like this in the DOT graph:
[Figure: best path highlighted in the DOT graph of example 2]
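The output intensities can be cross-checked against the annotated table by summing the intensities of each cluster's member peaks. The sketch assumes that aggregating a cluster means summing its intensities onto its first peak (an assumption, though it matches the numbers): 54000, 46000 and 13000 are the sums of clusters 1, 5 and 10, and 25000 is the sum of the three-peak cluster 8, while cluster 7 alone sums to 21000.

```python
# Intensities of example 2, keyed by the peak numbers of the annotated table.
intensity = {57: 17000, 58: 25000, 59: 12000, 60: 9000, 61: 4000}

# Member peaks of the charge-2 clusters taken from the annotated table.
clusters = {1: [57, 58, 59], 5: [58, 59, 60], 7: [59, 60],
            8: [59, 60, 61], 10: [60, 61]}

# Summed intensity per cluster, e.g. cluster 1: 17000 + 25000 + 12000.
sums = {cid: sum(intensity[p] for p in peaks) for cid, peaks in clusters.items()}
for cid, total in sums.items():
    print(cid, total)
```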

@LucasSchmidt97
Collaborator Author

LucasSchmidt97 commented Nov 20, 2017

x is the input part and xd is the output part (deisotoped).

[Figure: plot of x (input part) and xd (deisotoped output part) for example 2]

The deisotoping process used the standard configuration, with DELTA changed to 0.03 (instead of 0.003) and DECHARGE turned off.
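Why the larger DELTA matters can be sketched with a tolerance check on peak gaps. The sketch assumes an expected gap of 1.00335 Da / z and an absolute m/z tolerance; both are assumptions, and the deisotoper's exact check may differ. Under them, every candidate gap in example 2 deviates from the expected spacing by more than 0.003 but less than 0.03, so no isotope pairs are found with the default DELTA.

```python
from itertools import combinations

ISOTOPE_SPACING = 1.00335  # assumed 13C-12C spacing in Da (hypothetical constant)


def isotope_pairs(mz, delta, charges=(1, 2)):
    """Return (i, j, z) for peak pairs whose m/z gap fits ISOTOPE_SPACING / z."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(mz), 2):
        for z in charges:
            if abs((b - a) - ISOTOPE_SPACING / z) <= delta:
                pairs.append((i, j, z))
    return pairs


mz = [642.572, 643.054, 643.569, 644.062, 644.557]
for delta in (0.003, 0.03):
    print(delta, isotope_pairs(mz, delta))
```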

@LucasSchmidt97
Collaborator Author

LucasSchmidt97 commented Nov 20, 2017

When example 1 is embedded into the first spectrum of TP_HeLa_200ng_filtered_pd21, this is the annotated spectrum:

  IsotopicCluster Peak Charge       mZ  Intensity
               0   68      2  726.8420 12203200.00
               0   69      2  727.3430  9904050.00
               1   68      2  726.8420 12203200.00
               1   69      2  727.3430  9904050.00
               1   70      2  727.8460  4614090.00
               2   68      1  726.8420 12203200.00
               2   70      1  727.8460  4614090.00
               3   68      1  726.8420 12203200.00
               3   70      1  727.8460  4614090.00
               3   72      1  728.8460   333996.00
               4   69      2  727.3430  9904050.00
               4   70      2  727.8460  4614090.00
               5   69      2  727.3430  9904050.00
               5   70      2  727.8460  4614090.00
               5   71      2  728.3460  1509730.00
               6   69      1  727.3430  9904050.00
               6   71      1  728.3460  1509730.00
               7   69      1  727.3430  9904050.00
               7   71      1  728.3460  1509730.00
               7   73      1  729.3480    50942.10
               8   70      2  727.8460  4614090.00
               8   71      2  728.3460  1509730.00
               9   70      2  727.8460  4614090.00
               9   71      2  728.3460  1509730.00
               9   72      2  728.8460   333996.00
              10   70      1  727.8460  4614090.00
              10   72      1  728.8460   333996.00
              11   71      2  728.3460  1509730.00
              11   72      2  728.8460   333996.00
              12   71      2  728.3460  1509730.00
              12   72      2  728.8460   333996.00
              12   73      2  729.3480    50942.10
              13   71      1  728.3460  1509730.00
              13   73      1  729.3480    50942.10
              14   72      2  728.8460   333996.00
              14   73      2  729.3480    50942.10

This is the plot of this part (x is the input part and xd is the deisotoped output part):

[Figure: plot of x (input part) and xd (deisotoped output part) for example 1]

Input part:

mZ = { 726.068, 726.337, 726.589, 726.842, 727.343, 727.846, 728.346, 728.846, 729.348, 730.248, 730.336, 730.581, 730.836 }
Intensity = { 6.77850e+03, 2.81688e+04, 6.66884e+04, 1.22032e+07, 9.90405e+06, 4.61409e+06, 1.50973e+06, 3.33996e+05, 5.09421e+04, 1.15869e+03, 2.14788e+03, 5.37853e+03, 5.79094e+02 }

Output part:

mZ = { 726.337, 726.589, 726.842, 727.343, 727.846, 728.346, 728.846, 730.248, 730.336, 730.581, 730.836 }
Intensity = { 28168.8, 66688.4, 26721340, 16027870, 6457816, 1894668, 384938.1, 1158.69, 2147.88, 5378.53, 579.094 }
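The aggregated values can be cross-checked against the annotated table: the output intensities at 726.842, 727.343, 727.846, 728.346 and 728.846 equal the summed intensities of the charge-2 clusters 1, 5, 9, 12 and 14 (1894668.1 appears rounded to 1894668). A small check, assuming aggregation sums a cluster's intensities onto its first peak:

```python
# Intensities of the peaks in the annotated example 1 table, keyed by peak number.
intensity = {68: 12203200.0, 69: 9904050.0, 70: 4614090.0,
             71: 1509730.0, 72: 333996.0, 73: 50942.1}

# Member peaks of the charge-2 clusters 1, 5, 9, 12 and 14 from the table.
clusters = {1: [68, 69, 70], 5: [69, 70, 71], 9: [70, 71, 72],
            12: [71, 72, 73], 14: [72, 73]}

sums = {cid: sum(intensity[p] for p in peaks) for cid, peaks in clusters.items()}
for cid, total in sums.items():
    print(cid, round(total, 1))
```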

This is the DOT graph:
[Figure: DOT graph of example 1]


3 participants