Fix parsing of PSMs and complete protein names in XTandem #83
I updated the comment for the initial PR, as there were some further additions to it.
Codecov Report
Attention: Patch coverage is …
@@ Coverage Diff @@
## main #83 +/- ##
==========================================
- Coverage 64.12% 63.97% -0.16%
==========================================
Files 26 26
Lines 2492 2498 +6
==========================================
Hits 1598 1598
- Misses 894 900 +6
Looks good to me, thank you for your contribution. I added a test case and made a slight change.
[edited after adding fix for PSM parsing]
Because XTandem tends to abbreviate protein names in the protein `label` field, protein names are now read from the `note` field instead.
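A minimal sketch of this name selection, assuming each protein entry is available as a plain Python dict with hypothetical `label` and `note` keys (not necessarily how the parser in this PR represents them). It prefers the full text from `note` and, as an assumed design choice, falls back to the possibly abbreviated `label` when `note` is missing or empty.

```python
def protein_name(protein: dict) -> str:
    """Return the full protein name for a (hypothetical) XTandem protein entry.

    Prefers the complete text stored under "note"; falls back to the
    possibly abbreviated "label" if "note" is absent or blank.
    """
    note = (protein.get("note") or "").strip()
    return note if note else protein.get("label", "")


# Hypothetical entry: "label" is truncated, "note" holds the full name.
entry = {
    "label": "PROT_A Example prot",
    "note": "PROT_A Example protein with a long description",
}
print(protein_name(entry))               # uses "note"
print(protein_name({"label": "PROT_B"})) # falls back to "label"
```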
Although XTandem saves only the highest-scoring PSMs per spectrum, a spectrum can still have more than one such PSM, with different peptidoforms, when their scores are exactly equal. This is not particularly rare, especially for near-identical peptides (think of a single amino acid flip in the sequence). This fix groups the identifications that share the same peptidoform into one new PSM, so that each PSM is assigned only the proteins relevant to its own peptidoform. Previously, the parser produced odd protein-to-peptide matches that do not occur in the databases used by XTandem.
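To illustrate the grouping, here is a hedged sketch that collapses tied top-scoring hits for one spectrum into one PSM per peptidoform. `SimplePSM`, `group_by_peptidoform`, and the `(peptidoform, protein, score)` input tuples are hypothetical stand-ins, not the project's actual classes or the code in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SimplePSM:
    """Hypothetical, minimal PSM record used only for this sketch."""
    spectrum_id: str
    peptidoform: str
    score: float
    protein_list: list = field(default_factory=list)


def group_by_peptidoform(spectrum_id, hits):
    """Return one PSM per distinct peptidoform among tied hits for one spectrum.

    `hits` is an iterable of (peptidoform, protein, score) tuples. Each PSM
    keeps only the proteins that were reported with its own peptidoform.
    """
    proteins = defaultdict(list)  # peptidoform -> proteins, first-seen order
    scores = {}
    for peptidoform, protein, score in hits:
        if protein not in proteins[peptidoform]:
            proteins[peptidoform].append(protein)
        # Tied hits share the same score, so keeping the last value is fine.
        scores[peptidoform] = score
    return [
        SimplePSM(spectrum_id, pep, scores[pep], prots)
        for pep, prots in proteins.items()
    ]


# Two peptidoforms with exactly equal scores for the same spectrum.
hits = [
    ("PEPTIDEK", "PROT_A", 42.0),
    ("PEPTIDEK", "PROT_B", 42.0),
    ("PETPIDEK", "PROT_C", 42.0),
]
psms = group_by_peptidoform("scan=1", hits)
for psm in psms:
    print(psm.peptidoform, psm.protein_list)
```

Because proteins are collected per peptidoform, `PROT_C` is never attached to `PEPTIDEK`, which is the kind of spurious protein-to-peptide match described above.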
Also, the remark that only one protein is parsed per peptide/PSM no longer seems to hold.