Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove code for ignoring ambiguous spectra from MSFReader.cpp #3111

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

nickshulman
Copy link
Contributor

A user had a problem that spectra which matched more than one peptide in a .pdresult file were not being put into the .blib even if "Keep ambiguous spectra" was checked.
https://skyline.ms/announcements/home/support/thread.view?rowId=64796

MSFReader has its own code for preventing ambiguous spectra from being included in the .blib which does not pay attention to the "-K" command-line parameter.
As far as I can tell, there is no reason for that code to exist, so I figured that we should probably just remove it.

@nickshulman nickshulman requested a review from chambm August 8, 2024 22:02
@chambm
Copy link
Member

chambm commented Aug 20, 2024

Could this be a performance issue with large pdResult files? Perhaps this internal way of eliminating ambiguous spectra is faster than relying on BiblioSpec's generic method?

@nickshulman
Copy link
Contributor Author

Could this be a performance issue with large pdResult files? Perhaps this internal way of eliminating ambiguous spectra is faster than relying on BiblioSpec's generic method?

I don't think it was done for performance reasons.
This code was written before the "-K" command line parameter existed, so, back then, there was no way to tell BiblioSpec to keep any of the ambiguous spectra. This code would seemed to have been written to keep only the spectrum which scored the highest.

There are a couple of problems with this code as it is:

  1. This code does not know anything about the "Confidence Level" (1, 2, 3) type of scores that most pdresult files use these days and so this code thinks usually thinks all of the ambiguous spectra have equal scores and so they all get removed.
  2. Because this code does not know about the "-K" parameter, there is no way to tell BiblioSpec to keep all of the spectra.

I am, however, thinking that the correct fix for this behavior is not to actually remove this code like in this pull request.
Instead, maybe the code should keep all PSMs that score as well as the best PSM and discard all PSMs that score worse. In this way, the scores that are tied for best place will be kept if "-K" was specified and discarded if "-K" was not specified.

I think the way I have things in this pull request might make the behavior worse in the cases where there are some PSMs that clearly scored worse compared to the best PSMs. I will investigate that.

@nickshulman
Copy link
Contributor Author

I am, however, thinking that the correct fix for this behavior is not to actually remove this code like in this pull request. Instead, maybe the code should keep all PSMs that score as well as the best PSM and discard all PSMs that score worse. In this way, the scores that are tied for best place will be kept if "-K" was specified and discarded if "-K" was not specified.

I think the way I have things in this pull request might make the behavior worse in the cases where there are some PSMs that clearly scored worse compared to the best PSMs. I will investigate that.

Actually, I think that the code changes that I have in this pull request are what we should go with.
This other idea that I was proposing here where we only keep the best scoring PSM among multiple PSMs that scored better than the cutoff would be very different behavior from what we do for other types of search results.

@brendanx67
Copy link
Contributor

brendanx67 commented Sep 9, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants