accession attribute in DBSequence should be unique? #91
Comments
I would definitely vote to have these accessions unique. Having the same accession for differing entries is probably an error, and in the given example it leads to inconsistencies when mapping the peptides and PSMs to the proteins.
I would vote for them not being unique. First, there is the decoy/target example above. More generally, proteins can have the same accession number and different sequences - this is why we're not all clones, right? If the example above leads to inconsistencies then it is an error in the software reading the file, because the id attributes are different? |
I also think they should be the same: it is the decoy counterpart of the target, and unless we have a standard way of denoting them as a target-decoy pair, I would actually ask for them to have the same accession. Being able to match these up is important for FDR estimation, as only this way can you make a meaningful separate (target-decoy based) FDR for self/internal/intra vs between/inter.
could this issue be resolved/closed? I think there are reasons why they are not required to be unique. |
You can have multiple search databases which could have overlapping entries, like searching all of the reviewed sequences of UniProt and then searching again with all the isoforms and unreviewed sequences enabled. The The "supported" method for including decoy proteins in your search database involves adding some marker to the Would it be better if there were an |
that seems sufficient info to close this
where is that documented? (apologies if it's obvious and I'm just being blind) |
right... it's shown in the example in Section 7.5 of the 1.2.0 spec (though it isn't discussed in the text). I didn't find it because its accession is MS:1001283 (not MS:1001450 as in your message, though the link in your message is correct), and I searched for MS:1001450. @lutzfischer - I think we've been unaware of this?
also, re. MS:1001283 - it's incorrectly shown as an example CV param for DatabaseName (6.20, pg. 36)?
Thanks for catching the accession number error earlier. I was writing in a hurry and must have copied over the wrong accession from OLS. I think you're right about the parameters in As-is, this could only be one of the children given here: https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMS_1001013&lang=en&viewMode=All&siblings=false |
I will make a separate issue for the incorrect DatabaseName example cvParams.
that sounds sensible to me, but then it is a change to the schema |
currently the only "reliable" way to detect if a protein is a decoy protein is to go via PeptideEvidence. But I guess there are other ways to have decoys besides extra decoy proteins - concatenated proteins come to mind - where only a part of the "protein" is decoy. Not sure what would be the best way to represent that. For the case of distinct decoy proteins, actually the current spec document, at least implicitly, by example, suggests different accessions:
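As an aside on the PeptideEvidence route mentioned above, here is a minimal Python sketch of how a reader could flag decoy DBSequence entries through the `isDecoy` and `dBSequence_ref` attributes of PeptideEvidence. The XML fragment, its ids and accessions, and the function name `decoy_flags_by_dbsequence` are made up for illustration (the fragment is not a complete, schema-valid file), and treating a missing `isDecoy` as false is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment: two DBSequence entries that share an
# accession, told apart only by their id attributes.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <SequenceCollection>
    <DBSequence id="dbseq_P1" accession="P1" searchDatabase_ref="SDB_1"/>
    <DBSequence id="dbseq_decoy_P1" accession="P1" searchDatabase_ref="SDB_1"/>
    <PeptideEvidence id="pe_1" dBSequence_ref="dbseq_P1" peptide_ref="pep_1" isDecoy="false"/>
    <PeptideEvidence id="pe_2" dBSequence_ref="dbseq_decoy_P1" peptide_ref="pep_2" isDecoy="true"/>
  </SequenceCollection>
</MzIdentML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def decoy_flags_by_dbsequence(xml_text):
    """Map each referenced DBSequence id to the set of isDecoy values seen for it."""
    root = ET.fromstring(xml_text)
    flags = {}
    for elem in root.iter():
        if local_name(elem.tag) != "PeptideEvidence":
            continue
        ref = elem.get("dBSequence_ref")
        # Assumption: a missing isDecoy attribute is treated as false.
        is_decoy = elem.get("isDecoy", "false").strip().lower() in ("true", "1")
        flags.setdefault(ref, set()).add(is_decoy)
    return flags


flags = decoy_flags_by_dbsequence(SAMPLE)
# A DBSequence counts as a decoy here only if every evidence pointing at it is a decoy.
decoy_ids = sorted(ref for ref, seen in flags.items() if seen == {True})
print(decoy_ids)
```

Collecting a set of flags per DBSequence rather than a single boolean leaves room for the concatenated-protein case raised above, where one entry could be referenced by both decoy and non-decoy evidence; how to interpret such a mixed set is left open here.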
When parsing example file https://github.com/HUPO-PSI/mzIdentML/blob/master/examples/1_2examples/crosslinking/xiFDR-CrossLinkExample.mzid, I find these 2 protein entries as DBSequence elements:
Although the protein entries are different (one is the decoy entry of the other), the accession attribute is the same.
My question is: should the accession attribute be unique? The specification document says this about the accession:
This caused me a problem because I am collecting all proteins in a map in which the key is the accession.
What do you think?
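For illustration, a minimal Python sketch of one way to avoid the collision: key the map on the DBSequence `id` attribute, which is meant to be unique within the file, and keep the accession as a secondary index that may point at several entries. The XML fragment, its ids and accessions, and the function name `index_dbsequences` are hypothetical and not taken from the example file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: a target and its decoy sharing one accession.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <SequenceCollection>
    <DBSequence id="dbseq_P1" accession="P1" searchDatabase_ref="SDB_1"/>
    <DBSequence id="dbseq_decoy_P1" accession="P1" searchDatabase_ref="SDB_1"/>
  </SequenceCollection>
</MzIdentML>"""


def index_dbsequences(xml_text):
    """Return (by_id, by_accession) for all DBSequence elements.

    by_id maps the id attribute to a copy of the element's attributes.
    by_accession maps each accession to the list of ids that carry it.
    """
    root = ET.fromstring(xml_text)
    by_id = {}
    by_accession = defaultdict(list)
    for elem in root.iter():
        # Compare on the local tag name so the namespace version does not matter.
        if elem.tag.rsplit("}", 1)[-1] != "DBSequence":
            continue
        seq_id = elem.get("id")
        by_id[seq_id] = dict(elem.attrib)
        by_accession[elem.get("accession")].append(seq_id)
    return by_id, dict(by_accession)


by_id, by_accession = index_dbsequences(SAMPLE)
print(len(by_id))          # both entries survive, no overwrite
print(by_accession["P1"])  # ids of the entries that share accession "P1"
```

With a map keyed on accession alone, the second entry would silently overwrite the first; keeping a list of ids per accession makes the duplication visible instead.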