Specification clarification #9

Open
douweschulte opened this issue Apr 5, 2024 · 3 comments

@douweschulte

douweschulte commented Apr 5, 2024

Since #6 requires an update to the specification from my side, I would like a few things to be stated more clearly in the text of the specification. None of these require changes to the format itself, only to the text.

  1. The order of the different kinds of pre-sequence modifications. This is already specified in a comment on an open issue as follows: <GLOBAL_MOD>[UNKNOWN_POS]?{LABILE_MOD}[N_TERM]-PEPTIDE-[C_TERM]
  2. There are two different peptide sequence separators: // for crosslinked peptides (4.2.3.2) and \\ for branched peptides (4.2.4). However, this is only clear from its use in the example in the branched section. At a minimum, I think this needs to be mentioned explicitly in the branched section. As a side note, the reasoning might be worth adding too: I would be interested to hear why two different notations are needed and why they cannot be used interchangeably. The latter matters because of human error, as it is easy to misremember and use the 'wrong' one.
  3. Chimeric spectra are quite underspecified with regard to how they tie in with the rest of the specification.
    • It is unspecified whether global and/or ambiguous modifications on one peptidoform also affect any of the other peptidoforms. I assumed that each of these is only valid on its own peptidoform, which is logical in the MS context, where the precursor mass is generally defined per peptidoform. Under that assumption, this example would be invalid: [oxidation#g1]?A[#g1]+B[#g1]
    • It is unspecified how crosslinked peptides work in chimeric spectra. I assume no one runs into this in practice yet, but if DIA continues to be used by more and more MS subfields, it might come up at some point. My assumption is that + has the lowest precedence, meaning that A[#XL1]//B[#XL1]+C[#XL1]//D[#XL1] is a valid expression in the current specification and denotes a chimeric spectrum containing the peptidoform A linked to B and the peptidoform C linked to D. If there is consensus on this point, it would be nice to state it in the specification.
  4. In section 4.1 (page 7) on amino acids, the text links to section 7.5.3 for the definition of the ambiguous amino acids; this should be section 7.4.3.
  5. In section 4.2 (page 8), the link to XLMOD points to its old location in mzIdentML.
  6. In section 4.6.2 on fixed protein modifications, it is not defined whether the amino acids are allowed to be lowercase (this might also be of interest for the discussion in Explicit support for global terminal modifications #6). (Answered in section 4: the whole specification is case-insensitive.)
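To make the precedence assumed in item 3 concrete, here is a minimal Python sketch: it splits on top-level + first (separate peptidoforms in a chimeric spectrum) and only then on // (crosslinked chains within one peptidoform). The function names are hypothetical, and characters inside [...], {...} and <...> are skipped so that a + in a mass shift such as [+15.995] is not mistaken for a separator. This is an illustration of the assumption, not a full ProForma parser.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, ignoring occurrences nested inside [...], {...} or <...>."""
    parts: list[str] = []
    depth, start, i = 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch in "[{<":
            depth += 1
        elif ch in "]}>":
            depth -= 1
        elif depth == 0 and text.startswith(sep, i):
            parts.append(text[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return parts


def parse_chimeric(proforma: str) -> list[list[str]]:
    """Assumed precedence: '+' binds loosest, so split on it before '//'."""
    return [split_top_level(pf, "//") for pf in split_top_level(proforma, "+")]


print(parse_chimeric("A[#XL1]//B[#XL1]+C[#XL1]//D[#XL1]"))
# [['A[#XL1]', 'B[#XL1]'], ['C[#XL1]', 'D[#XL1]']]
```

Under this reading, the example from item 3 yields two peptidoforms, each consisting of two crosslinked chains, which matches the interpretation proposed above.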
@mobiusklein
Collaborator

RE 1: I don't think the specification itself orders [UNKNOWN_POS]? and {LABILE_MOD}; it only requires that they appear after <GLOBAL_MOD> and before [N_TERM]. I may have missed this, though, because there is no formal grammar.
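The two readings of the prefix order (the strict order from item 1 versus a relaxed order where unknown-position and labile modifications may appear in either order) can be contrasted with a small Python sketch. The token patterns, rank tables and function names are hypothetical and deliberately simplified (no nested brackets or escapes); whether the relaxed reading also permits interleaving the two kinds is an assumption made here for illustration.

```python
import re

# Simplified, hypothetical patterns for the prefix sections named in item 1.
PREFIX_TOKENS = [
    ("global", re.compile(r"<[^>]*>")),                        # <GLOBAL_MOD>
    ("unknown_pos", re.compile(r"\[[^\]]*\](?:\^\d+)?\?")),   # [UNKNOWN_POS]?
    ("labile", re.compile(r"\{[^}]*\}")),                      # {LABILE_MOD}
    ("n_term", re.compile(r"\[[^\]]*\]-")),                    # [N_TERM]-
]

# Strict: global < unknown_pos < labile < n_term.
STRICT = {"global": 0, "unknown_pos": 1, "labile": 2, "n_term": 3}
# Relaxed (assumed): unknown_pos and labile share a rank, so either may come first.
RELAXED = {"global": 0, "unknown_pos": 1, "labile": 1, "n_term": 2}


def prefix_kinds(proforma: str) -> list[str]:
    """Classify the leading prefix tokens until the sequence itself starts."""
    kinds: list[str] = []
    pos = 0
    while True:
        for kind, pattern in PREFIX_TOKENS:
            m = pattern.match(proforma, pos)
            if m:
                kinds.append(kind)
                pos = m.end()
                break
        else:
            return kinds


def prefix_in_order(proforma: str, rank: dict[str, int]) -> bool:
    """True if the prefix token ranks never decrease."""
    ranks = [rank[k] for k in prefix_kinds(proforma)]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


print(prefix_in_order("{Hex}[Phospho]?PEPTIDE", STRICT))   # False
print(prefix_in_order("{Hex}[Phospho]?PEPTIDE", RELAXED))  # True
```

Only the labile-before-unknown case differs between the two readings; an N-terminal modification placed before a global modification is rejected by both.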

@douweschulte
Author

RE RE 1: That might very well be. I wrote down my own grammar based on what I read in the spec, and I remember having trouble figuring out the correct order. Is there a need to fully specify the order? For that, we would first need to decide what the ordering requirement actually is. Additionally, would there be interest in me working out my grammar a bit more?

@bittremieux

In my grammar I have also used that explicit ordering.
