Explicit support for global terminal modifications #6

RalfG · 2024-03-18T16:44:39Z

Fixed modifications, such as carbamidomethylation of C, can be written as a global modification (section 4.6.2). For instance:

<[Carbamidomethyl]@C>ATPEILTCNSIGCLK

However, it is not explicitly stated whether global terminal modifications are supported, and if so, which "target tags" should be used. I would use this in the case of isobaric labeling modifications. For instance:

<[TMT6plex]@K,N-term>ATPEILTCNSIGCLK

Which would be equivalent to:

[TMT6plex]-ATPEILTCNSIGCLK[TMT6plex]

This would require a definition of the tags to be used for terminal modifications, for example N-term and C-term.
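
To make the intended equivalence concrete, here is a minimal Python sketch that rewrites one global rule as explicit per-site modifications. The helper name `expand_global_rule` is hypothetical, the `N-term` and `C-term` tags are the ones proposed above rather than anything defined in the specification, and the sketch only handles a single rule in front of an unmodified, upper-case sequence.

```python
import re

# Hypothetical helper, not part of any ProForma library. It only handles a
# single global rule in front of an unmodified, upper-case sequence.
GLOBAL_RULE = re.compile(r"^<\[(?P<mod>[^\]]+)\]@(?P<targets>[^>]+)>(?P<seq>[A-Z]+)$")


def expand_global_rule(proforma: str) -> str:
    """Rewrite one global rule as explicit per-site modifications."""
    match = GLOBAL_RULE.match(proforma)
    if match is None:
        raise ValueError(f"unsupported input: {proforma!r}")
    tag = f"[{match['mod']}]"
    targets = match["targets"].split(",")
    residues = {t for t in targets if len(t) == 1}

    # Residue targets: append the tag after every matching amino acid.
    body = "".join(aa + tag if aa in residues else aa for aa in match["seq"])
    # Terminal targets use the assumed tags "N-term" and "C-term".
    prefix = f"{tag}-" if "N-term" in targets else ""
    suffix = f"-{tag}" if "C-term" in targets else ""
    return prefix + body + suffix


print(expand_global_rule("<[TMT6plex]@K,N-term>ATPEILTCNSIGCLK"))
# [TMT6plex]-ATPEILTCNSIGCLK[TMT6plex]
```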


edeutsch · 2024-04-05T15:46:18Z

This is currently legal:
[TMT6plex]-ATPEILTCNSIGCLK[TMT6plex]
<[TMT6plex]@k>[TMT6plex]-ATPEILTCNSIGCLK

Options to extend:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK (use 'N-term' and 'C-term')
<[TMT6plex]@k,n>ATPEILTCNSIGCLK (use lower case n and c)
<[TMT6plex]@k,^>ATPEILTCNSIGCLK (use ^ for N-term and $ for C-term)

ProForma currently allows amino acids to be lower case, so the second is not a good idea
Seems like the preferred format would be:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK

If we wanted to support N-term amino acids:
<[TMT6plex]@k,N-term,N-term A>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(A)>ATPEILTCASIGCLK

<[TMT6plex]@k,N-term,N-term:AS>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(AS)>ATPEILTCASIGCLK

After discussion, we ended up with:
<[TMT6plex]@k,N-term>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term:A,N-term:S>ATPEILD[U:Cation:Fe[III]]CASIGCLK
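
As a rough illustration of the syntax the discussion converged on, the following Python sketch parses the target list that follows `@`. The `Target` type and `parse_targets` function are hypothetical names, and two choices are assumptions made for this sketch only: lower-case residues are normalized to upper case, and packed forms such as `N-term:AS` are rejected.

```python
from dataclasses import dataclass
from typing import List, Optional

TERMINALS = ("N-term", "C-term")


@dataclass(frozen=True)
class Target:
    terminal: Optional[str]    # "N-term", "C-term", or None for any position
    amino_acid: Optional[str]  # one-letter code, or None for any residue


def parse_targets(spec: str) -> List[Target]:
    """Parse the text after '@', e.g. 'k,N-term:A,N-term:S'."""
    targets = []
    for item in spec.split(","):
        terminal, sep, residue = item.partition(":")
        if terminal in TERMINALS:
            if sep and not (len(residue) == 1 and residue.isalpha()):
                # Packed forms such as N-term:AS are rejected in this sketch.
                raise ValueError(f"expected one amino acid after ':' in {item!r}")
            targets.append(Target(terminal, residue.upper() if sep else None))
        elif len(item) == 1 and item.isalpha():
            # Assumption: lower-case residues are normalized to upper case.
            targets.append(Target(None, item.upper()))
        else:
            raise ValueError(f"unrecognized target {item!r}")
    return targets


for target in parse_targets("k,N-term:A,N-term:S"):
    print(target)
```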

Discuss again with other ProForma 2.0 stakeholders

Other potential things to change:

Clearly specify the order of <>{}[] at the front
This is not clearly defined in the text of the spec. Update to specify more clearly.

douweschulte · 2024-04-05T17:21:53Z

As a bit of a follow-up thought after the meeting: I would argue for @N-term:ABC as a valid representation of a modification on the N-terminus of alanine, ambiguous asparagine/aspartic acid (B), or cysteine. The idea in the meeting itself was to not allow this form and instead use @N-term:A,N-term:B,N-term:C, which is a slightly easier grammar.

My argument for allowing the first form is that it is easier to type out. It makes the grammar slightly more complex, but no dividing character is used, so the rule is that anything (alphabetic characters only) following the colon is a location where this modification can be placed. In terms of parser logic this is not much more complex: the parser has to check whether the character following the initial amino acid is a comma anyway, and with this addition it just has to keep taking input until the next comma.

In terms of complexity of the intermediate representation used by any program handling ProForma notation, I would argue there is no difference between the two syntaxes. Any program able to handle the N-term:A,N-term:B notation can therefore handle the N-term:AB notation without any changes to the code (except for the parser, of course).
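
The claim that both notations can share one intermediate representation can be sketched in Python as follows. This is only an illustration: the tuple-based `Target` type and the `parse_targets` function are hypothetical and do not reflect how any existing parser is written.

```python
from typing import List, Tuple

Target = Tuple[str, str]  # (terminal, amino acid); "" means "not specified"


def parse_targets(spec: str) -> List[Target]:
    """Accept both N-term:A,N-term:B and the packed N-term:AB form."""
    targets: List[Target] = []
    for item in spec.split(","):
        head, sep, residues = item.partition(":")
        if not sep:
            # Either a bare terminal ("N-term") or a bare residue ("K").
            targets.append((head, "") if head.endswith("-term") else ("", head))
            continue
        if not residues.isalpha():
            raise ValueError(f"only letters may follow ':' in {item!r}")
        # Keep taking letters until the next comma: one target per letter.
        targets.extend((head, aa) for aa in residues)
    return targets


# Both spellings produce the same list of targets.
assert parse_targets("N-term:ABC") == parse_targets("N-term:A,N-term:B,N-term:C")
```

Only the parser differs between the two forms; code that consumes the resulting list does not need to know which spelling was used.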

But I am quite interested to hear about the feasibility from the other people writing ProForma parsers. This mostly reflects how my parser is written and it might be harder if you are using other libraries or parser generators.

mobiusklein · 2024-04-05T22:40:30Z

My argument against the N-term:ABC notation, or packing for ease of reference, is that it introduces an extra layer of complexity and it introduces a second way of specifying a list of amino acid targets. The first is colored by my own implementation choices, but suppose we have the following abstract types:

class ModificationRule {
  modification: Modification
  targets: List<ModificationTarget>
}

class ModificationTarget {
  amino_acid: String | null
  terminal: String | null
}

This fully covers the first existing usage, where each amino acid is a separate ModificationTarget. If we allow packing we now need to allow a ModificationTarget to cover multiple amino acids, or we need to add an extra step after parsing where we split those overloaded targets into separate entries. If we allow variadic ModificationTargets, then we break an implicit contract that a target is about a single amino acid. If we do introduce an intermediate splitting step, we break the 1:1 assumption between syntax and representation, and unless you implement rule merging, N-term:ABC may then be rendered N-term:A,N-term:B,N-term:C. ProForma explicitly doesn't advocate standard canonicalization rules, but round-tripping is nice to have.
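
The round-tripping concern can be illustrated with a small Python sketch that mirrors the abstract types above. The `render` methods and the `split_packed` helper are hypothetical additions, and `modification` is simplified to a plain string for the sake of the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModificationTarget:
    amino_acid: Optional[str]
    terminal: Optional[str]

    def render(self) -> str:
        if self.terminal and self.amino_acid:
            return f"{self.terminal}:{self.amino_acid}"
        return self.terminal or self.amino_acid or ""


@dataclass
class ModificationRule:
    modification: str  # simplified: a plain name instead of a Modification object
    targets: List[ModificationTarget]

    def render(self) -> str:
        return f"<[{self.modification}]@{','.join(t.render() for t in self.targets)}>"


def split_packed(terminal: str, residues: str) -> List[ModificationTarget]:
    """Post-parse step: turn 'N-term:ABC' into one target per amino acid."""
    return [ModificationTarget(aa, terminal) for aa in residues]


rule = ModificationRule("TMT6plex", split_packed("N-term", "ABC"))
print(rule.render())  # <[TMT6plex]@N-term:A,N-term:B,N-term:C>
```

Without an additional rule-merging step, the packed input is written back in the unpacked form, which is the loss of round-tripping described above.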

The second concern is a syntax to semantics concern. Suppose I write N-term:ABC, and then say "Ah but I also need this rule to target Z, X and Q not on the N-terminal". The spec says I should then write Z,X,Q,N-term:ABC, but I just packed ABC together, so why can't I write ZXQ,N-term:ABC, or I may write Z,X,Q,N-term:A,B,C because I think I have a list of targets.

Neither problem is intractable, and others may implement things in such a way that this is not an issue.

douweschulte · 2024-04-08T07:03:51Z

I do the grouping internally already, so for me on the parser side there is no problem. But your second argument on semantics I fully agree with. So that leaves me in favour of the unpacked syntax.

edeutsch · 2024-04-12T16:03:20Z

Original intent:
AC[Carbamidomethyl]AHC[Carbamidomethyl]HAC[Carbamidomethyl]FC[Carbamidomethyl]AC[Carbamidomethyl]
<[Carbamidomethyl]@C>ACAHCHACFCAC

<[Carbamidomethyl]@C>AAHHAFA
(should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)
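
Whatever the answer, a tool could detect this case and, for example, warn about it instead of rejecting the string. The Python sketch below shows one way to do so; `unused_global_rules` is a hypothetical helper that only considers single-letter residue targets and does not handle nested brackets inside modification tags.

```python
import re
from typing import List

RULE = re.compile(r"<\[([^\]]+)\]@([^>]+)>")


def unused_global_rules(proforma: str) -> List[str]:
    """Return global rules whose residue targets never occur in the sequence."""
    rules = RULE.findall(proforma)
    # Assumption: whatever remains after removing the rules is the sequence.
    sequence = RULE.sub("", proforma)
    # Drop bracketed modification tags so their letters are not counted as residues.
    residues = set(re.sub(r"\[[^\]]*\]", "", sequence).upper())
    unused = []
    for mod, targets in rules:
        wanted = {t.upper() for t in targets.split(",") if len(t) == 1}
        if wanted and not wanted & residues:
            unused.append(f"<[{mod}]@{targets}>")
    return unused


print(unused_global_rules("<[Carbamidomethyl]@C>ACAHCHACFCAC"))  # []
print(unused_global_rules("<[Carbamidomethyl]@C>AAHHAFA"))  # ['<[Carbamidomethyl]@C>']
```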

Do we want to amend the specification to ProForma 2.1 to clarify these things?
Or should we have an addendum document that clarifies things in ProForma 2.0 that were not clearly specified?

Douwe's code has the capability to read a ProForma string that has all the fixed modifications prefixed and normalizes it to what is actually in the peptide.

If we added the N-term support, it would be a breaking change, and would be ProForma 2.1

TODO: Start a Google doc in which we start documenting and resolving these various open issues, including #8 and #9
TODO: Juan will put ProForma 2.0 into an editable Google doc
TODO: Douwe will create a Google doc that is an addendum/clarification of 2.0

bittremieux · 2024-04-14T13:39:35Z

> <[Carbamidomethyl]@C>AAHHAFA
> (should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)

This is fine imo.

**Modification caching** All ModificationResolver types now use an in-memory cache for resolved modification definitions, reducing the overhead of resolving the same rule over and over again. Sub-classes should move their implementation of `resolve` to the `_resolve_impl` method, otherwise the cache will not be used. To disable the cache for a resolver instance, call `resolver.enable_caching(False)`.

**Constant terminal modifications** This implements support for the syntax discussed in HUPO-PSI/ProForma#6 to include constant modification rules that apply to specific sequence terminals with or without specific amino acids.
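
The caching pattern described here can be sketched generically in Python. This is not the Pyteomics implementation: the classes `CachingResolver` and `CountingResolver` are hypothetical, and only the method names `resolve`, `_resolve_impl`, and `enable_caching` are taken from the description above.

```python
from typing import Dict


class CachingResolver:
    """Hypothetical base class; not the Pyteomics implementation."""

    def __init__(self) -> None:
        self._cache: Dict[str, dict] = {}
        self._caching = True

    def enable_caching(self, flag: bool = True) -> None:
        self._caching = flag
        if not flag:
            self._cache.clear()

    def resolve(self, name: str) -> dict:
        # Serve repeated lookups from memory; delegate misses to the subclass.
        if self._caching and name in self._cache:
            return self._cache[name]
        definition = self._resolve_impl(name)
        if self._caching:
            self._cache[name] = definition
        return definition

    def _resolve_impl(self, name: str) -> dict:
        raise NotImplementedError


class CountingResolver(CachingResolver):
    """Toy subclass that counts how often the slow path runs."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def _resolve_impl(self, name: str) -> dict:
        self.calls += 1
        return {"name": name}


resolver = CountingResolver()
resolver.resolve("TMT6plex")
resolver.resolve("TMT6plex")
print(resolver.calls)  # 1
```

A subclass that overrides `resolve` directly would bypass the lookup in the base class, which is why the description asks for implementations to live in `_resolve_impl`.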

RalfG added a commit to compomics/psm_utils that referenced this issue Mar 18, 2024

Add support for adding and applying global terminal modifications.

0297b77

For now using a workaround while waiting for official support and an implementation in Pyteomics. See HUPO-PSI/ProForma#6

This was referenced Mar 18, 2024

Add support for globally-defined terminal modifications compomics/psm_utils#71

Merged

Adding peptide n-terminal fixed modification compomics/ms2rescore#125

Closed

RalfG added a commit to compomics/ms2rescore that referenced this issue Mar 18, 2024

docs: Add instructions for adding fixed terminal modifications. See #125, compomics/psm_utils#71, and HUPO-PSI/ProForma#6

ba33812

douweschulte mentioned this issue Apr 5, 2024

Specification clarification #9

Open

RalfG mentioned this issue Apr 9, 2024

Docs: Add instructions for fixed terminal modifications compomics/ms2rescore#143

Merged

douweschulte added a commit to snijderlab/rustyms that referenced this issue Apr 10, 2024

Support for explicit terminal global mods HUPO-PSI/ProForma#6

a496fab

Support for diagnostic ions from labile modifications
Bit of refactoring for proforma parse code

This was referenced Apr 20, 2024

Add modification caching; Constant terminal mods for ProForma levitsky/pyteomics#148

Merged

[proforma] Use cache on modification resolvers levitsky/pyteomics#147

Closed
