Using Rhea to populate logical definitions of reactions #14984

Closed
pgaudet opened this issue Jan 30, 2018 · 79 comments

@pgaudet
Contributor

pgaudet commented Jan 30, 2018

Hello,

Discussing with @amorgat about how Rhea and GO represent biochemical reactions, here are a few points to consider:

  • in Rhea each reaction is represented by a 'master' (undefined direction), and 3 directionalities: left to right/right to left/bi-directional
  • For GO right now our links are to the master (undefined direction), which is at least not wrong. This is also the reaction that corresponds to the link with EC numbers.
  • If we wanted to add the participants to the reactions, that might work (just using 'has participants'). Would that be informative enough? That information would be available this summer or so.
  • Otherwise adding the directionality would be more work. One option would use the (soon to come) mappings between Rhea (direction-specific) and UniProt entries to pick the 'input' and 'output'. These directionalities will be manually curated.
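The master-plus-three-directions layout described in the list above could be modelled roughly as follows. This is only a sketch: the class name `RheaReaction` is made up for illustration, the identifier `10000` is just an example value, and the code assumes (this thread does not say so) that the three directional entries are numbered consecutively after the master entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RheaReaction:
    """One Rhea reaction, keyed by the number of its undirected 'master' entry."""

    master: int

    def ids(self) -> dict:
        # ASSUMPTION: directional entries are numbered master+1 .. master+3.
        return {
            "master (undefined direction)": f"RHEA:{self.master}",
            "left to right": f"RHEA:{self.master + 1}",
            "right to left": f"RHEA:{self.master + 2}",
            "bidirectional": f"RHEA:{self.master + 3}",
        }


# Illustrative identifier only.
reaction = RheaReaction(master=10000)
for direction, rhea_id in reaction.ids().items():
    print(f"{direction}: {rhea_id}")
```

With such a structure, the current GO cross-references would point at the master entry, and a direction-specific link would pick one of the other three.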

I am putting this here for discussion. @ukemi, I'll let you decide how soon we need to discuss this.

Thanks, Pascale

@ukemi
Contributor

ukemi commented Jan 30, 2018

I looked into this some time ago and creating the logical defs is not as simple as it looks on the surface. However, I do think that we could assert participant relations.

Along with the directional issues, there is also stoichiometry to consider to make necessary and sufficient definitions. We don't capture stoichiometry in GO.

Most MF terms are defined as being bidirectional. I think our best bet would be to get the directionality from GO-CAM models. There is also an issue with how we have defined processes such as catabolism and biosynthesis. Both have inputs and outputs as differentia. This causes problems because the things being catabolized aren't the only inputs to the process and the things being made aren't the only outputs. See #11779

@cmungall
Member

Re: stoichiometry. We actually do stoichiometry in that it figures into the (currently textual) definition.

I proposed a way of handling the stoichiometry in the OWL and axiomatizing RHEA at the last Bar Harbor meeting:
https://docs.google.com/presentation/d/1QZ96mL1PRE0cLw0pPT5K-R9wdfd07HM4b2OFpCSSELU/edit#slide=id.p17
https://drive.google.com/drive/u/0/folders/0B8kRPmmvPJU3ZFVCb1RCUVFjYTQ

The assumption then was that we would make the GO classes equivalent to the bidirectional form, but we can revisit that.

@deustp01

Dealing with charge states: RHEA uses the ChEBI instance that is predominant at pH 7.2, while GO is indifferent. This will also require some sort of mapping, but it is essentially the mapping already done to align ChEBI with GO, not a new one.

@pgaudet
Contributor Author

pgaudet commented Jan 31, 2018

And consistent with MetaCyc.

@amorgat

amorgat commented Jan 31, 2018

pH 7.3 ;-)
we call them 'normalized compounds' and can give you the mapping between any chebi and its normalized counterpart

@deustp01

@amorgat, could I get a copy of that mapping? It's time to align Reactome better with RHEA and this would let us do the work efficiently and increase the odds that we get the chemicals right.
Thanks.

@amorgat

amorgat commented Jan 31, 2018

Just a few clarifications:
We have 3 categories of reaction participants in Rhea:

  • small molecules: ChEBI entries
  • polymers: linked to an underlying ChEBI polymer, but possibly with a different polymerization index (n+1, n-1, etc.)
  • generics: macromolecules are simplified to the functional groups involved in the reactions. Generics may have one or several residues. These residues are ChEBI entries.

See publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384025

@bmeldal

bmeldal commented Jan 31, 2018

@amorgat Talking about mapping to RHEA, we also need to map to it from the Complex Portal, on the to-do list for later this year (?). We have EC numbers where available, and of course UniProt ACs. Would that help with mapping? If you want to discuss off-ticket, email me on bmeldal @ ebi. ac. uk :)
Birgit

@deustp01

deustp01 commented Jan 31, 2018 via email

@bmeldal

bmeldal commented Jan 31, 2018

@deustp01 We'd have to map to RHEA when a complex is catalyst for a certain reaction. It's less about the small molecules than the proteins. But happy to co-ordinate if the mapping suits both!

@ukemi
Contributor

ukemi commented Jan 31, 2018

Should we try to make this a discussion at the GOC meeting in May?

@bmeldal

bmeldal commented Jan 31, 2018

Sandra will be there for the CP (as well as UniProt, of course!), I have training responsibilities here that week.

@alanbridge

I’ll be at the GO meeting in NYC and would be happy to discuss on behalf of the Rhea team (and UniProt).

(As an aside, we have used directional reactions for the construction and annotation of www.swisslipids.org).

@ukemi ukemi removed the GOC meeting label Jan 31, 2018
@ukemi ukemi added this to the 2018-05 GOC meeting milestone Jan 31, 2018
@ukemi
Contributor

ukemi commented Jan 31, 2018

I've added it to the GOC meeting agenda on the wiki.

@sandraorchard

Hi, yes happy to represent the Complex Portal (and also UniProt) in these discussions. IntAct is also looking at directionality and working with SIGNOR (https://signor.uniroma2.it/) on an export for this.

@cmungall
Member

cmungall commented Mar 7, 2018

We're going to discuss this on the @geneontology/ontology call on Monday.

@pgaudet
Contributor Author

pgaudet commented Mar 12, 2018

Editors' discussion:

  • Chris: we could have a pipeline where we could more easily add new classes based on Rhea
  • We need to make decisions about directionality (Rhea reactions are reversible)
  • If needed (and known) we can pick specific directions for certain reactions that we believe only happen in one direction (or happen more frequently in that direction)
  • Relation for bi-directional reactions: 'has participant'?
  • @ukemi Is the vision to classify reactions by participants? (@cmungall let's see as we move forward)

PLAN

  • Assign Ben Good to do the initial work to create the 'mappings'
  • Goal: have an initial mapping file that we can present at the GOC meeting in May 2018

@goodb
Contributor

goodb commented Jun 29, 2018

If anyone is looking at this, it's worth noting that the expanded CHEBI import did not include the General class axioms for the new additional CHEBI terms. Hence, e.g.,
diphosphoric acid (CHEBI_29888) is not equated with diphosphate(3-) (CHEBI_33019) in the merged ontology.

@ukemi
Contributor

ukemi commented Jun 29, 2018

Isn't that part of the make_file? We can talk about it on Monday, but it makes sense to me to go ahead and run the ChEBI import with the additional terms we will need for the Rhea defs and if it all looks ok go ahead and merge that into master. It won't hurt anything to have the additional ChEBI classes I don't think. It's one more step we can do to get concrete progress along the way.

@goodb
Contributor

goodb commented Jun 29, 2018 via email

@goodb
Contributor

goodb commented Jun 29, 2018

Here is one more incarnation to consider for the discussion on Monday. If we walk all the way back to definitions that look like this (for adenylate cyclase activity):
equivalent to:
'catalytic activity'
and (('has input' some 'ATP(4-)'))
and (('has output' some 'diphosphate(3-)')
and ('has output' some '3',5'-cyclic AMP(1-)'))

Then things start looking interesting.

  1. the current Reactome->go-cam reactions get classified correctly. (At least they do with ELK; Arachne in Protégé does not seem to work on this for some reason.)
  2. we get 534 new direct subclass relations (an increase from 126 in the previous structure). Many of these are non-obvious (to me) because things start intersecting with existing definitions that make use of has input/output constraints.

This does not take the concept of bidirectionality into account; it is just one of the possible directions. But it seems to behave mostly the way we want it to, is much easier to look at, and fits in better with the rest of the ontology. It may be worth considering taking these directional structures (as they are laid out by default in Rhea) as a starting point (that, though potentially incomplete, is not wrong) and then filling in additional classes for the other directions as they are needed.

GO_Simple.zip
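One way to see why the simple one-direction pattern produces many new subclass inferences is to treat each definition as a set of required inputs and outputs. The sketch below is a toy approximation, not what ELK does: it ignores the ChEBI hierarchy and every other axiom in GO, and the class `SimpleDef`, the function `is_subclass`, and the broader class 'diphosphate-releasing activity' are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleDef:
    """'catalytic activity' restricted by required inputs and outputs."""

    name: str
    inputs: frozenset
    outputs: frozenset


def is_subclass(sub: SimpleDef, sup: SimpleDef) -> bool:
    # Toy rule: sub is more specific if it requires everything sup requires.
    return sup.inputs <= sub.inputs and sup.outputs <= sub.outputs


adenylate_cyclase = SimpleDef(
    name="adenylate cyclase activity",
    inputs=frozenset({"ATP(4-)"}),
    outputs=frozenset({"diphosphate(3-)", "3',5'-cyclic AMP(1-)"}),
)

# Hypothetical broader class, invented for this example.
diphosphate_releasing = SimpleDef(
    name="diphosphate-releasing activity",
    inputs=frozenset(),
    outputs=frozenset({"diphosphate(3-)"}),
)

print(is_subclass(adenylate_cyclase, diphosphate_releasing))
print(is_subclass(diphosphate_releasing, adenylate_cyclase))
```

Any existing class that constrains only some of the same inputs or outputs becomes a parent under this reading, which may explain why some of the new subclass relations are non-obvious.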

@goodb
Contributor

goodb commented Jul 10, 2018

Summarizing status here (and adding some documentation of things happening off github).

We are still faced with the challenge of how best to add axioms to define the classes under Catalytic Activity. The principal challenge remains the bidirectional nature of most chemical reactions. This leads naturally to logical constructs that use an OR statement to join the left-to-right direction with the right-to-left direction. Unfortunately, though sound, this family of definitions does not work with any reasoner that can classify the whole GO. See thread on the ELK reasoner repo about this.

For the problem of inferring the class hierarchy, we have two representations on the table, both of which generate some subclass inferences people aren't sure about. See the spreadsheet where these are laid out for the two formulations. The two formulations are

  • "Ultra-intersection.." uses a fairly complex structure to work around the union problem while still capturing the bidirectional semantics. See comment above and associated zip file with axiomatized ontology
  • "Simple.." reduces things down to a simple collection of inputs and outputs, but ignores bidirectionality, just selecting one direction per class. See ontology zip in directly preceding comment above.

For the problem of inferring the classifications for instances @cmungall suggested a pattern that works nicely using General Concept Inclusion (GCI) axioms. (This does not however influence the problem of class hierarchy inference.) As an example, consider the class ‘phosphoglycerate mutase activity’, which has the textual definition: “Catalysis of the reaction: 2-phospho-D-glycerate = 3-phospho-D-glycerate”. We add the following GCI axiom (and its reverse direction by switching inputs and outputs):

'catalytic activity'
and (('has input' some '2-phosphonato-D-glycerate(3-)'))
and (('has output' some '3-phosphonato-D-glycerate(3-)'))
=>SubClassOf 'phosphoglycerate mutase activity'

Now, when ingesting, for example, the Gluconeogenesis pathway from Reactome, its component reaction ‘2-Phospho-D-glycerate <=> 3-Phospho-D-glycerate’ is correctly and automatically identified as an instance of the GO class ‘phosphoglycerate mutase activity’ based on its inputs and outputs. This example recapitulates a manually assigned GO term from Reactome.
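The pair of GCI axioms (forward and reverse) could be generated from one reaction along these lines. This is a sketch only: `gci_axioms` is a hypothetical helper, the output is plain text that imitates the notation used in this comment rather than valid OWL, and labels containing apostrophes would need escaping.

```python
def gci_axioms(go_label, left, right):
    """Return the forward and reverse GCI axioms as display strings."""

    def expression(inputs, outputs):
        parts = ["'catalytic activity'"]
        parts += [f"('has input' some '{label}')" for label in inputs]
        parts += [f"('has output' some '{label}')" for label in outputs]
        return " and ".join(parts)

    return [
        f"{expression(left, right)} SubClassOf '{go_label}'",
        # Reverse direction: swap inputs and outputs.
        f"{expression(right, left)} SubClassOf '{go_label}'",
    ]


axioms = gci_axioms(
    "phosphoglycerate mutase activity",
    left=["2-phosphonato-D-glycerate(3-)"],
    right=["3-phosphonato-D-glycerate(3-)"],
)
for axiom in axioms:
    print(axiom)
```

Because both axioms point at the same GO class, an instance matching either direction is classified under it, without any union appearing in the class definition itself.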

Testing with all 11542 reactions imported from Reactome into GO-CAMs (May 2018), these axioms allow for the automatic classification of 2339 (20%). This is an increase from 794 when using the previous GOPlus without the new GCI axioms. 287 of the classifications are exact recapitulations of manual annotations, the remainder are potential new annotations that should be verified. Note that they may be sub or superclasses of existing annotations - only exact matches are tested for currently.

Examples of terms used in exact recapitulations:

  • Reaction 2-Phospho-D-glycerate <=> 3-Phospho-D-glycerate
    • Inferred annotation: phosphoglycerate mutase activity
  • Reaction malate + NAD+ <=> oxaloacetate + NADH + H+
    • Inferred annotation: L-malate dehydrogenase activity
  • Reaction (S)-3-Hydroxyhexadecanoyl-CoA+NAD<=>3-Oxopalmitoyl-CoA+NADH+H
    • Inferred annotation: 3-hydroxyacyl-CoA dehydrogenase activity

Examples of new terms used for potential new annotations:

  • ‘SLC38A5-mediated uptake of glutamine, histidine, asparagine, and serine’ had ‘amino acid transmembrane transporter activity’
    • Inferred additional annotation: ‘sodium:proton antiporter activity’
  • ‘ISGylation of IRF3’ had ‘ubiquitin-protein transferase activity’
    • Inferred additional annotation: ‘molecular function regulator’

Note that the GCI definitions require the presence of an assertion typing the reaction as Catalytic Activity. These assertions are not present in the Reactome data. To produce the above statistics, I used the rule: ‘if the reaction has inputs {A, B, ..} and outputs {C, D, ..}, at least one input is a CHEBI term, at least one output is a CHEBI term, and {A, B, ..} is not equal to {C, D, ..}, then add Catalytic Activity’.
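The typing rule in the note above could be written as a small predicate. This is a sketch under assumptions: participants are represented as CURIE strings, a CHEBI term is recognised by its `CHEBI:` prefix, and `PROTEIN:example` is a made-up placeholder identifier; the function name is hypothetical.

```python
def needs_catalytic_activity_type(inputs, outputs):
    """Apply the rule: CHEBI on both sides and the two sides differ."""

    def has_chebi(participants):
        return any(p.startswith("CHEBI:") for p in participants)

    return has_chebi(inputs) and has_chebi(outputs) and set(inputs) != set(outputs)


# A small-molecule conversion: typed as Catalytic Activity.
print(needs_catalytic_activity_type(["CHEBI:29888"], ["CHEBI:33019"]))

# Same molecule on both sides (e.g. a transport-like event): not typed.
print(needs_catalytic_activity_type(["CHEBI:29888"], ["CHEBI:29888"]))

# No CHEBI term among the outputs: not typed.
print(needs_catalytic_activity_type(["CHEBI:29888"], ["PROTEIN:example"]))
```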

For more information about the impact of the GCI axioms on instance classifications from the Reactome import see:
Catalytic_GCI_reactome_term_counts.txt
ELK_reactome_new_mfdef_types.txt

To see all the GCI axioms brought in for terms xrefed to RHEA, see GO_Just_GCI_test.ttl.zip (Also has new complete version of chebi_import merged)

@goodb
Contributor

goodb commented Jul 12, 2018

@ukemi
Just to keep things in one place here
Here are the links to the merged ontologies containing different logical definitions
Just GCIs for instance classification
Simple inputs and outputs (unidirectional)
Sophisticated intersection pattern that respects directionality

and all files needed for review, including the GCI inferences, have been added into worksheets in this google spreadsheet (same referenced above)

@ukemi
Contributor

ukemi commented Jul 13, 2018

Thanks @goodb. To recap yesterday's discussion, here is our plan of action. @deustp01, @hdrabkin and @ukemi will begin with a sanity check of the spreadsheets as a first pass over the new inferences. We will indicate on the spreadsheet which inferences are correct and which are questionable. Once we have done a pass through the spreadsheet, we will look at the reasoning behind the questionable inferences and try to determine what is causing them. By the time you return we should be able to provide you with a report of which methods we think are best and whether or not any tweaking is needed. As we meet, we will add our findings to this issue.

@deustp01

Next steps for Rhea - GO- Reactome roundtrip
Each Reactome reaction instance that has a catalyst (or transporter) activity attribute maps to a single GO molecular function term, and from that term to the set of four Rhea reactions (the undirected master plus its left-to-right, right-to-left, and bidirectional forms) for the molecular transformation enabled by the GO activity. Those mappings can be used to find discrepancies in stoichiometry, participation of water and protons, and ionization states and stereochemistries, between the Rhea and Reactome versions of reactions, in a form that should allow the Reactome reactions to be edited to conform to their Rhea counterparts with minimal manual intervention.

To do this, can we build a table that, for each Reactome reaction (strictly, reactionlike event) that has a catalyst activity attribute, lists its identifier, the GO molecular function term extracted from the catalyst activity attribute, and the Rhea master reaction cross-referenced to that GO molecular function term? Are there legitimate reasons for any Reactome-to-GO or GO-to-Rhea mappings to be other than one to one?

With that table, we will be able to retrieve the lists of molecules associated with the Rhea and Reactome versions of the event and their stoichiometries, align them, and identify discrepancies. Will the tables already constructed by GO (GOCHE?) be useful here for making alignments when Rhea and Reactome disagree on charge state or stereochemistry?
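The alignment step could look something like the sketch below, which compares two versions of one reaction and reports where they disagree. Everything here is illustrative: `stoichiometry_discrepancies` is a hypothetical helper, the compound names and counts are invented to show a missing water and proton, and a real comparison would first map both sides to common ChEBI identifiers.

```python
def stoichiometry_discrepancies(rhea, reactome):
    """Return {compound: (rhea_count, reactome_count)} where the counts differ.

    A compound absent from one version is reported with a count of 0.
    """
    diffs = {}
    for compound in sorted(set(rhea) | set(reactome)):
        rhea_count = rhea.get(compound, 0)
        reactome_count = reactome.get(compound, 0)
        if rhea_count != reactome_count:
            diffs[compound] = (rhea_count, reactome_count)
    return diffs


# Invented example data: the second version omits water and the proton.
rhea_version = {"ATP": 1, "H2O": 1, "ADP": 1, "phosphate": 1, "H+": 1}
reactome_version = {"ATP": 1, "ADP": 1, "phosphate": 1}

for compound, counts in stoichiometry_discrepancies(rhea_version, reactome_version).items():
    print(compound, counts)
```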

Can the pathway context of each Reactome reaction be used to identify the direction of that reaction when the reaction is part of the corresponding biological process?

@cmungall
Member

cmungall commented Apr 24, 2019 via email

@goodb
Contributor

goodb commented Oct 8, 2019

Resolution is to use the simpler construct that uses ObjectUnionOf (forward, backward), which inspired this request to the ELK team: liveontologies/elk-reasoner#54. @balhoff has a solution.

@goodb
Contributor

goodb commented Oct 9, 2019

Axioms to be added to a distinct file go-mf-defs.owl that will be imported into go-plus.
Will also generate a rhea.owl file which will be used for example to propagate xrefs.

@balhoff
Member

balhoff commented May 6, 2020

@cmungall if we go with the union-based approach, is there any reason we need the intermediate "substance sets/bags" for inputs and outputs? Am I forgetting something? I think this would work:

(
(catalytic_activity 
    and (has_input some (CHEBI_1 and has_stoich value "2"))
    and (has_input some (CHEBI_2 and has_stoich value "1"))
    and (has_output some (CHEBI_3 and has_stoich value "2"))
    and (has_output some (CHEBI_4 and has_stoich value "1")))
or
(catalytic_activity
    and (has_output some (CHEBI_1 and has_stoich value "2"))
    and (has_output some (CHEBI_2 and has_stoich value "1"))
    and (has_input some (CHEBI_3 and has_stoich value "2"))
    and (has_input some (CHEBI_4 and has_stoich value "1")))
)
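A union-based expression of this shape could be produced mechanically from the two sides of a reaction. The sketch below only builds a display string in the notation used above; `union_definition` is a hypothetical helper, `has_stoich` is taken from the proposal rather than from an existing relation, and within the second disjunct the inputs are listed before the outputs, which does not change the meaning.

```python
def union_definition(left, right):
    """Build '(forward or backward)' from two lists of (chebi_id, count)."""

    def restrictions(relation, participants):
        return [
            f'({relation} some ({chebi} and has_stoich value "{count}"))'
            for chebi, count in participants
        ]

    def direction(inputs, outputs):
        parts = restrictions("has_input", inputs) + restrictions("has_output", outputs)
        return "(catalytic_activity and " + " and ".join(parts) + ")"

    # Forward direction, then the same reaction with the two sides swapped.
    return f"({direction(left, right)} or {direction(right, left)})"


expression = union_definition(
    left=[("CHEBI_1", 2), ("CHEBI_2", 1)],
    right=[("CHEBI_3", 2), ("CHEBI_4", 1)],
)
print(expression)
```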

@cmungall
Member

cmungall commented May 6, 2020 via email

@amorgat

amorgat commented May 7, 2020

If I understand correctly, input/output is equivalent to substrate/product, right? I.e., a directed reaction.
As the go2rhea mapping is done on undirected reactions, do you plan to provide links to Rhea directed reactions too? See examples in #19371

@goodb
Contributor

goodb commented May 7, 2020

@amorgat the goal of the definition above is to capture the meaning of the undirected reaction: the union groups both directions into one class. I believe the intent is to limit the mapping to the parent undirected reaction from Rhea.

@balhoff balhoff self-assigned this Jul 8, 2020
@cmungall
Member

cmungall commented Apr 5, 2023

Current status:

  • use the RHEA mappings to make (non-definitional) participant RO:0000057 relationships to CHEBI
  • use the RHEA mappings to make N&S logical definitions used to auto-classify reactions

I think this is sufficient. Adding logical definitions for grouping reactions is outside what can be done in OWL.

@pgaudet pgaudet closed this as not planned Apr 5, 2023
@pgaudet pgaudet moved this to Wont fix in GO-Reactome-Rhea alignment Aug 26, 2024