Update OMIM gene references in Mondo #8108

Draft
wants to merge 7 commits into
base: master

@matentzn
Member

Commented Aug 24, 2024:

I will add this in draft mode, because this needs extremely careful review by at least @twhetzel and @sabrinatoro.

You can rerun the pipeline by checking out the branch and running:

sh run.sh make update-omim-genes -B

I don't know how you want to review this, but as a reminder:

  1. The new OMIM references are the 1:1 references we obtain directly from OMIM. We assume those are all "germline mutation in X" relations to the disease.
  2. The pipeline now:
    1. Deletes all direct OMIM relations (excluding logical definitions, which are a whole other beast)
    2. Adds all the new OMIM relations back
    3. Updates the equivalence class definitions with SPARQL to use the gene from the updated OMIM relations.
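
The first two pipeline steps can be illustrated with a minimal Python sketch. It is not the actual implementation behind `update-omim-genes`: it assumes the edit file is OBO text, that OMIM-sourced gene relations carry a `source="OMIM:..."` qualifier, and that new relations arrive as ready-made lines keyed by Mondo ID. The function name and the sample IDs in the usage note are hypothetical, and the SPARQL update of equivalence class definitions (step 3) is not covered.

```python
import re

# Assumption: OMIM-sourced gene relations have this shape in the OBO edit file.
OMIM_GENE_REL = re.compile(
    r'^relationship: has_material_basis_in_germline_mutation_in .*source="OMIM:'
)


def refresh_omim_gene_relations(obo_lines, new_relations):
    """Drop OMIM-sourced gene relations, then add the new ones per term.

    obo_lines: lines of an OBO file, without trailing newlines.
    new_relations: dict mapping a Mondo ID to a list of relationship lines.
    """
    out = []
    for line in obo_lines:
        if OMIM_GENE_REL.match(line):
            continue  # step 1: delete the direct OMIM relation
        out.append(line)
        if line.startswith("id: "):
            # step 2: add the new relations for this term.
            # Placing them right after the id line is a simplification.
            out.extend(new_relations.get(line[4:].strip(), []))
    return out
```

Called with a term that has one OMIM-sourced and one PMID-sourced relation, the sketch replaces the former and leaves the latter untouched. Relations without any `source="OMIM:` qualifier are never deleted by this pattern.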

I am sure there is much that needs to be fixed, but I wanted to get the ball rolling at least.

I can't work more on this, but I think it's worth reviewing it and identifying issues.

@twhetzel
Collaborator

Aim to have this in the October release if possible.

@sabrinatoro
Collaborator

I have reviewed this PR carefully and have found the following issues. All of these are pretty major and should be resolved before we can go ahead with this PR.

  1. (already reported here in Nico's comment) The gene annotations with sources from clinicalgenome, PMID, and orcid should be kept.

  2. (already reported here in Nico's comment) Gene identifiers should be ‘http://identifiers.org/hgnc/XXX’ and not ‘https://identifiers.org/hgnc/XXX’

  3. Genes should not be added if the OMIM record is associated with multiple genes.
    Examples in which one gene was incorrectly added to a Mondo record (note that only one of the genes was added, and it is unclear why that gene was added and not the other):

  4. Some gene annotations were removed but not added back, even though there is 1 (and only 1) gene associated with the OMIM record.
    Examples:
  • MONDO:0007037 - OMIM:100800, annotation removed and now missing: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/3690 {source="MONDO:mim2gene_medgen"} ! FGFR3
  • MONDO:0007039 - OMIM:101000, annotation removed and now missing: relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/7773 {source="MONDO:mim2gene_medgen"} ! NF2
  • MONDO:0007041 - OMIM:101200, annotation removed and now missing: relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/3689 {source="MONDO:mim2gene_medgen"} ! FGFR2
  5. The following example is very problematic in many ways. First, the OMIM record is associated with multiple genes, so this update should not have been made.
    MONDO:0007103 - OMIM:105400 - change:
  6. (also shown above) Since we do not add a source to equivalence definitions, we do not know where they come from. Many of them are created by a curator or ClinGen based on the definition of a term, which might not have an OMIM counterpart (e.g. gene-related neuropathy). Maybe we should add sources to the equivalence definitions and take them into account when updating the gene annotations.

  7. Since gene annotations with a PMID/ClinGen/ORCID source should be kept, I suggest that we add a QC check for terms with more than one affected gene. In very few cases (e.g. digenic diseases) having more than one gene is OK, but a curator should give the OK.
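
Three of the points above (the `http://` HGNC prefix, skipping OMIM records with multiple genes, and the proposed QC check) can be sketched in Python. This is only an illustration under stated assumptions: the input shapes (pairs of OMIM ID and gene IRI, and a dict of gene sets per Mondo ID), all function names, and the `approved` allow-list for curator sign-off are hypothetical and do not come from the Mondo pipeline.

```python
from collections import defaultdict

HTTP_PREFIX = "http://identifiers.org/hgnc/"
HTTPS_PREFIX = "https://identifiers.org/hgnc/"


def normalize_hgnc_iri(iri):
    """Rewrite the https form of an HGNC IRI to the expected http form."""
    if iri.startswith(HTTPS_PREFIX):
        return HTTP_PREFIX + iri[len(HTTPS_PREFIX):]
    return iri


def one_to_one_genes(omim_gene_pairs):
    """Keep only OMIM records that are associated with exactly one gene."""
    genes_by_omim = defaultdict(set)
    for omim_id, gene_iri in omim_gene_pairs:
        genes_by_omim[omim_id].add(normalize_hgnc_iri(gene_iri))
    return {
        omim_id: next(iter(genes))
        for omim_id, genes in genes_by_omim.items()
        if len(genes) == 1
    }


def terms_needing_review(genes_by_mondo, approved=frozenset()):
    """QC check: Mondo IDs with more than one gene and no curator sign-off."""
    return sorted(
        mondo_id
        for mondo_id, genes in genes_by_mondo.items()
        if len(genes) > 1 and mondo_id not in approved
    )
```

With such a check, a digenic disease would pass only after its Mondo ID is added to the hypothetical `approved` set, which mirrors the suggestion that a curator has to give the OK.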

@twhetzel
Collaborator

I am working through these issues.

@twhetzel
Collaborator

Related to point 3, "Genes should not be added if the OMIM record is associated with multiple genes": in tracking back the OMIM processing steps, I do not see all the genes in one of the initial files, i.e. omim.ttl. Content from this file is further transformed and eventually used in the OMIM pipeline. I think this is a bug in that processing, and I submitted an issue in our OMIM repo about it. I am mentioning this here in case others are not subscribed to updates in that repo.

$(MAKE) $(TMPDIR)/external/processed-mondo-omim-genes.robot.owl -B
# We need to be less aggressive here, as some gene relations were not originally sourced
# from OMIM, and were added, for example, for ClinGen.
grep -vE '^(relationship: has_material_basis_in_germline_mutation_in .*source="OMIM:)' $(SRC) > $(TMPDIR)/mondo-edit.tmp || true
Collaborator

Currently, only OMIM is used as the source for has_material_basis_in_germline_mutation_in. However, this update should remove only the axioms where OMIM is the source, in case there are different sources in the future.
cc: @matentzn

Member Author

Just note that we should clear out all of these axioms at least once, so that we do not miss any that currently don't have OMIM as the source?

Collaborator

@matentzn I do not understand your comment. From what I saw all subClassOf relationships that use has_material_basis_in_germline_mutation_in have OMIM as the source unless I missed something in looking at that data.

Member Author

my comment makes zero sense given:

only OMIM is used as the source for has_material_basis_in_germline_mutation_in

it would make sense if:

  1. there are has_material_basis_in_germline_mutation_in axioms which have no evidence
  2. or evidence other than OMIM

because I think your statement would then not delete them, and it should!
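
To make the concern concrete, here is a small Python sketch that applies the same regular expression as the `grep -vE` filter quoted in this thread and reports which lines it would keep and which it would remove. The function name and the sample lines in the usage note are hypothetical, and Python's `re` stands in for grep's extended regex syntax, which behaves the same for this particular pattern.

```python
import re

# Same pattern as the grep -vE filter in the reviewed Makefile change.
OMIM_SOURCED = re.compile(
    r'^(relationship: has_material_basis_in_germline_mutation_in .*source="OMIM:)'
)


def split_by_filter(lines):
    """Return (kept, removed) the way `grep -vE` with this pattern would."""
    kept = [line for line in lines if not OMIM_SOURCED.search(line)]
    removed = [line for line in lines if OMIM_SOURCED.search(line)]
    return kept, removed
```

Fed one relation with `{source="OMIM:..."}`, one with `{source="PMID:..."}`, and one with no qualifier at all, the sketch removes only the first. Axioms with no evidence or with non-OMIM evidence therefore survive the filter, which is the case described in the two list items above.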

3 participants