Merge pull request #332 from monarch-initiative/issue-304-2
Refactored code to automatically map new ontologies
hrshdhgd authored Jun 13, 2023
2 parents 29f99e8 + 1cac4ef commit c7d828a
Showing 24 changed files with 2,794 additions and 309 deletions.
2 changes: 1 addition & 1 deletion src/mappings/mondo-sources-all-lexical-2.sssom.tsv
@@ -15,7 +15,7 @@
# skos: http://www.w3.org/2004/02/skos/core#
# sssom: https://w3id.org/sssom/
# license: https://w3id.org/sssom/license/unspecified
-# mapping_set_id: https://w3id.org/sssom/mappings/ba72e520-50b6-4543-9809-27227086d6ab
+# mapping_set_id: https://w3id.org/sssom/mappings/eeda0b08-cd2a-49e2-b89e-f396a4aec8fb
subject_id subject_label predicate_id object_id object_label mapping_justification mapping_tool confidence subject_match_field object_match_field match_string
MONDO:0000001 disease skos:exactMatch NCIT:C156809 Medical Condition semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label medical condition
MONDO:0000001 disease skos:exactMatch NCIT:C25457 Condition semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label condition
26 changes: 26 additions & 0 deletions src/mappings/rejected-mappings.sssom.tsv
@@ -0,0 +1,26 @@
subject_id subject_label object_id predicate_id object_label mapping_justification mapping_tool confidence subject_match_field object_match_field match_string comment predicate_modifier reviewer_id reason for rejected
MONDO:0005571 polycythemia NCIT:C27794 MONDO:equivalentTo Polycythemia (Excluding Polycythemia Vera) semapv:LexicalMatching oaklib 0.8497788952 rdfs:label rdfs:label polycythemia LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 mapping rule incorrect
MONDO:0016642 meningioma DOID:0080842 MONDO:equivalentTo intracranial meningioma semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label intracranial meningioma LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 synonym type incorrect
MONDO:0016715 ependymoblastoma DOID:0081286 MONDO:equivalentTo embryonal tumor with multilayered rosettes semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label embryonal tumor with multilayered rosettes LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0020300 autosomal dominant nocturnal frontal lobe epilepsy DOID:0081119 MONDO:equivalentTo benign familial infantile seizures 6 semapv:LexicalMatching oaklib 0.8 rdfs:label oio:hasExactSynonym autosomal dominant nocturnal frontal lobe epilepsy LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0017858 acute erythroid leukemia DOID:0080916 MONDO:equivalentTo erythroleukemia semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label erythroleukemia LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 synonym type incorrect
MONDO:0008675 Freeman-Sheldon syndrome DOID:0111605 MONDO:equivalentTo distal arthrogryposis type 2A semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym distal arthrogryposis type 2a LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0007648 hereditary diffuse gastric adenocarcinoma DOID:0080764 MONDO:equivalentTo hereditary diffuse gastric cancer semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label hereditary diffuse gastric cancer LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0007762 hyperlipoproteinemia type V DOID:0111421 MONDO:equivalentTo familial apolipoprotein A5 deficiency semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym familial apoa5 deficiency LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0007919 lymphatic malformation 1 DOID:0070212 MONDO:equivalentTo hereditary lymphedema I semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym congenital primary lymphedema LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 split/merge difference in Mondo vs source
MONDO:0006515 acute pancreatitis DOID:0080998 MONDO:equivalentTo acute necrotizing pancreatitis semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label acute necrotizing pancreatitis LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 synonym type incorrect
MONDO:0007538 amelogenesis imperfecta, type 3A DOID:0111721 MONDO:equivalentTo amelogenesis imperfecta type 3 semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label amelogenesis imperfecta type 3 LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 synonym type incorrect
MONDO:0018922 cold agglutinin disease DOID:0111275 MONDO:equivalentTo speech-language disorder-1 semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym cas LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 incorrect match on abbreviation
MONDO:0013162 autosomal recessive limb-girdle muscular dystrophy type 2N DOID:0112382 MONDO:equivalentTo muscular dystrophy-dystroglycanopathy type C8 semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym mddgc2 LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 incorrect match on abbreviation
MONDO:0003917 heart lymphoma DOID:0070212 MONDO:equivalentTo hereditary lymphedema I semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym pcl LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 incorrect match on abbreviation
MONDO:0018689 plasma cell leukemia DOID:0070212 MONDO:equivalentTo hereditary lymphedema I semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym pcl LEXMATCH NOT https://orcid.org/0000-0002-4142-7153 incorrect match on abbreviation
MONDO:0002514 hepatobiliary neoplasm DOID:3117 MONDO:equivalentTo hepatobiliary benign neoplasm semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym hepatobiliary tumors synonym type incorrect
MONDO:0006033 diffuse intrinsic pontine glioma DOID:0080684 MONDO:equivalentTo diffuse midline glioma, H3 K27M-mutant semapv:LexicalMatching oaklib 0.8 rdfs:label oio:hasExactSynonym diffuse intrinsic pontine glioma synonym type incorrect
MONDO:0009288 glycogen storage disease Ib DOID:0081331 MONDO:equivalentTo glycogen storage disease Ic semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label glycogen storage disorder ic split/merge difference in Mondo vs source
MONDO:0018433 acute myeloid leukemia with t(6;9)(p23;q34) DOID:0081080 MONDO:equivalentTo acute myeloid leukemia with t(6;9) (p23;q34.1) semapv:LexicalMatching oaklib 0.8 rdfs:label oio:hasExactSynonym acute myeloid leukemia with t(6;9)(p23;q34) synonym type incorrect
MONDO:0018821 X-linked female restricted facial dysmorphism-short stature-choanal atresia-intellectual disability DOID:0112025 MONDO:equivalentTo female-restricted syndromic X-linked intellectual disability 99 semapv:LexicalMatching oaklib 0.8 rdfs:label oio:hasExactSynonym x-linked female restricted facial dysmorphism-short stature-choanal atresia-intellectual disability synonym type incorrect
MONDO:0019976 dementia pugilistica DOID:0081291 MONDO:equivalentTo chronic traumatic encephalopathy semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym rdfs:label chronic traumatic encephalopathy synonym type incorrect
MONDO:0044212 chronic idiopathic urticaria DOID:0080749 MONDO:equivalentTo chronic spontaneous urticaria semapv:LexicalMatching oaklib 0.8 oio:hasExactSynonym oio:hasExactSynonym chronic idiopathic urticaria synonym type incorrect
MONDO:0044212 chronic idiopathic urticaria DOID:0080749 MONDO:equivalentTo chronic spontaneous urticaria semapv:LexicalMatching oaklib 0.8 rdfs:label oio:hasExactSynonym chronic idiopathic urticaria synonym type incorrect
MONDO:0859359 blood group, er OMIM:620207 MONDO:equivalentTo blood group, er https://orcid.org/0000-0001-5208-3432 not a disease
MONDO:0859195 hypoplastic femurs and pelvis OMIM:619545 MONDO:equivalentTo hypoplastic femurs and pelvis https://orcid.org/0000-0001-5208-3432 phenotype
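The rejected rows above pair a `subject_id` with an `object_id` that a reviewer decided should not be mapped. A minimal sketch of how such a file could be used to drop already-rejected pairs from a new set of lexical candidates follows. It assumes the file is tab-separated and that matching on the (`subject_id`, `object_id`) pair is sufficient; the helper names `load_rejected_pairs` and `drop_rejected` are hypothetical and are not taken from the repository.

```python
import csv
import io


def load_rejected_pairs(tsv_text):
    """Return the set of (subject_id, object_id) pairs listed in a rejected-mappings TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["subject_id"], row["object_id"]) for row in reader}


def drop_rejected(candidates, rejected_pairs):
    """Keep only candidate mappings whose (subject_id, object_id) pair was not rejected."""
    return [
        m for m in candidates
        if (m["subject_id"], m["object_id"]) not in rejected_pairs
    ]


# Two rows copied from the rejected file above, reduced to four columns for brevity.
REJECTED_TSV = (
    "subject_id\tsubject_label\tobject_id\tpredicate_id\n"
    "MONDO:0005571\tpolycythemia\tNCIT:C27794\tMONDO:equivalentTo\n"
    "MONDO:0016642\tmeningioma\tDOID:0080842\tMONDO:equivalentTo\n"
)

# Candidate mappings: one that was rejected above, one that was not.
candidates = [
    {"subject_id": "MONDO:0005571", "object_id": "NCIT:C27794"},
    {"subject_id": "MONDO:0000001", "object_id": "NCIT:C25457"},
]

rejected = load_rejected_pairs(REJECTED_TSV)
kept = drop_rejected(candidates, rejected)
print(kept)
```

Keying on the identifier pair rather than on labels keeps the filter stable when a label is edited in either ontology.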
59 changes: 29 additions & 30 deletions src/ontology/lexmatch/README.md
@@ -3,49 +3,48 @@
* mondo-only: Positive mappings in MONDO not caught by the lexical mapping pipeline
* split-mapping-set: Unmapped mappings broken down by predicate_id
## Summary of mappings:
## unmapped_xxxx_lex & unmapped_xxxx_lex_exact
* Number of mappings in [`unmapped_icd_lex`](unmapped_icd_lex.tsv): 1928
* Number of mappings in [`unmapped_icd_lex_exact`](unmapped_icd_lex.tsv): 1528
* Number of mappings in [`unmapped_omim_lex`](unmapped_omim_lex.tsv): 1
* Number of mappings in [`unmapped_omim_lex_exact`](unmapped_omim_lex.tsv): 1
* Number of mappings in [`unmapped_ordo_lex`](unmapped_ordo_lex.tsv): 1
* Number of mappings in [`unmapped_ordo_lex_exact`](unmapped_ordo_lex.tsv): 1
* Number of mappings in [`unmapped_doid_lex`](unmapped_doid_lex.tsv): 87
* Number of mappings in [`unmapped_doid_lex_exact`](unmapped_doid_lex.tsv): 58
* Number of mappings in [`unmapped_ncit_lex`](unmapped_ncit_lex.tsv): 75
* Number of mappings in [`unmapped_ncit_lex_exact`](unmapped_ncit_lex.tsv): 39
* Number of mappings in [`unmapped_gard_lex`](unmapped_gard_lex.tsv): 11130
* Number of mappings in [`unmapped_gard_lex_exact`](unmapped_gard_lex.tsv): 3032
## unmapped_xxxx_mondo
* Number of mappings in [`unmapped_icd_mondo`](mondo-only/unmapped_icd_mondo.tsv): 39
* Number of mappings in [`unmapped_icd_mondo_exact`](mondo-only/unmapped_icd_mondo.tsv): 39
* Number of mappings in [`unmapped_omim_mondo`](mondo-only/unmapped_omim_mondo.tsv): 1834
* Number of mappings in [`unmapped_omim_mondo_exact`](mondo-only/unmapped_omim_mondo.tsv): 945
* Number of mappings in [`unmapped_ordo_mondo`](mondo-only/unmapped_ordo_mondo.tsv): 45
* Number of mappings in [`unmapped_ordo_mondo_exact`](mondo-only/unmapped_ordo_mondo.tsv): 28
* Number of mappings in [`unmapped_doid_mondo`](mondo-only/unmapped_doid_mondo.tsv): 76
* Number of mappings in [`unmapped_doid_mondo_exact`](mondo-only/unmapped_doid_mondo.tsv): 76
* Number of mappings in [`unmapped_ncit_mondo`](mondo-only/unmapped_ncit_mondo.tsv): 2204
* Number of mappings in [`unmapped_ncit_mondo_exact`](mondo-only/unmapped_ncit_mondo.tsv): 1165
* Number of mappings in [`unmapped_gard_lex`](unmapped_gard_lex.tsv): 11130
* Number of mappings in [`unmapped_gard_lex_exact`](unmapped_gard_lex.tsv): 3032
* Number of mappings in [`unmapped_gard_mondo`](mondo-only/unmapped_gard_mondo.tsv): 1
* Number of mappings in [`unmapped_gard_mondo_exact`](mondo-only/unmapped_gard_mondo.tsv): 1
* Number of mappings in [`unmapped_icd10cm_lex`](unmapped_icd10cm_lex.tsv): 1928
* Number of mappings in [`unmapped_icd10cm_lex_exact`](unmapped_icd10cm_lex.tsv): 1528
* Number of mappings in [`unmapped_icd10cm_mondo`](mondo-only/unmapped_icd10cm_mondo.tsv): 39
* Number of mappings in [`unmapped_icd10cm_mondo_exact`](mondo-only/unmapped_icd10cm_mondo.tsv): 39
* Number of mappings in [`unmapped_icd10who_lex`](unmapped_icd10who_lex.tsv): 1211
* Number of mappings in [`unmapped_icd10who_lex_exact`](unmapped_icd10who_lex.tsv): 468
* Number of mappings in [`unmapped_icd10who_mondo`](mondo-only/unmapped_icd10who_mondo.tsv): 6
* Number of mappings in [`unmapped_icd10who_mondo_exact`](mondo-only/unmapped_icd10who_mondo.tsv): 6
* Number of mappings in [`unmapped_ncit_lex`](unmapped_ncit_lex.tsv): 75
* Number of mappings in [`unmapped_ncit_lex_exact`](unmapped_ncit_lex.tsv): 39
* Number of mappings in [`unmapped_ncit_mondo`](mondo-only/unmapped_ncit_mondo.tsv): 2204
* Number of mappings in [`unmapped_ncit_mondo_exact`](mondo-only/unmapped_ncit_mondo.tsv): 1165
* Number of mappings in [`unmapped_omim_lex`](unmapped_omim_lex.tsv): 1
* Number of mappings in [`unmapped_omim_lex_exact`](unmapped_omim_lex.tsv): 1
* Number of mappings in [`unmapped_omim_mondo`](mondo-only/unmapped_omim_mondo.tsv): 1834
* Number of mappings in [`unmapped_omim_mondo_exact`](mondo-only/unmapped_omim_mondo.tsv): 945
## mondo_XXXXmatch_ontology
* Number of mappings in [`mondo_closematch_gard`](split-mapping-set/mondo_closematch_gard.tsv): 35342
* Number of mappings in [`mondo_broadmatch_gard`](split-mapping-set/mondo_broadmatch_gard.tsv): 244
* Number of mappings in [`mondo_narrowmatch_gard`](split-mapping-set/mondo_narrowmatch_gard.tsv): 116
* Number of mappings in [`mondo_exactmatch_gard`](split-mapping-set/mondo_exactmatch_gard.tsv): 11129
* Number of mappings in [`mondo_closematch_doid`](split-mapping-set/mondo_closematch_doid.tsv): 252
* Number of mappings in [`mondo_closematch_gard`](split-mapping-set/mondo_closematch_gard.tsv): 35342
* Number of mappings in [`mondo_broadmatch_ncit`](split-mapping-set/mondo_broadmatch_ncit.tsv): 10
* Number of mappings in [`mondo_exactmatch_ncit`](split-mapping-set/mondo_exactmatch_ncit.tsv): 2277
* Number of mappings in [`mondo_closematch_ncit`](split-mapping-set/mondo_closematch_ncit.tsv): 72
* Number of mappings in [`mondo_broadmatch_doid`](split-mapping-set/mondo_broadmatch_doid.tsv): 1
* Number of mappings in [`mondo_exactmatch_doid`](split-mapping-set/mondo_exactmatch_doid.tsv): 161
* Number of mappings in [`mondo_closematch_orphanet`](split-mapping-set/mondo_closematch_orphanet.tsv): 3
* Number of mappings in [`mondo_broadmatch_orphanet`](split-mapping-set/mondo_broadmatch_orphanet.tsv): 1
* Number of mappings in [`mondo_exactmatch_orphanet`](split-mapping-set/mondo_exactmatch_orphanet.tsv): 44
* Number of mappings in [`mondo_closematch_icd10cm`](split-mapping-set/mondo_closematch_icd10cm.tsv): 361
* Number of mappings in [`mondo_closematch_doid`](split-mapping-set/mondo_closematch_doid.tsv): 252
* Number of mappings in [`mondo_broadmatch_icd10who`](split-mapping-set/mondo_broadmatch_icd10who.tsv): 30
* Number of mappings in [`mondo_narrowmatch_icd10who`](split-mapping-set/mondo_narrowmatch_icd10who.tsv): 22
* Number of mappings in [`mondo_exactmatch_icd10who`](split-mapping-set/mondo_exactmatch_icd10who.tsv): 1215
* Number of mappings in [`mondo_closematch_icd10who`](split-mapping-set/mondo_closematch_icd10who.tsv): 149
* Number of mappings in [`mondo_broadmatch_icd10cm`](split-mapping-set/mondo_broadmatch_icd10cm.tsv): 82
* Number of mappings in [`mondo_narrowmatch_icd10cm`](split-mapping-set/mondo_narrowmatch_icd10cm.tsv): 58
* Number of mappings in [`mondo_exactmatch_icd10cm`](split-mapping-set/mondo_exactmatch_icd10cm.tsv): 1965
* Number of mappings in [`mondo_closematch_ncit`](split-mapping-set/mondo_closematch_ncit.tsv): 72
* Number of mappings in [`mondo_broadmatch_ncit`](split-mapping-set/mondo_broadmatch_ncit.tsv): 10
* Number of mappings in [`mondo_exactmatch_ncit`](split-mapping-set/mondo_exactmatch_ncit.tsv): 2277
* Number of mappings in [`mondo_closematch_omim`](split-mapping-set/mondo_closematch_omim.tsv): 2
* Number of mappings in [`mondo_closematch_icd10cm`](split-mapping-set/mondo_closematch_icd10cm.tsv): 361
* Number of mappings in [`mondo_exactmatch_omim`](split-mapping-set/mondo_exactmatch_omim.tsv): 1282
* Number of mappings in [`mondo_closematch_omim`](split-mapping-set/mondo_closematch_omim.tsv): 2
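Each summary bullet above pairs a table name with a row count. One way such a bullet could be produced is sketched here, assuming the count is the number of non-empty, non-comment lines after the header row. The functions `count_mappings` and `summary_line` are illustrative names only, and the repository's own counting rule may differ.

```python
def count_mappings(tsv_text):
    """Count rows in an SSSOM-style TSV, skipping '#' metadata lines and the header row."""
    lines = [
        ln for ln in tsv_text.splitlines()
        if ln.strip() and not ln.startswith("#")
    ]
    # The first remaining line is the column header, so it is not counted.
    return max(len(lines) - 1, 0)


def summary_line(name, path, n):
    """Format one README bullet in the style used in the summary above."""
    return f"* Number of mappings in [`{name}`]({path}): {n}"


# A small sample: one metadata line, the header, and two mapping rows.
SAMPLE_TSV = (
    "# license: https://w3id.org/sssom/license/unspecified\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:0000001\tskos:exactMatch\tNCIT:C156809\n"
    "MONDO:0000001\tskos:exactMatch\tNCIT:C25457\n"
)

n = count_mappings(SAMPLE_TSV)
line = summary_line("sample", "sample.tsv", n)
print(line)
```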
7 changes: 7 additions & 0 deletions src/ontology/lexmatch/mondo-only/unmapped_icd10who_mondo.tsv
@@ -0,0 +1,7 @@
subject_id subject_label object_id predicate_id object_label mapping_justification mapping_tool confidence subject_match_field object_match_field match_string
ID A oboInOwl:hasDbXref >A oboInOwl:source
MONDO:0001946 obsolete hyperestrogenism ICD10WHO:E28.0 MONDO:equivalentTo Ovarian dysfunction: Estrogen excess semapv:UnspecifiedMatching
MONDO:0005165 benign neoplasm ICD10WHO:D10-D36 MONDO:equivalentTo Benign neoplasms semapv:UnspecifiedMatching
MONDO:0015079 multiple polyglandular tumor ICD10WHO:D44.8 MONDO:equivalentTo Neoplasm of uncertain or unknown behaviour: Pluriglandular involvement semapv:UnspecifiedMatching
MONDO:0021645 esophageal varices with bleeding ICD10WHO:I85.0 MONDO:equivalentTo Oesophageal varices with bleeding semapv:UnspecifiedMatching
MONDO:0024318 viral infection of central nervous system ICD10WHO:A80-A89 MONDO:equivalentTo Viral infections of the central nervous system semapv:UnspecifiedMatching
File renamed without changes.
29 changes: 0 additions & 29 deletions src/ontology/lexmatch/mondo-only/unmapped_ordo_mondo.tsv

This file was deleted.
