Harmonisation with the (Translator) Biolink Model ontology standards

Richard Bruskiewich edited this page Apr 13, 2018 · 7 revisions

Objective

To harmonize the RKB concept type and predicate (edge) labels with the (Translator) Biolink Model ontology standards.

Approach

The concepts in this database were tagged only with the UMLS MetaMap "Semantic Group" tags, not with the fine-grained UMLS semantic types.

For legacy reasons, the semantic group tagging of the Concept nodes in the database is held in two fields with identical semantic group values: type and semanticGroup. Aside from normalizing and renaming these fields to the single "category" field (as per the new TKG data model), we will substitute the Biolink Model concept type for each Semantic Group, as per the following mapping. The current RKB lacks the original fine-grained UMLS semantic types. Were they present, they could perhaps be captured by the proposed secondary "TYPE" edge to a Concept Type annotation.

UMLS to Biolink Model Concept Type Mapping

The original RKB used the simple labels from the UMLS MetaMap Semantic Groups. A direct string replacement (using Cypher) is applied to convert these to the Biolink Model concept types. The mapping applied is as follows:

| Code | Description | Biolink Term |
| --- | --- | --- |
| OBJC | Objects | named thing |
| ACTI | Activities & Behaviors | activity and behaviour |
| ANAT | Anatomy | anatomical entity |
| CHEM | Chemicals & Drugs | chemical substance |
| CONC | Concepts & Ideas | information content entity |
| DEVI | Devices | device |
| DISO | Disorders | disease |
| GENE | Genes & Molecular Sequences | genomic entity |
| GEOG | Geographic Areas | geographic location |
| LIVB | Living Beings | organismal entity |
| OCCU(*) | Occupations | named thing |
| ORGA | Organizations | administrative entity |
| PHEN | Phenomena | phenomenon |
| PHYS | Physiology | physiology |
| PROC | Procedure | procedure |

(*) Not directly tracked in Biolink; these concepts and their related statements may be removed?
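As a rough illustration of the string replacement described above, the following Python sketch builds one parameterised Cypher statement per Semantic Group code. It rests on several assumptions that this page does not confirm: that the nodes carry the label `Concept`, that the `semanticGroup` field stores the four-letter code rather than the description, and that the legacy `type` and `semanticGroup` fields can be dropped once `category` is set. The function name `category_update_statements` is hypothetical.

```python
# Semantic Group code -> Biolink Model concept type, copied from the table above.
SEMANTIC_GROUP_TO_BIOLINK = {
    "OBJC": "named thing",
    "ACTI": "activity and behaviour",
    "ANAT": "anatomical entity",
    "CHEM": "chemical substance",
    "CONC": "information content entity",
    "DEVI": "device",
    "DISO": "disease",
    "GENE": "genomic entity",
    "GEOG": "geographic location",
    "LIVB": "organismal entity",
    "OCCU": "named thing",  # flagged (*) in the table: may be removed instead
    "ORGA": "administrative entity",
    "PHEN": "phenomenon",
    "PHYS": "physiology",
    "PROC": "procedure",
}

# Assumed node label and property names; adjust to the actual RKB schema.
UPDATE_QUERY = (
    "MATCH (c:Concept) WHERE c.semanticGroup = $code "
    "SET c.category = $term "
    "REMOVE c.type, c.semanticGroup"
)


def category_update_statements(mapping=SEMANTIC_GROUP_TO_BIOLINK):
    """Return (query, parameters) pairs, one per Semantic Group code."""
    return [
        (UPDATE_QUERY, {"code": code, "term": term})
        for code, term in mapping.items()
    ]


statements = category_update_statements()
```

Each pair can then be passed to whichever Neo4j client is in use. Keeping the code and term as query parameters avoids quoting problems with terms such as "activity and behaviour".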

UMLS to Biolink Model Predicate Mapping

The RKB Predicate nodes are tagged with Wikidata P# property identifiers (base URI "https://www.wikidata.org/wiki/Property:") in their 'accessionId' property field. These predicate values were mapped onto the Semantic Medline Database records in a manner that is not recorded here. They need to be rewritten to the corresponding Translator Biolink Model predicate terms, as in the table below. Rows with an empty Biolink Term have no mapping assigned yet:

| Property Id | Name | Biolink Term |
| --- | --- | --- |
| wd:P3356 | positive diagnostic predictor | |
| wd:P129 | physically interacts with (in molecular biology) | molecularly interacts with |
| wd:P279 | subclass of | subclass of |
| wd:P276 | location | |
| wd:P1557 | manifestation of | biological process |
| wd:P361 | part of | part of |
| wd:P156 | followed by | |
| wd:P1056 | product | |
| wd:P2888 | exact match | |
| wd:P2175 | medical condition treated | |
| wd:P2283 | uses | |
| wd:P1542 | cause of | |
| kb:P2176 | drug used for treatment | |
| wd:P703 | found in taxon | |
| wd:P688 | encodes | |
| wd:P684 | ortholog | molecularly interacts with |
| wd:P682 | biological process | biological process |
| wd:P681 | cell component | cellular component |
| wd:P680 | molecular function | |
| wd:P3433 | biological variant of | sequence variant(?) |
| wd:P2293 | genetic association | gene to gene association |
| wd:P1552 | has quality | |
| wd:P128 | regulates (molecular biology) | regulates, entity to entity |
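The predicate rewrite can be sketched as a simple lookup. The Python below is only an illustration: it covers a subset of the rows above whose Biolink term is unambiguous, leaving out rows with no term and the tentative "(?)" entry. It assumes that `accessionId` holds either the full property URI or the `wd:` short form, which this page does not state. The helper names `accession_to_curie` and `biolink_predicate` are hypothetical.

```python
# Base URI for Wikidata properties, as given in the text above.
WIKIDATA_PROPERTY_BASE = "https://www.wikidata.org/wiki/Property:"

# Subset of the table above: only rows with a clearly stated Biolink term.
WIKIDATA_TO_BIOLINK = {
    "wd:P129": "molecularly interacts with",
    "wd:P279": "subclass of",
    "wd:P361": "part of",
    "wd:P682": "biological process",
    "wd:P681": "cellular component",
    "wd:P2293": "gene to gene association",
    "wd:P128": "regulates, entity to entity",
}


def accession_to_curie(accession_id):
    """Normalise a full property URI to the 'wd:P#' short form.

    Values that do not start with the base URI are returned unchanged.
    """
    if accession_id.startswith(WIKIDATA_PROPERTY_BASE):
        return "wd:" + accession_id[len(WIKIDATA_PROPERTY_BASE):]
    return accession_id


def biolink_predicate(accession_id):
    """Return the Biolink predicate term, or None if no mapping is listed."""
    return WIKIDATA_TO_BIOLINK.get(accession_to_curie(accession_id))
```

Returning `None` for unmapped properties, rather than raising an error, makes it easy to list the predicates that still need a Biolink term before any rewrite is applied.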