
Generating ontology terms using a pattern

Jim Balhoff edited this page Dec 1, 2017 · 30 revisions

The main use case for dosdp-tools (and the DOS-DP framework) is managing a set of ontology terms, which all follow a common logical pattern, by simply collecting the unique aspect of each term as a line in a spreadsheet. For example, we may be developing an ontology of environmental exposures. We would like to have terms in our ontology which represent exposure to a variety of stressors, such as chemicals, radiation, social stresses, etc.

Creating an ontology of environmental exposures

To maximize reuse and facilitate data integration, we can build our exposure concepts by referencing terms from domain-specific ontologies, such as the Chemical Entities of Biological Interest Ontology (ChEBI) for chemicals. By modeling each exposure concept in the same way, we can use a reasoner to leverage the chemical classification provided by ChEBI to provide a classification for our exposure concepts. Since each exposure concept has a logical definition based on our data model for exposure, there is no need to manually manage the classification hierarchy. Let's say our model for exposure concepts holds that an "exposure" is an event with a particular input (the thing the target is exposed to):

'exposure to X' EquivalentTo 'exposure event' and 'has input' some X

If we need an ontology class to represent 'exposure to sarin' (bad news!), we can simply use the term sarin from ChEBI, and create a logical definition:

'exposure to sarin' EquivalentTo 'exposure event' and 'has input' some sarin

We can go ahead and create some other concepts we need for our exposure data:

'exposure to asbestos' EquivalentTo 'exposure event' and 'has input' some asbestos

'exposure to chemical substance' EquivalentTo 'exposure event' and 'has input' some 'chemical substance'

These definitions can again reference terms provided by ChEBI: asbestos and chemical substance.
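Because all three definitions differ only in the chemical, the pattern can be treated as a template with a single slot. The following is a minimal sketch of that idea in Python; the function names `quote` and `exposure_definition` are hypothetical helpers invented for illustration and are not part of dosdp-tools.

```python
def quote(label):
    """Wrap a label in single quotes if it contains a space (Manchester syntax)."""
    return f"'{label}'" if " " in label else label


def exposure_definition(chemical_label):
    """Fill the 'exposure to X' pattern from the text for one chemical label."""
    return (
        f"'exposure to {chemical_label}' EquivalentTo "
        f"'exposure event' and 'has input' some {quote(chemical_label)}"
    )


# The three chemicals used in the examples above.
definitions = [
    exposure_definition(label)
    for label in ["sarin", "asbestos", "chemical substance"]
]

for definition in definitions:
    print(definition)
```

Each printed line should match one of the hand-written definitions above, which is the point: the only per-term information is the chemical.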

Classifying our concepts

Since the three concepts we've created all follow the same logical model, their hierarchical relationship can be logically determined by the relationships of the chemicals they reference. ChEBI asserts this structure for those terms:

'chemical substance'
         |
         |
   --------------
  |              |
  |              |
sarin        asbestos

Based on this, an OWL reasoner can automatically tell us the relationships between our exposure concepts:

        'exposure to chemical substance'
                       |
                       |
           --------------------------
          |                          |
          |                          |
'exposure to sarin'        'exposure to asbestos'

To support this, we simply need to declare the ChEBI OWL file as an owl:import in our exposure ontology, and use an OWL reasoner such as ELK.
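To see why the exposure hierarchy mirrors the chemical hierarchy, here is a toy illustration in Python. It is not how ELK or any OWL reasoner is implemented; it only mimics the one inference used here, namely that 'has input' some X is subsumed by 'has input' some Y whenever X is a subclass of Y. The dictionary `CHEMICAL_PARENT` is copied by hand from the diagram above, and the sketch assumes an exposure class has been defined for every ancestor chemical.

```python
# Toy child -> parent links, copied from the ChEBI diagram in the text.
CHEMICAL_PARENT = {
    "sarin": "chemical substance",
    "asbestos": "chemical substance",
}


def ancestors(term, parent_of):
    """Return every superclass of `term`, nearest first, by following parent links."""
    found = []
    while term in parent_of:
        term = parent_of[term]
        found.append(term)
    return found


def exposure_superclasses(chemical):
    """Exposure classes that subsume 'exposure to <chemical>' in this toy model."""
    return [
        f"exposure to {ancestor}"
        for ancestor in ancestors(chemical, CHEMICAL_PARENT)
    ]


for chemical in ["sarin", "asbestos"]:
    print(f"exposure to {chemical} SubClassOf {exposure_superclasses(chemical)}")
```

A real reasoner derives the same relationships from the OWL axioms themselves, so no such lookup table has to be maintained by hand.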

Managing terms with dosdp-tools

Creating terms by hand like we just did works fine, and relying on the reasoner for the classification will save us a lot of trouble and maintain correctness as our ontology grows. But since all the terms use the same logical pattern, it would be nice to keep this in one place; this will help make sure we always follow the pattern correctly when we create new concepts. We really only need to store the list of inputs (e.g. chemicals) in order to create all our exposure concepts. As we will see later, we may also want to manage separate sets of terms that follow other patterns. To do this with dosdp-tools, we need three main files: a pattern template, a spreadsheet of pattern fillers, and a source ontology.

For our chemical exposures, getting the source ontology is easy: just download chebi.owl. Note—it's about 450 MB.

For our pattern fillers spreadsheet, we just need to make a tab-delimited file containing the chemical stressors for which we need exposure concepts. The file needs a column for the term IRI to be used for the generated class (this column is always called defined_class), and also a column for the chemical to reference (choose a label according to your data model). It should look like this:

defined_class input
EXPOSO:1      CHEBI:75701
EXPOSO:2      CHEBI:46661
EXPOSO:3      CHEBI:59999

The columns should be tab-separated—you can download a correctly formatted file TODO to follow along.
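Before running any tooling, it can help to confirm that the fillers file really is tab-separated and has the expected columns. The sketch below does this with Python's standard `csv` module. The file content is embedded as a string so the example is self-contained, `read_fillers` is a hypothetical helper name, and the printed lines are only a human-readable preview, not the output format of dosdp-tools.

```python
import csv
import io

# Same rows as the example table above, with real tab characters.
FILLERS_TSV = (
    "defined_class\tinput\n"
    "EXPOSO:1\tCHEBI:75701\n"
    "EXPOSO:2\tCHEBI:46661\n"
    "EXPOSO:3\tCHEBI:59999\n"
)


def read_fillers(handle):
    """Parse a tab-delimited fillers file into (defined_class, input) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"defined_class", "input"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    return [(row["defined_class"], row["input"]) for row in reader]


fillers = read_fillers(io.StringIO(FILLERS_TSV))

# Preview the logical definition each row stands for.
for defined_class, chemical in fillers:
    print(f"{defined_class} EquivalentTo 'exposure event' and 'has input' some {chemical}")
```

If the file had been saved with spaces instead of tabs, the header would parse as a single column and the check would raise a `ValueError` naming the missing columns.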
