Skip to content

Latest commit

 

History

History
639 lines (472 loc) · 21.1 KB

using-the-modeling-language.md

File metadata and controls

639 lines (472 loc) · 21.1 KB
title parent layout nav_order
Using the Modeling Language
Biolink Model Guidelines
default
3

Using the Modeling Language

Overview

The linkML provides a variety of slots to define the semantics of your Biolink Model class and slots.

This document tries to address on how to use most of these slots in Biolink Model.

Please refer to linkML documentation for an exhaustive list of slots provided by the modeling language.

Inheritance Related Slots

is_a

The is_a slot can be used to define a hierarchy for your Biolink Model class, mixin or slot where a new class, mixin or slot is defined as a subclass of another defined class, mixin or slot.

gene:
  is_a: gene or gene product

Here we define that the entity class gene is a sub-class of gene or gene product. Note that is_a has the characteristics of homeomorphicity: is_a SHOULD only connect either (1) two mixins (2) two classes (3) two slots.

abstract

A model class (or slot) may be tagged with its abstract slot set to the boolean value true, to define whether it is abstract. This has comparable meaning to that in the computing science Object Oriented Paradigm: another class (or slot) can use the abstract class (or slot) as part of its inheritance hierarchy, but the abstract class itself cannot be directly instantiated.

  cell line to thing association:
    is_a: association
    defining_slots:
      - subject
    abstract: true
    description: >-
      A relationship between a cell line and another entity
    slot_usage:
      subject:
        range: cell line

Here we define that the association class cell line to thing association is an abstract class. In this case, the class simply constrains its child subclasses to have a subject range of biolink:CellLine.

mixin

The mixin:true demarkation is used to extend the properties (or slots) of a class, without changing its position in the class hierarchy. Mixins can be extremely helpful in a number of ways: 1) to generalize a set of attributes that can apply to classes in different parts of the class hierarchy, 2) reduce duplication of shared attributes between classes that do not inherit from one another and 3) to prevent the sometimes confusing nature of multiple inheritance noted in the '[diamond problem]'(https://tinyurl.com/4zdw9tsb).

In general, while mixin slots and classes should not be directly instantiated, or used directly as a slot in a class, KGs can use them as a substitute for multiple inheritance. For example, a KG might wish to determine what are the parents of a certain class. In this case, the KG should navigate a mixin used in a domain or range of a class or slot, as it would the "is_a" demarkation.

  thing with taxon:
    mixin: true
    description: >-
      A mixin that can be used on any entity with a taxon
    slots:
      - in taxon

Here we define the class thing with taxon as a mixin class with a slot in taxon.

  molecular entity:
    is_a: biological entity
    mixins:
      - thing with taxon
      - physical essence
      - ontology class
    aliases: ['bioentity']

In the class molecular entity, we use the thing with taxon mixin in order to add the 'in taxon' attribute (slot) to the molecular entity class. The other way to do this would be to duplicate the 'in taxon' attribute in every class manually (duplicative), or have hierarchy/parent that had the in taxon slot (but this parent would be a sister-class to 'named thing' as not all named-things are taxon based). Mixins simplify modeling and should be used when necessary.

  regulates:
    is_a: affects
    comments:
      - This is a grouping for process-process and entity-entity relations
    mixin: true

Here we define the slot regulates as a mixin slot. This slot can be used as a mixin by other slots.

Mixins can also be hierarchical. For example:

  frequency qualifier mixin:
    mixin: true
    description: >-
      Qualifier for frequency type associations
    slots:
      - frequency qualifier

Here we define the mixin frequency qualifier mixin to hold the parent slot, frequency qualifier.
The slot, frequency qualifier is then inherited by every class in the subsequent is_a hierarchy of entity to feature or disease qualifiers mixin. The frequency quantifier mixin was created with similar intentions (favoring consistency in modeling similar domains), though its reuse is not as evident in the model yet.

Mixins provide the means of reusing semantics, generally by the inclusion of specific property slots or other semantic constraint, in different classes or slots, without the need to tie slots to the hierarchy of the class itself.

positively regulates:
    comments:
      - This is a grouping for positive process-process and entity-entity regulation.
    is_a: regulates
    inverse: positively regulated by
    mixin: true
    close_mappings:
      # This RTX contributed term is tagged as a inverse of this Biolink predicate
      - RO:0002336
    exact_mappings:
      - RO:0002213
    narrow_mappings:
      - CHEMBL.MECHANISM:activator
      - DGIdb:activator
      - RO:0004032
      - RO:0004034
      - RO:0002629
      - SEMMEDDB:augments

positively regulates is another example of a mixin. In this case, a mixin is used to store meta data about a predicate or relationship between two entities at a general level. Its subsequent children, inherit these definitions and attributes, whether or not the parent mixin class has any slots.

mixins

The mixins slot can be used to specify a list of mixins that a class (or slot) can use to include the added semantics of the mixins.

The mixins are separate from the is_a hierarchy and the mixin classes do not contribute to a classes inheritance hierarchy.

  individual organism:
    is_a: organismal entity
    mixins:
      - thing with taxon

Here we define an entity class individual organism that uses the mixin class thing with taxon. By virtue of the mixin, the class individual organism will have an in taxon slot in addition to all its own slots, its parent slots, and its ancestor slots.

Identification, Descriptive and Indexing Related Slots

aliases

The aliases slot can be used to define a list of aliases for a Biolink Model class (or slot). This is useful for adding synonymous names to your class (or slot).

gene:
  is_a: gene or gene product
  aliases:
    - locus

Here we define that the entity class gene has an alias locus.

description

The description slot can be used to provide a human-readable description of a class (or slot).

  genetically interacts with:
    is_a: interacts with
    description: >-
      holds between two genes whose phenotypic effects are dependent on each other in some way - such that their combined phenotypic effects are the result of some interaction between the activity of their gene products. Examples include epistasis and synthetic lethality.
    domain: gene
    range: gene

Here we define a human readable description that describes the predicate slot genetically interacts with and its purpose.

slot_uri

The slot_uri slot can be used to define a canonical URI that is the true representation for that particular slot. That is, the value of slot_uri can be used interchangeably with the slot being defined.

  name:
    is_a: node property
    aliases: ['label', 'display name']
    domain: named thing
    range: label type
    slot_uri: rdfs:label

Here we define rdfs:label as the canonical URI for the property slot name. When serializing a graph into RDF, the name of an instance of entity class named thing will be represented using rdfs:label instead of biolink:name.

This is to ensure that we use certain core RDF predicates as is.

in_subset

The in_subset slot can be used tag your class (or slot) to belong to a pre-defined subset.

The actual subset names are defined as part of the Schema definition.

  genetically interacts with:
    is_a: interacts with
    domain: gene
    range: gene
    in_subset:
      - translator_minimal

Here we define the predicate slot genetically interacts with as part of the translator_minimal subset.

id_prefixes

The id_prefixes slot can be used to define a list of valid ID prefixes that instances of this class ought to have as part of their CURIE.

The order of the list matters since its a prioritized list with the ID prefix with the highest priority appearing at the top of the list.

  gene:
    is_a: gene or gene product
    aliases: ['locus']
    slots:
      - id
      - name
      - symbol
      - description
      - synonym
      - xref
    mappings:
      - SO:0000704
      - SIO:010035
      - WIKIDATA:Q7187
    id_prefixes:
      - NCBIGene
      - ENSEMBL
      - HGNC
      - UniProtKB
      - MGI
      - ZFIN
      - dictyBase
      - WB
      - WormBase
      - FlyBase
      - FB
      - RGD
      - SGD
      - PomBase

Here we define the entity class gene to have a list of ID prefixes with NCBIGene having the highest priority.

Slots Relating to Class Composition

slots

The slot property list enumerates the names of slots which a given class, mixin or its subclasses are generally permitted to have. Unless it is designated as one of the defining_slots (see below) or slot_usage (see below) specifies that a given slot is required: true (see below), then it is not mandatory that such a slot is instantiated in all instances of the given class, mixin or subclass inheriting it.

defining_slots

The defining_slots slot can be used to specify which slots of an instance are necessary for defining an instance as a member of a class.

  gene to gene association:
    is_a: association
    defining_slots:
      - subject
      - object

Here we specify that an association can be determined to be an instance of class gene to gene association based on the semantics of two of its slots: subject and object.

i.e. One can infer an association to be an instance of gene to gene association if both its subject and its object are an instances of class gene.

Listing a slot as one of the defining_slots slots effectively makes it required: true (see below).

slot_usage

The slot_usage slot can be used to specify how a particular slot ought to be used in a class.

This is useful for documenting what a particular slot means for instances of a particular class.

  gene to gene association:
    aliases: ['molecular or genetic interaction']
    is_a: association
    defining_slots:
      - subject
      - object
    description: >-
      abstract parent class for different kinds of gene-gene or gene product to gene product relationships.
      Includes homology and interaction.
    slot_usage:
      subject:
        range: gene or gene product
        description: >-
          the subject gene in the association. If the relation is symmetric, subject vs object is arbitrary.
          We allow a gene product to stand as proxy for the gene or vice versa
      object:
        range: gene or gene product
        description: >-
          the object gene in the association. If the relation is symmetric, subject vs object is arbitrary.
          We allow a gene product to stand as proxy for the gene or vice versa

Here we document the association class gene to gene association with information on how the slot subject and object ought to be used to represent this association properly.

In the slot_usage section we define the range and provide a description for the slot subject and object.

required

The required slot can be used to define whether a slot is required.

When a slot is declared as required, any class that uses that slot must have a value for that slot.

  id:
    is_a: node property
    required: true
    domain: named thing
    mappings:
      - alliancegenome:primaryId
      - gff3:ID
      - gpi:DB_Object_ID

Here we define the property slot id as a required field for all instances of the entity class named thing.

Slots Relating to Constraints on Slot Composition

domain

The domain slot mimics the idea of rdfs:domain where you constrain the type of classes that a given Biolink Model slot can be a part of.

  genetically interacts with:
    is_a: interacts with
    domain: gene

Here we define that the subject (source node) of the predicate slot genetically interacts with must be an instance of class gene.

range

The range slot mimics the idea of rdfs:range where you can constrain the type of classes (or data types) a given Biolink Model slot can have as its value.

  genetically interacts with:
    is_a: interacts with
    domain: gene
    range: gene

Here we define that both the subject (source node) and object (target node) of the predicate slot genetically interacts with must be instances of class gene.

symmetric

The symmetric slot can be used to specify whether a Biolink Model predicate slot is symmetric in its semantics.

i.e. if A -[r]-> B and r is symmetric then one can infer B -[r]-> A

  genetically interacts with:
    is_a: interacts with
    domain: gene
    range: gene
    in_subset:
      - translator_minimal
    symmetric: true

Here we define that the predicate slot genetically interacts with is symmetric.

Note: This property is not inherited by descendants of this predicate slot. You will have to explicitly define every predicate slot that should be considered as symmetric.

Slots Relating Semantic Mappings and Anchoring to External Ontology

symmetric

The symmetric slot can be used to specify whether a given predicate slot is symmetric.

  interacts with:
    domain: named thing
    range: named thing
    description: >-
      holds between any two entities that directly or indirectly interact with each other
    is_a: related to
    in_subset:
      - translator_minimal
    symmetric: true

Note: The symmetric nature of the predicate is not inherited by descendants of the predicate.

inverse

The inverse slot can be used to specify the inverse predicate of a given predicate slot relationship.

  affects:
    is_a: related to
    description: >-
      describes an entity that has a direct affect on the state or quality
      of another existing entity. Use of the 'affects' predicate implies that
      the affected entity already exists, unlike predicates such as
      'affects risk for' and 'prevents, where the outcome is something
      that may or may not come to be.
    inverse: affected by
    in_subset:
      - translator_minimal

exact_mappings

The exact_mappings slot can be used to define external concepts, predicates, or properties which are considered to be exact mappings to the class (or slot) being defined.

  same as:
    is_a: exact match
    description: >-
      holds between two entities that are considered equivalent to each other
    in_subset:
      - translator_minimal
    exact_mappings:
      - owl:sameAs
      - skos:exactMatch
      - WIKIDATA_PROPERTY:P2888
      - CHEMBL.MECHANISM:equivalent_to
      - MONDO:equivalentTo

Here we define a list of 5 predicates that are semantically equivalent to the Biolink Model predicate slot same as.

close_mappings

The close_mappings slot can be used to define external concepts, predicates, or properties which are considered to be close mappings to the class (or slot) being defined.

  same as:
    is_a: exact match
    description: >-
      holds between two entities that are considered equivalent to each other
    in_subset:
      - translator_minimal
    exact_mappings:
      - owl:sameAs
      - skos:exactMatch
      - WIKIDATA_PROPERTY:P2888
      - CHEMBL.MECHANISM:equivalent_to
      - MONDO:equivalentTo
    close_mappings:
      - owl:equivalentClass

Here we define owl:equivalentClass as being a close match to the Biolink Model predicate slot same as.

narrow_mappings

The narrow_mappings slot can be used to define external concepts, predicates, or properties which are considered to be narrow mappings to the class (or slot) being defined.

  same as:
    is_a: exact match
    description: >-
      holds between two entities that are considered equivalent to each other
    in_subset:
      - translator_minimal
    close_mappings:
      - owl:equivalentClass
    exact_mappings:
      - owl:sameAs
      - skos:exactMatch
      - WIKIDATA_PROPERTY:P2888
      - CHEMBL.MECHANISM:equivalent_to
      - MONDO:equivalentTo
    narrow_mappings:
      - DRUGBANK:external-identifier

Here we define DRUGBANK:external-identifier as being a narrow match to the predicate slot same as.

By narrow we mean that the scope of DRUGBANK:external-identifier is more narrower and restrictive than same as.

If we were to create a new predicate slot as a proxy for DRUGBANK:external-identifier then that new slot would be a child of same as.

broad_mappings

The broad_mappings slot can be used to define external concepts, predicates, or properties which are considered to be broad mappings to the class (or slot) being defined.

  in complex with:
    description: >-
      holds between two genes or gene products that are part of (or code for products that are part of) in the same macromolecular complex
    is_a: coexists with
    domain: gene or gene product
    range: gene or gene product
    in_subset:
      - translator_minimal
    broad_mappings:
      - SIO:010285

Here we define SIO:010285 (molecular complex formation) as a broad mapping to the predicate slot in complex with.

By broad we mean that the scope of SIO:010285 is more broad and relaxed than in complex with.

If we were to create a new predicate slot as a proxy for SIO:010285 then that new slot would be a parent of in complex with.

related_mappings

The related_mappings slot can be used to define external concepts, predicates, or properties which are considered to be related mappings to the class (or slot) being defined.

  in complex with:
    description: >-
      holds between two genes or gene products that are part of (or code for products that are part of) in the same macromolecular complex
    is_a: coexists with
    domain: gene or gene product
    range: gene or gene product
    in_subset:
      - translator_minimal
    broad_mappings:
      - SIO:010285
    related_mappings:
      - SIO:010497

Here we define SIO:010497 (protein complex) as a related mapping to the predicate slot in complex with.

By related we mean that the scope of SIO:010497 is related to the predicate slot in complex with and it's difficult to infer any further granularity.

subproperty_of

The subproperty_of slot can be used (typically, under slot_usage) to anchor the values of a Biolink predicate slot of an association to a particular predicate (and its subclasses) other than the top-most predicate, biolink:related_to.

  gene to gene homology association:
    is_a: gene to gene association
    slot_usage:
      predicate:
        subproperty_of: homologous to

Here, the predicate of the biolink:GeneToGeneHomologyAssociation is constrained to a value the subtree of predicates of biolink:homologous_to or its subclasses.

subclass_of

The subclass_of slot can be used to anchor the semantics of a Biolink class to a particular term in an external 3rd party ontology.

  named thing:
    description: "a databased entity or concept/class"
    slots:
      - id
      - name
      - category
    subclass_of: BFO:0000001

Here, biolink:NamedThing is anchored to ontology term BFO:0000001 - Entity of the Basic Formal Ontology which implies all is_a specified subclasses of biolink:NamedThing are also a subclasses of BFO:0000001.