-
Notifications
You must be signed in to change notification settings - Fork 18
Relational algebra Genome Intervals
In order to get precise answers to biological questions, precise definitions of the relations between biological entities are required. These can be specified using First-Order Logic (FOL) axioms. The benefit of this system is that relations can be inferred and established by a theorem solver/program. From an API perspective they are useful for thinking about what functions should act on genomic data structures.
Example of relationships in Sequence Ontology
The relations are defined to hold between two continuous regions (this can be any abstract notion of space i.e. Euclidean space) - actually used in geographical information systems.
DC - disconnected
EC - externally connected
EQ - equal
PO - partially overlapping
TPP - tangential proper part
TPPi - tangential proper part inverse
NTPP - non-tangential proper part
NTPPi - non-tangential proper part inverse
Originally defined for reasoning about temporal intervals and like RCC-8 assumes continous intervals, Allan's interval algebra has been implemented in the Bioconductor package GenomicRanges (seeing
?GenomicRanges::findOverlaps
for details.
There are 13 relationships, 6 of which are inverses
- precedes (p)
- meets (m)
- overlaps (o)
- finishes (f) [equivalently called ends]
- starts (s)
- equals (e)
- containedBy (c) [equivalently called during]
- precededBy (pi) [inverse of precedes]
- metBy (mi) [inverse of meets]
- overlappedBy (oi) [inverse of overlaps]
- finishedBy (fi) [inverse of finishes]
- startedBy (si) [inverse of starts]
The algebra is also equipped with intersection, union and complement operators.
There are sixteen relations in Mungall's algebra that extend and modify Allen's algebra:
The algebra is equipped with intersection, union and complement operators but is not isomorphic to Allan's algebra, nor is the set of relations pairwise disjoint. One advantage is this algebra can be defined to act on intervals (and take into account strandedness for a total of 32 relations, using the reverse complement (RC) relation) or defined to act on junctions.
Junctions are defined in terms of point-positions: a proper junction "is a discrete point connecting two nucleotide bases." and junctions are formed by unions of proper junctions and the boundary points of a sequence. In the junction defintion of the algebra two functions are defined, α and ω which map the start and end points of an interval to a junction, respectively. An other relation operator succeeds, abreviated to succ, which acts between two junctions seperated by a base and signifies that first junction is at the 5' end, while the second is at the 3' end. The inequality operators are overloaded so they are transitiive. The functions α and ω can be eliminated by translating them to the interval operator.
In it's final form the algebra consists of three sets of relations: R: the orginal set, R' the stranded set via the RC operator, and the union set R^U = R union R', for a total of 48 relations.
Usually you do not work with an isolated interval but a collection of them - the algebra introduces the relation interleaves as distinct from overlaps (i.e. exons can interleave on opposite strands even if they do not overlap).