A computational pipeline that builds the backend database of the TriplexRNA: a database of cooperative microRNAs and their mutual targets.

sbi-rostock/triplexer

Triplexer

The Triplexer is the computational pipeline that builds the backend database of the TriplexRNA: a database of cooperative microRNAs and their mutual targets.
The pipeline is based on the work of Lai et al. and Schmitz et al., and extended to cover multiple organisms and prediction algorithms.

Installation requirements

The only requirement is Docker, which can be installed in different ways depending on the underlying operating system.

Operations

The Triplexer defines three operations: read, filtrate, and annotate. Each operates on a namespace, i.e. a resource (file, database, etc.) that describes the RNA duplexes of a specific organism.
Namespaces are used to capture the provenance of a predicted RNA duplex, and subsequently keep the identification of putative RNA triplexes consistent across different organisms and genome releases.

Read duplexes

This operation parses a file (or queries a database) containing the attributes of a set of organism-specific RNA duplexes, and stores their attributes in the underlying Redis cache as a set of hashes.
Since each namespace defines its own data structures, identifiers, and granularity of data, this operation is likely to be redefined by each namespace. However, output data structures share a common schema regardless of their namespace of origin. For instance, each RNA duplex is identified by the unique string:

<namespace label>:<dataset release>:<organism>:<genome build>:target:<target id>

For more information about a namespace-specific read implementation, please refer to IMPLEMENTATIONS.md.
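
The identifier schema above can be sketched in Python as follows. This is only an illustration, not the pipeline's own code: the `DuplexKey` class, the `parse_duplex_key` helper, and the target id `NM_000546` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplexKey:
    """Hypothetical container for the fields of a duplex identifier."""
    namespace: str   # e.g. "microrna.org"
    release: str     # e.g. "aug.2010"
    organism: str    # e.g. "hsa"
    genome: str      # e.g. "hg19"
    target_id: str   # illustrative target id, not taken from the README

    def __str__(self) -> str:
        # <namespace label>:<dataset release>:<organism>:<genome build>:target:<target id>
        return ":".join([self.namespace, self.release, self.organism,
                         self.genome, "target", self.target_id])


def parse_duplex_key(key: str) -> DuplexKey:
    """Split an identifier back into its fields (assumes the schema above)."""
    # maxsplit=5 keeps any ':' inside the target id intact.
    namespace, release, organism, genome, marker, target_id = key.split(":", 5)
    if marker != "target":
        raise ValueError(f"not a duplex target key: {key!r}")
    return DuplexKey(namespace, release, organism, genome, target_id)


key = DuplexKey("microrna.org", "aug.2010", "hsa", "hg19", "NM_000546")
print(key)
```

Keeping the namespace, release, organism, and genome build in the key is what lets duplexes from different datasets coexist in the same cache without colliding.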

Filtrate duplexes

Experimental findings suggest that RNA triplexes form when two cooperating miRNAs bind a common target gene with a seed site distance between 13 and 35 nucleotides (Saetrom et al. 2007). This means that duplex pairs that share a common target must be tested for compliance with the aforementioned seed site distance constraint.
Filtrate relies on the read operation (see above). It compares all the cached duplexes that share a common target gene, and keeps those pairs that comply with the seed site distance constraint. This operation is namespace agnostic. Its behavior can be summarized by the following pseudo-code:

for each target in the set of targets:
    for each pair of duplexes in the set of the target's duplexes:
        if the duplex pair has miRNA alignments within the binding range constraint:
            cache the target
            cache the duplex pair
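
A minimal Python sketch of this filter is given below. It is not the Triplexer's implementation: it assumes each duplex is a `(miRNA name, seed start position)` tuple, that the seed site distance is the absolute difference between seed start positions, and that the 13 to 35 nucleotide range is inclusive. The `filtrate` function and the sample names are hypothetical.

```python
from itertools import combinations

# Seed site distance range from the text (Saetrom et al. 2007);
# treating both bounds as inclusive is an assumption of this sketch.
MIN_SEED_DISTANCE = 13
MAX_SEED_DISTANCE = 35


def filtrate(duplexes_by_target):
    """Keep, per target, the duplex pairs within the seed distance range.

    duplexes_by_target maps a target id to a list of
    (mirna_name, seed_start) tuples (an assumed, simplified layout).
    """
    kept = {}
    for target, duplexes in duplexes_by_target.items():
        pairs = [
            (first, second)
            for first, second in combinations(duplexes, 2)
            if MIN_SEED_DISTANCE <= abs(first[1] - second[1]) <= MAX_SEED_DISTANCE
        ]
        if pairs:  # targets without any compliant pair are dropped
            kept[target] = pairs
    return kept


# Illustrative input: only miR-1/miR-2 on geneA are 20 nt apart.
sample = {
    "geneA": [("miR-1", 100), ("miR-2", 120), ("miR-3", 200)],
    "geneB": [("miR-1", 10), ("miR-4", 15)],
}
result = filtrate(sample)
print(result)
```

`itertools.combinations` visits each unordered pair once, so a pair is never tested twice or compared against itself.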

Annotate duplexes

The in-silico testing of a putative RNA triplex's structural stability can only be performed when the nucleotide sequences of both the target gene's transcript and the miRNA pair are given. However, not all datasets provide this information. For this reason, the annotate operation retrieves the genomic sequence of a duplex's target gene from UCSC, and caches the transcript sequence for later stability testing.
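
The retrieve-then-cache pattern described above could look roughly like the following sketch. It makes no claim about how the Triplexer talks to UCSC or Redis: the `annotate` function is hypothetical, the sequence source is injected as a plain callable, and a dictionary stands in for the cache.

```python
from typing import Callable, Dict, Iterable


def annotate(target_ids: Iterable[str],
             fetch_sequence: Callable[[str], str],
             cache: Dict[str, str]) -> Dict[str, str]:
    """Cache one sequence per target, fetching each target at most once.

    fetch_sequence is a stand-in for the remote lookup; cache is a
    stand-in for the underlying cache (both are assumptions).
    """
    for target_id in target_ids:
        if target_id not in cache:
            cache[target_id] = fetch_sequence(target_id)
    return cache


# Illustrative fetcher that records how often it is called.
calls = []


def fake_fetch(target_id: str) -> str:
    calls.append(target_id)
    return "ACGU" * 3  # placeholder sequence, not real data


cache = annotate(["geneA", "geneB", "geneA"], fake_fetch, {})
print(sorted(cache), len(calls))
```

Many duplexes share a target gene, so looking up each target once rather than once per duplex keeps the number of remote requests small.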

Run the Triplexer

To run the Triplexer pipeline, you need to run the Triplexer docker container and all containers it relies on. This is done via docker compose. Type:

docker-compose run triplexer

You can now launch the Triplexer pipeline. Run it with no arguments to see an overview of its command-line options:

$ triplexer
usage: triplexer [-h] [-v] [-c CONF] [-e EXE] [-d DB] [-r] [-f] [-a] [-n NS]

Predict and simulate putative RNA triplexes.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         print the version and exit
  -c CONF, --conf CONF  set CONF as configuration file
  -e EXE, --exe EXE     set EXE as number of parallely executing processes
  -d DB, --db DB        set DB as intermediate results database

operations (require -n):
  -r, --read            read the provided dataset in memory
  -f, --filtrate        filter entries not forming putative triplexes
  -a, --annotate        annotate transcripts with their sequences

namespace:
  -n NS, --ns NS        set NS as model organism namespace
                        supported NS (default "test"):
                        +-------+----------------------------------+
                        |  NS   | database:version:organism:genome |
                        +-------+----------------------------------+
                        |  test | microrna.org:aug.2010:hsa:hg19   |
                        |  1    | microrna.org:aug.2010:hsa:hg19   |
                        |  2    | microrna.org:aug.2010:mmu:mm9    |
                        |  3    | microrna.org:aug.2010:rno:rn4    |
                        |  4    | microrna.org:aug.2010:dme:dm3    |
                        +-------+----------------------------------+
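
The namespace table in the help output can be read as a simple lookup from the `-n` value to a `database:version:organism:genome` label. The sketch below mirrors that table in Python. The `NAMESPACES` dictionary and `resolve_namespace` function are hypothetical names for illustration, and the pipeline may resolve namespaces differently.

```python
# Values copied from the help output above; the structure is an assumption.
NAMESPACES = {
    "test": "microrna.org:aug.2010:hsa:hg19",
    "1": "microrna.org:aug.2010:hsa:hg19",
    "2": "microrna.org:aug.2010:mmu:mm9",
    "3": "microrna.org:aug.2010:rno:rn4",
    "4": "microrna.org:aug.2010:dme:dm3",
}


def resolve_namespace(ns: str = "test") -> dict:
    """Split the label of a supported NS value into its four fields."""
    if ns not in NAMESPACES:
        raise ValueError(f"unsupported namespace: {ns!r}")
    database, version, organism, genome = NAMESPACES[ns].split(":")
    return {
        "database": database,
        "version": version,
        "organism": organism,
        "genome": genome,
    }


mouse = resolve_namespace("2")
print(mouse)
```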

Examples

The read, filtrate, and annotate operations build on one another. It is therefore good practice to run them in this order, or at least to make sure that the underlying cache already holds the data an operation needs before running it in isolation.

Here are some examples on how to fill the underlying cache with duplexes from the microrna.org namespace.

  • Read all microrna.org's Human hg19 duplexes and store them in the cache:
triplexer -n 1 -r
  • Filtrate all microrna.org's Human hg19 duplexes by keeping those whose miRNA pairs bind a common target gene within the allowed distance range. Do so using 4 parallel processes:
triplexer -e 4 -n 1 -f
  • Annotate all microrna.org's Human hg19 duplexes with the transcript sequence of their target genes. Do so using 2 parallel processes:
triplexer -e 2 -n 1 -a
  • Perform all aforementioned operations in one run. Do so using 4 parallel processes:
triplexer -e 4 -n 1 -r -f -a
