Triplexer

The Triplexer is the computational pipeline that builds the backend database of the TriplexRNA: a database of cooperative microRNAs and their mutual targets.
The pipeline is based on the work of Lai et al. and Schmitz et al., and has been extended to cover multiple organisms and prediction algorithms.

Installation requirements
Operations
Run the Triplexer
- Examples

Installation requirements

The only requirement is Docker, which can be installed in different ways depending on the underlying operating system:

Linux users should follow the Docker installation for Linux, and install both Docker and Docker Compose.
macOS 10.13+ users should follow the Docker installation for Mac.
Windows 10+ users should follow the Docker installation for Windows.
For legacy systems, users can rely on the Docker Toolbox.

Operations

The Triplexer defines three operations (read, filtrate, and annotate), each of which refers to a namespace, i.e. a resource (file, database, etc.) that describes the RNA duplexes of a specific organism.
Namespaces capture the provenance of a predicted RNA duplex and thereby keep the identification of putative RNA triplexes consistent across different organisms and genome releases.

Read duplexes

This operation parses a file (or queries a database) containing the attributes of a set of organism-specific RNA duplexes, and stores their attributes in the underlying Redis cache as a set of hashes.
Since each namespace defines its own data structures, identifiers, and data granularity, each namespace is likely to need its own implementation of this operation. Output data structures, however, share a common schema regardless of their namespace of origin. For instance, each RNA duplex is identified by the unique string:

<namespace label>:<dataset release>:<organism>:<genome build>:target:<target id>
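A minimal sketch of this key schema follows, using the microrna.org human namespace from the `triplexer` namespace table as an example. It assumes a target id may itself contain colons, so only the first five separators are split on. The class `DuplexKey`, its methods, and the target id `NM_000014` are hypothetical illustrations, not the repository's API.

```python
from typing import NamedTuple


class DuplexKey(NamedTuple):
    """Hypothetical container for the fields of the documented key schema."""

    namespace: str
    release: str
    organism: str
    genome_build: str
    target_id: str

    def render(self) -> str:
        # <namespace label>:<dataset release>:<organism>:<genome build>:target:<target id>
        return (
            f"{self.namespace}:{self.release}:{self.organism}:"
            f"{self.genome_build}:target:{self.target_id}"
        )

    @classmethod
    def parse(cls, key: str) -> "DuplexKey":
        # Split at most five times so that a colon inside the target id survives.
        parts = key.split(":", 5)
        if len(parts) != 6 or parts[4] != "target":
            raise ValueError(f"not a target key: {key!r}")
        namespace, release, organism, genome_build, _, target_id = parts
        return cls(namespace, release, organism, genome_build, target_id)


key = DuplexKey("microrna.org", "aug.2010", "hsa", "hg19", "NM_000014")
print(key.render())  # microrna.org:aug.2010:hsa:hg19:target:NM_000014
```

In the Redis cache, such a string would serve as the name of a hash; with redis-py, for example, the duplex attributes could be written with `Redis.hset(name, mapping=attributes)`.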

For more information about a namespace-specific read implementation, please refer to IMPLEMENTATIONS.md.

Filtrate duplexes

Experimental findings suggest that RNA triplexes form when two cooperating miRNAs bind a common target gene with a seed site distance between 13 and 35 nucleotides (Saetrom et al. 2007). Duplex pairs that share a common target must therefore be tested for compliance with this seed site distance constraint.
Filtrate relies on the read operation (see above). It compares all cached duplexes that share a common target gene and keeps those pairs that comply with the seed site distance constraint. This operation is namespace agnostic. Its behavior can be summarized by the following pseudo-code:

for each target in the set of targets:
    for each pair of duplexes in the set of the target's duplexes:
        if the pair's miRNA seed sites lie within the seed site distance constraint:
            cache the target
            cache the duplex pair
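The pseudo-code above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the `Duplex` record and the `filtrate` and `within_seed_distance` functions are hypothetical, the seed site distance is assumed to be measured between seed site start positions on the target transcript, the 13 and 35 nt bounds are treated as inclusive, and a plain dictionary stands in for the Redis cache.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple

# Seed site distance bounds (nt) from Saetrom et al. 2007; inclusiveness is assumed.
MIN_SEED_DISTANCE = 13
MAX_SEED_DISTANCE = 35


class Duplex(NamedTuple):
    """Hypothetical duplex record: a miRNA and the start of its seed site on the target."""

    mirna: str
    seed_start: int


Pair = Tuple[Duplex, Duplex]


def within_seed_distance(first: Duplex, second: Duplex) -> bool:
    distance = abs(first.seed_start - second.seed_start)
    return MIN_SEED_DISTANCE <= distance <= MAX_SEED_DISTANCE


def filtrate(duplexes_by_target: Dict[str, List[Duplex]]) -> Dict[str, List[Pair]]:
    """Keep, per target, every duplex pair whose seed sites satisfy the constraint."""
    cache: Dict[str, List[Pair]] = {}  # stands in for the Redis cache
    for target, duplexes in duplexes_by_target.items():
        # combinations() yields each unordered pair of the target's duplexes once.
        for first, second in combinations(duplexes, 2):
            if within_seed_distance(first, second):
                # Caching the pair under its target also records the target itself.
                cache.setdefault(target, []).append((first, second))
    return cache


targets = {
    "T1": [Duplex("miR-a", 100), Duplex("miR-b", 120), Duplex("miR-c", 300)],
    "T2": [Duplex("miR-a", 10), Duplex("miR-b", 500)],
}
print(filtrate(targets))
```

Only the miR-a/miR-b pair on T1 (20 nt apart) is kept; T2 has no complying pair and therefore does not appear in the result.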

Annotate duplexes

The in-silico testing of a putative RNA triplex's structural stability can only be performed when the nucleotide sequences of both the target gene's transcript and the miRNA pair are known. However, not all datasets provide this information. For this reason, the annotate operation retrieves the genomic sequence of a duplex's target gene from UCSC and caches the transcript sequence for later stability testing.
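As a sketch of the retrieval step, the code below queries the public UCSC REST API (`https://api.genome.ucsc.edu/getData/sequence`) for a genomic interval. Whether the repository's `ucsc.py` uses this endpoint is an assumption, and the helper names are hypothetical. It fetches a contiguous genomic interval (UCSC coordinates: 0-based start, exclusive end); obtaining a spliced transcript would additionally require exon coordinates. Targets on the minus strand are reverse-complemented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public UCSC REST endpoint; assumed here, not necessarily what ucsc.py uses.
UCSC_SEQUENCE_API = "https://api.genome.ucsc.edu/getData/sequence"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def sequence_url(genome: str, chrom: str, start: int, end: int) -> str:
    """Build a query for the 0-based, half-open interval [start, end)."""
    query = urlencode({"genome": genome, "chrom": chrom, "start": start, "end": end})
    return f"{UCSC_SEQUENCE_API}?{query}"


def parse_sequence(payload: str) -> str:
    """Extract the DNA from a JSON response; soft-masked (lowercase) bases are upper-cased."""
    return json.loads(payload)["dna"].upper()


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def fetch_target_sequence(
    genome: str, chrom: str, start: int, end: int, strand: str = "+", timeout: float = 30.0
) -> str:
    """Download the interval and orient it along the target's strand."""
    with urlopen(sequence_url(genome, chrom, start, end), timeout=timeout) as response:
        dna = parse_sequence(response.read().decode("utf-8"))
    return reverse_complement(dna) if strand == "-" else dna
```

For example, `fetch_target_sequence("hg19", "chrM", 0, 100)` would return the first 100 bases of the hg19 mitochondrial genome, provided the API is reachable.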

Run the Triplexer

To run the Triplexer pipeline, you need to start the Triplexer Docker container together with all containers it relies on. This is done via Docker Compose. Type:

docker-compose run triplexer

You can now launch the Triplexer pipeline. Run it with no arguments for an overview of its command-line options:

$ triplexer
usage: triplexer [-h] [-v] [-c CONF] [-e EXE] [-d DB] [-r] [-f] [-a] [-n NS]

Predict and simulate putative RNA triplexes.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         print the version and exit
  -c CONF, --conf CONF  set CONF as configuration file
  -e EXE, --exe EXE     set EXE as number of parallely executing processes
  -d DB, --db DB        set DB as intermediate results database

operations (require -n):
  -r, --read            read the provided dataset in memory
  -f, --filtrate        filter entries not forming putative triplexes
  -a, --annotate        annotate transcripts with their sequences

namespace:
  -n NS, --ns NS        set NS as model organism namespace
                        supported NS (default "test"):
                        +-------+----------------------------------+
                        |  NS   | database:version:organism:genome |
                        +-------+----------------------------------+
                        |  test | microrna.org:aug.2010:hsa:hg19   |
                        |  1    | microrna.org:aug.2010:hsa:hg19   |
                        |  2    | microrna.org:aug.2010:mmu:mm9    |
                        |  3    | microrna.org:aug.2010:rno:rn4    |
                        |  4    | microrna.org:aug.2010:dme:dm3    |
                        +-------+----------------------------------+
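The help text above fixes the flags and the namespace table. As a rough sketch of how these options could combine, the hypothetical parser below (it is not the repository's `cli.py`) mirrors a subset of them with `argparse`, maps each NS value to its namespace string, and, as an assumption, runs whichever operations were selected in the read, filtrate, annotate order, regardless of the order of the flags. The default of one process for `-e` is also an assumption.

```python
import argparse
from typing import List, Sequence, Tuple

# Namespace table copied from the help text above.
NAMESPACES = {
    "test": "microrna.org:aug.2010:hsa:hg19",
    "1": "microrna.org:aug.2010:hsa:hg19",
    "2": "microrna.org:aug.2010:mmu:mm9",
    "3": "microrna.org:aug.2010:rno:rn4",
    "4": "microrna.org:aug.2010:dme:dm3",
}

# Assumed canonical order: each operation consumes the cache filled by the previous one.
PIPELINE_ORDER = ("read", "filtrate", "annotate")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="triplexer", description="Predict and simulate putative RNA triplexes."
    )
    # Assumed default: a single process when -e is omitted.
    parser.add_argument("-e", "--exe", type=int, default=1, help="number of parallel processes")
    parser.add_argument("-n", "--ns", choices=sorted(NAMESPACES), default="test",
                        help="model organism namespace")
    operations = parser.add_argument_group("operations (require -n)")
    operations.add_argument("-r", "--read", action="store_true")
    operations.add_argument("-f", "--filtrate", action="store_true")
    operations.add_argument("-a", "--annotate", action="store_true")
    return parser


def plan(argv: Sequence[str]) -> Tuple[str, int, List[str]]:
    """Resolve the namespace, process count, and ordered operations for a command line."""
    args = build_parser().parse_args(argv)
    selected = [op for op in PIPELINE_ORDER if getattr(args, op)]
    return NAMESPACES[args.ns], args.exe, selected
```

For instance, `plan(["-n", "1", "-a", "-r"])` yields the human hg19 namespace, one process, and `["read", "annotate"]`.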

Examples

The read, filtrate, and annotate operations depend on one another. It is therefore good practice to run them in this order or, when running one operation in isolation, to make sure that the underlying cache already holds the results it needs.

Here are some examples of how to fill the underlying cache with duplexes from the microrna.org namespace.

Read microrna.org's Human hg19 target site predictions:

triplexer -n 1 -r

Filtrate all microrna.org's Human hg19 duplexes by keeping those whose miRNA pairs bind a common target gene within the allowed distance range. Do so using 4 parallel processes:

triplexer -e 4 -n 1 -f

Annotate all microrna.org's Human hg19 duplexes with the transcript sequence of their target genes. Do so using 2 parallel processes:

triplexer -e 2 -n 1 -a

Perform all aforementioned operations in one run. Do so using 4 parallel processes:

triplexer -e 4 -n 1 -r -f -a

sbi-rostock/triplexer

Folders and files

Latest commit

History

Repository files navigation

Triplexer

Installation requirements

Operations

Read duplexes

Filtrate duplexes

Annotate duplexes

Run the Triplexer

Examples

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages