Prediction of Trypanosomatid Regulatory Elements

Overview

Features

  • Sequence motifs
    • 5' UTR
    • 3' UTR
    • Upstream gene's 3' UTR
    • Downstream gene's 5' UTR
    • Upstream intergenic region
    • Downstream intergenic region
    • CDS
  • Sequence composition
    • 5' UTR GC/CT composition
    • 3' UTR GC/CT composition
    • CDS GC/CT composition
    • Polypyrimidine tract GC/CT composition
    • k-mer counts
  • Sequence lengths
    • 5' UTR length
    • 3' UTR length
    • Polypyrimidine tract length
    • Intergenic region / inter-CDS length
  • Other
    • CDS codon adaptation index (CAI)
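
As an illustration of the sequence composition and k-mer features listed above, here is a minimal sketch in Python. It is not the pipeline's own code: the function names `base_fraction` and `kmer_counts` and the example sequence are made up for this example, and it assumes that "GC/CT composition" means the fraction of bases that are G or C (respectively C or T).

```python
from collections import Counter


def base_fraction(seq: str, bases: str) -> float:
    """Return the fraction of positions in `seq` whose base is one of `bases`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(b) for b in bases) / len(seq)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Made-up example sequence standing in for a UTR (assumption, not real data).
utr = "ACGTTTCCTCTT"
gc = base_fraction(utr, "GC")   # G or C content
ct = base_fraction(utr, "CT")   # pyrimidine (C or T) content
dimers = kmer_counts(utr, 2)    # overlapping 2-mer counts
```

The same two helpers could be applied to each region (5' UTR, 3' UTR, CDS, polypyrimidine tract) to build one feature column per region.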

Installation

The Trypanosomatid Regulatory Elements prediction pipeline makes use of a number of R and Python packages, as well as several standalone tools.

The requirements needed to run the pipeline are listed below.

Requirements

Software requirements

Python requirements

R requirements

conda create -n reg-predict --file requirements.txt \
    --channel bioconda \
    --channel conda-forge \
    --channel pytorch

Note: To avoid running out of memory during execution, you may need to edit the hierarchical clustering portion of the EXTREME script run_consensus_clusering_using_wm.pl and increase the -Xmx value (the maximum Java heap size), e.g.: -Xmx10000m.
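
The edit described in the note above can also be scripted. The sketch below is only an illustration: `raise_xmx` is a hypothetical helper, and the example command line is invented, since the actual Java invocation inside the EXTREME script may look different. It simply rewrites any `-Xmx<size>` flag found in a piece of text.

```python
import re


def raise_xmx(script_text: str, new_mb: int) -> str:
    """Replace every -Xmx<size> flag in `script_text` with -Xmx<new_mb>m."""
    return re.sub(r"-Xmx\d+[kKmMgG]?", f"-Xmx{new_mb}m", script_text)


# Hypothetical line; the real command in the EXTREME script may differ.
example = 'system("java -Xmx1000m -jar cluster.jar");'
patched = raise_xmx(example, 10000)
print(patched)
```

To apply this to the real script, read its contents, pass them through `raise_xmx`, and write the result back, keeping a backup copy of the original file.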

Usage

TODO: describe software for predicting UTR boundaries, etc.

snakemake --configfile settings/config.yml combine_motifs