RiboProAnalysis

RiboProAnalysis is a pipeline for Ribosome Profiling analysis for all eukaryotic genome from Ensembl 75+. It performs pre-processing steps (quality control, filtering, trimming and size selection), reads mapping to rRNA and reference genome, counting on CDS for each gene and differential analysis from raw Ribosome Profiling data.

##Use : RiboProAnalysis can be used via a Docker image (URL) and a standard Bash script with several cases : it can performs demultiplexing on multiplexed FASTQ (reads MUST begin with the index sequence) and use of RNA-seq counts to give a study of the mode of regulation of the translation. If you use FASTQ files (no demultiplexing), the extension have to be .fastq

A configuration file .conf is mandatory to laucun the program. If there is no use of RNA-seq counts, a tabulated design file, target.txt, is needed. User have to build rRNA and genome index before start with the pipeline. If you have RNA-seq counts, files must be named : SAMPLENAME_mRNAcounts.txt

Build Bowtie1 index for rRNA sequences

bowtie-build rRNA.fasta rRNA

Build STAR index for reference genome

STAR --runMode genomeGenerate --genomeDir /path/to/genome/index --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ... --sjdbGTFfile /path/to/gtf/annotations \
--sjdbOverhang 28

Create a tmp/ directory in your working directory with command mkdir tmp/

Run RiboProAnalysis container with following command in the working directory :

-v /path/to/rRNA/index:**/rRNAindexdirectory** -v /path/to/genome/index:**/genomeindexdirectory** -v /path/to/directory/containig/GTF/Ensembl/annotations:**/root -v $(pwd)/tmp:/tmp **\
**genomicpariscentre/riboproanalysis bash -c "riboproanalysisDocker.sh** My_configuration_file.conf"
```

Run RiboProAnalysis bash program with following command in the working directory :

riboproanalysis.sh MyConfigurationFile.conf


### Available variables to set in the configuration file

| Variables                          | Explanation                                                                                                                           | Choices/Examples                                                     | Default                                   |
|------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|-------------------------------------------|
| PATH_TO_GENOME_INDEX               | Absolute path to genome index previously built with STAR                                                                              | /absolute/path/to/genome/index/                                      | Mandatory (if not Docker mode)            |
| PATH_TO_rRNA_INDEX                 | Absolute path to rRNA index previously built with Bowtie1                                                                             | /absolute/path/to/rRNA/index                                         | Mandatory (if not Docker mode)            |
| PATH_TO_ANNOTATION_FILE            | Absolute path to GTF annotations file (Ensembl 75)                                                                                    | /absolute/path/to/gtf/annotations                                    | Mandatory                                 |
| USER_IDS                           | Result of the bash command : $(id -u):$(id -g)                                                                                        | UserId:GroupId                                                       | Mandatory (if Docker mode)                |
| SAMPLE_ARRAY                       | Array containing sample names (if demultiplexing) or FASTQ file for each sample                                                       | (Sample1 Sample 2 Sample 3) OR (Samp1.fastq Samp2.fastq Samp3.fastq) | Mandatory                                 |
| ADAPTER_SEQUENCE_THREE_PRIME       | Adapter sequence for 3' trimming                                                                                                      | AAAAAAAGGTCCTAA                                                      | Mandatory                                 |
| STRANDED                           | Answer for stranded option of HTSeq-Count                                                                                             | yes/no/reverse                                                       | Mandatory                                 |
| PATH_TO_RAW_UNDEMULTIPLEXED_FILE   | Absolute path to multiplexed FASTQ file                                                                                               | /absolute/path/to/multiplexed/fastq                                  | Mandatory for demultiplexing              |
| SAMPLE_INDEX_ARRAY                 | Array containing 5' index used for demultiplexing. Respect same order as in SAMPLE_ARRAY so index match with respective sample name   | (IndexSamp1 IndexSamp2 IndexSamp3)                                   | Mandatory for demultiplexing. Never empty |
| ANSWER_REMOVE_POLYN_READS          | Option to remove reads containing more than 2 N bases (cutadapt –max-n 2)                                                             | YES / NO                                                             | NO                                        |
| ANSWER_DEMULTIPLEXING              | Option to launch demultiplexing step                                                                                                  | YES / NO                                                             | NO                                        |
| ANSWER_REMOVE_PCR_DUPLICATES       | Option to launch PCR duplicates removing                                                                                              | YES / NO                                                             | NO                                        |
| ANSWER_RNASEQ_COUNTING             | Option to launch Babel OR SARTools for differential analysis                                                                          | YES / NO                                                             | NO                                        |
| ANSWER_KEEP_MULTIREAD              | Option to keep multi-reads in a distinct SAM file                                                                                     | YES / NO                                                             | NO                                        |
| DIFFERENTIAL_ANALYSIS_PACKAGE      | Choice of the R package launched by SARTools                                                                                          | DESEQ2 / EDGER                                                       | EDGER                                     |
| CONDITION_ARRAY                    | Array containig condition name of each sample respecting the same order                                                               | (Cond_Samp1 Cond_Samp2 Cond_Samp3)                                   | Mandatory with Babel                      |
| AUTHOR                             | Author's name                                                                                                                         | UserName                                                             | Mandatory for SARTools                    |
| REFERENCE_CONDITION                | Reference condition for the statistical analysis of SARTools                                                                          | WT                                                                   | Mandatory for SARTools                    |
| CHECK_DOCKER_IMAGES				 | Check the tags of Docker images																										 | YES / NO																| NO

##Installation :
This software could be launched from a Docker container launcheing Docker containers itself, or from a Bash script launchine Docker containers.

You should :
* Install Docker on your computer
* Pull following docker images from the Genomic Paris Centre Docker public repository : 
	* genomicpariscentre/fastqc:0.11.5
	* genomicpariscentre/cutadapt:1.8.3
	* genomicpariscentre/bowtie1:1.1.1
	* genomicpariscentre/star:2.5.1b
	* genomicpariscentre/gff3-ptools:0.4.0
	* genomicpariscentre/samtools:0.1.19
	* genomicpariscentre/htseq:0.6.1p1
	* genomicpariscentre/babel:0.3-0
	* genomicpariscentre/sartools:1.3.2

* Pull RiboProAnalysis image

##Input files :

###Configuration file :
You have to create your configuration file .conf in the working directory. It is a little Bash script which is imported in the main Bash script.
You put mandatory and interesting variables presented in **Available variables to set in the configuration file**.
The syntax to declare a variable is :

export VARIABLE_NAME=MyVariable


###Target file (if SARTools is used) :
The target file should include following columns : label, files, group
* **label** the sample label
* **files** the Ribosome Profiling counts files. Syntaxe if demultiplexing : SAMPLENAME_htseq.txt (ex : sample MB1 --> MB1_htseq.txt). Syntaxe if no demultiplexing : BASENAME_FASTQFILE_htseq.txt (ex : file MB1.toto.fastq --> MB1.toto_htseq.txt)
IF you use Babel without demultiplexing, you have to name your FASTQ file : SAMPLENAME.fastq --> your file after HTSeq-Count will be SAMPLENAME_RPcounts.txt and your count files for RNA-seq will must be SAMPLENAME_mRNAcounts.txt.
* **group** the group/condition given to a sample

####Model of configuration file for run with Bash script

export PATH_TO_RAW_UNDEMULTIPLEXED_FILE=/import/disir01/bioinfo/RiboPro/Riboprotma_project/2015_240_NoIndex_L008_R1_001.fastq export PATH_TO_GENOME_INDEX=/import/disir01/bioinfo/RiboPro/IndexAlignement/STAR/yeastGenomeEnsembl export PATH_TO_rRNA_INDEX=/import/disir01/bioinfo/RiboPro/IndexAlignement/Bowtie1/rRNALevureNCBI export PATH_TO_ANNOTATION_FILE=/import/rhodos01/shares-net/bioinfo/RiboPro/FichiersLevure/GenomeAnnotations_Ensembl/Saccharomyces_cerevisiae.R64-1-1.75.gtf export ANSWER_DEMULTIPLEXING=YES export ANSWER_REMOVE_PCR_DUPLICATES=YES export ANSWER_RNASEQ_COUNTING=NO export DIFFERENTIAL_ANALYSIS_PACKAGE=EDGER export SAMPLE_ARRAY=(RT1 RT11 RT7 RT13) export SAMPLE_INDEX_ARRAY=(NNNGGTTNN NNNAACCNN NNNTTAGNN NNNCGGANN) export ADAPTER_SEQUENCE_THREE_PRIME=AGATCGGAAGAGCGGTTCAG export STRANDED=yes export AUTHOR=User export REFERENCE_CONDITION=WildType


####Model of configuration file for run with Docker container

export USER_IDS=2747:100 export PATH_TO_RAW_UNDEMULTIPLEXED_FILE=$(pwd)/2015_240_NoIndex_L008_R1_001.fastq export PATH_TO_ANNOTATION_FILE=/import/rhodos01/shares-net/bioinfo/RiboPro/FichiersLevure/GenomeAnnotations_Ensembl/Saccharomyces_cerevisiae.R64-1-1.75.gtf export ANSWER_DEMULTIPLEXING=NO export ANSWER_REMOVE_PCR_DUPLICATES=YES export ANSWER_RNASEQ_COUNTING=YES export SAMPLE_ARRAY=(RT1.fastq RT2.fastq RD1.fastq RD2.fastq) export CONDITION_ARRAY=(RT RT RD RD) export SAMPLE_INDEX_ARRAY=(NA) export ADAPTER_SEQUENCE_THREE_PRIME=AGATCGGAAGAGCGGTTCAG export STRANDED=yes


####Model of target.txt file

label files group RT1 RT1_htseq.txt RT RT2 RT2_htseq.txt RT RD1 RD1_htseq.txt RD RD2 RD2_htseq.txt RD


##Workflow :
![Pre-Processing steps](PreProcessingSteps.png)
![Pre-Processing steps](AlignmentAndCountingSteps.png)
![Pre-Processing steps](DifferentialAnalysisSteps.png)

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
PythonScripts		PythonScripts
RScripts		RScripts
AlignmentAndCountingSteps.png		AlignmentAndCountingSteps.png
DifferentialAnalysisSteps.png		DifferentialAnalysisSteps.png
Dockerfile		Dockerfile
PreProcessingSteps.png		PreProcessingSteps.png
README.md		README.md
riboproanalysisDocker.sh		riboproanalysisDocker.sh
riboproanalysisLocal.sh		riboproanalysisLocal.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RiboProAnalysis

About

Releases

Packages

Languages

GenomiqueENS/riboproanalysis

Folders and files

Latest commit

History

Repository files navigation

RiboProAnalysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages