GenomiqueENS/riboproanalysis

RiboProAnalysis

RiboProAnalysis is a Ribosome Profiling analysis pipeline for any eukaryotic genome available in Ensembl release 75 or later. Starting from raw Ribosome Profiling data, it performs pre-processing (quality control, filtering, trimming and size selection), read mapping to rRNA and to the reference genome, read counting on the CDS of each gene, and differential analysis.

##Use :
RiboProAnalysis can be run via a Docker image (URL) or a standard Bash script. Two optional cases are supported: demultiplexing of a multiplexed FASTQ file (reads MUST begin with the index sequence), and use of RNA-seq counts to study the mode of translational regulation. If you provide FASTQ files directly (no demultiplexing), their extension has to be .fastq
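
Before demultiplexing, it can be useful to check that reads really begin with the expected index. The following is only a sanity-check sketch, not the pipeline's own demultiplexing code. It assumes a 4-line-per-record FASTQ file and treats `N` in the index as a wildcard for any base. The function name `count_index_reads` and the demo reads are hypothetical.

```shell
#!/usr/bin/env bash
# Count reads whose sequence starts with the given index.
# Assumption: N in the index matches any base.
count_index_reads() {
  local index="$1" fastq="$2"
  local pattern="^${index//N/.}"
  # In a 4-line FASTQ record, the sequence is the 2nd line.
  awk 'NR % 4 == 2' "$fastq" | grep -c -E "$pattern"
}

# Tiny demo file with three made-up reads (left in the temp directory).
demo_fastq="$(mktemp)"
cat > "$demo_fastq" <<'EOF'
@read1
ACGGGTTCAAGCTAGCTA
+
IIIIIIIIIIIIIIIIII
@read2
TTTAACCGGAGCTAGCTA
+
IIIIIIIIIIIIIIIIII
@read3
GGTTACGTAGCTAGCTAA
+
IIIIIIIIIIIIIIIIII
EOF

# Only read1 has GGTT at positions 4 to 7.
count_index_reads NNNGGTTNN "$demo_fastq"
```

A count far below the expected share of reads for a sample suggests that the index is wrong or that the reads do not start with it.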

A configuration file (.conf) is mandatory to launch the program. If RNA-seq counts are not used, a tab-delimited design file, target.txt, is also needed. The rRNA and genome indexes have to be built before starting the pipeline. If you have RNA-seq counts, the files must be named : SAMPLENAME_mRNAcounts.txt

  • Build Bowtie1 index for rRNA sequences
bowtie-build rRNA.fasta rRNA
  • Build STAR index for reference genome
STAR --runMode genomeGenerate --genomeDir /path/to/genome/index --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ... --sjdbGTFfile /path/to/gtf/annotations \
--sjdbOverhang 28

Create a tmp/ directory in your working directory with command mkdir tmp/

Run RiboProAnalysis container with following command in the working directory :

```
-v /path/to/rRNA/index:/rRNAindexdirectory -v /path/to/genome/index:/genomeindexdirectory -v /path/to/directory/containing/GTF/Ensembl/annotations:/root -v $(pwd)/tmp:/tmp \
genomicpariscentre/riboproanalysis bash -c "riboproanalysisDocker.sh My_configuration_file.conf"
```

Run RiboProAnalysis bash program with following command in the working directory :

riboproanalysis.sh MyConfigurationFile.conf


### Available variables to set in the configuration file

| Variables                          | Explanation                                                                                                                           | Choices/Examples                                                     | Default                                   |
|------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|-------------------------------------------|
| PATH_TO_GENOME_INDEX               | Absolute path to genome index previously built with STAR                                                                              | /absolute/path/to/genome/index/                                      | Mandatory (if not Docker mode)            |
| PATH_TO_rRNA_INDEX                 | Absolute path to rRNA index previously built with Bowtie1                                                                             | /absolute/path/to/rRNA/index                                         | Mandatory (if not Docker mode)            |
| PATH_TO_ANNOTATION_FILE            | Absolute path to GTF annotations file (Ensembl 75)                                                                                    | /absolute/path/to/gtf/annotations                                    | Mandatory                                 |
| USER_IDS                           | Result of the bash command : $(id -u):$(id -g)                                                                                        | UserId:GroupId                                                       | Mandatory (if Docker mode)                |
| SAMPLE_ARRAY                       | Array containing sample names (if demultiplexing) or FASTQ file for each sample                                                       | (Sample1 Sample2 Sample3) OR (Samp1.fastq Samp2.fastq Samp3.fastq)   | Mandatory                                 |
| ADAPTER_SEQUENCE_THREE_PRIME       | Adapter sequence for 3' trimming                                                                                                      | AAAAAAAGGTCCTAA                                                      | Mandatory                                 |
| STRANDED                           | Answer for stranded option of HTSeq-Count                                                                                             | yes/no/reverse                                                       | Mandatory                                 |
| PATH_TO_RAW_UNDEMULTIPLEXED_FILE   | Absolute path to multiplexed FASTQ file                                                                                               | /absolute/path/to/multiplexed/fastq                                  | Mandatory for demultiplexing              |
| SAMPLE_INDEX_ARRAY                 | Array containing 5' index used for demultiplexing. Respect same order as in SAMPLE_ARRAY so index match with respective sample name   | (IndexSamp1 IndexSamp2 IndexSamp3)                                   | Mandatory for demultiplexing. Never empty |
| ANSWER_REMOVE_POLYN_READS          | Option to remove reads containing more than 2 N bases (cutadapt --max-n 2)                                                            | YES / NO                                                             | NO                                        |
| ANSWER_DEMULTIPLEXING              | Option to launch demultiplexing step                                                                                                  | YES / NO                                                             | NO                                        |
| ANSWER_REMOVE_PCR_DUPLICATES       | Option to launch PCR duplicates removing                                                                                              | YES / NO                                                             | NO                                        |
| ANSWER_RNASEQ_COUNTING             | Use RNA-seq counts for differential analysis: Babel is launched if YES, SARTools if NO                                                | YES / NO                                                             | NO                                        |
| ANSWER_KEEP_MULTIREAD              | Option to keep multi-reads in a distinct SAM file                                                                                     | YES / NO                                                             | NO                                        |
| DIFFERENTIAL_ANALYSIS_PACKAGE      | Choice of the R package launched by SARTools                                                                                          | DESEQ2 / EDGER                                                       | EDGER                                     |
| CONDITION_ARRAY                    | Array containing the condition name of each sample, in the same order as SAMPLE_ARRAY                                                 | (Cond_Samp1 Cond_Samp2 Cond_Samp3)                                   | Mandatory with Babel                      |
| AUTHOR                             | Author's name                                                                                                                         | UserName                                                             | Mandatory for SARTools                    |
| REFERENCE_CONDITION                | Reference condition for the statistical analysis of SARTools                                                                          | WT                                                                   | Mandatory for SARTools                    |
| CHECK_DOCKER_IMAGES                | Check the tags of Docker images                                                                                                       | YES / NO                                                             | NO                                        |
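
The `USER_IDS` value can be produced by the shell rather than typed by hand. A minimal sketch, assuming you want a ready-to-paste `export` line. The helper name `user_ids_line` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the export line for USER_IDS in the UserId:GroupId form
# described in the table above.
user_ids_line() {
  printf 'export USER_IDS=%s:%s\n' "$(id -u)" "$(id -g)"
}

user_ids_line
```

To append the line to a configuration file, redirect the output, for example `user_ids_line >> My_configuration_file.conf`.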

##Installation :
This software can be launched either from a Docker container that itself launches Docker containers, or from a Bash script that launches Docker containers.

You should :
* Install Docker on your computer
* Pull the following Docker images from the Genomic Paris Centre public Docker repository :
	* genomicpariscentre/fastqc:0.11.5
	* genomicpariscentre/cutadapt:1.8.3
	* genomicpariscentre/bowtie1:1.1.1
	* genomicpariscentre/star:2.5.1b
	* genomicpariscentre/gff3-ptools:0.4.0
	* genomicpariscentre/samtools:0.1.19
	* genomicpariscentre/htseq:0.6.1p1
	* genomicpariscentre/babel:0.3-0
	* genomicpariscentre/sartools:1.3.2

* Pull RiboProAnalysis image
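
The pulls listed above can be scripted. This is a sketch under two assumptions: the image tags are copied from the list above, and the RiboProAnalysis image name is the one used in the run command of this README, without a tag. The function name `pull_images` and its dry-run convention are hypothetical.

```shell
#!/usr/bin/env bash
# Images required by the pipeline (tags copied from the list above).
IMAGES=(
  genomicpariscentre/fastqc:0.11.5
  genomicpariscentre/cutadapt:1.8.3
  genomicpariscentre/bowtie1:1.1.1
  genomicpariscentre/star:2.5.1b
  genomicpariscentre/gff3-ptools:0.4.0
  genomicpariscentre/samtools:0.1.19
  genomicpariscentre/htseq:0.6.1p1
  genomicpariscentre/babel:0.3-0
  genomicpariscentre/sartools:1.3.2
  genomicpariscentre/riboproanalysis
)

# Pull every image and stop at the first failure.
# Any arguments are used as a command prefix: "pull_images echo"
# only prints the commands (dry run).
pull_images() {
  local image
  for image in "${IMAGES[@]}"; do
    "$@" docker pull "$image" || return 1
  done
}
```

Run `pull_images echo` first to review the commands, then `pull_images` to actually pull the images.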

##Input files :

###Configuration file :
You have to create your configuration file (.conf) in the working directory. It is a small Bash script which is imported in the main Bash script.
Set in it the mandatory variables, and any optional ones you need, from **Available variables to set in the configuration file**.
The syntax to declare a variable is :

export VARIABLE_NAME=MyVariable
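
Array variables use the Bash array syntax, and `SAMPLE_INDEX_ARRAY` is matched to `SAMPLE_ARRAY` by position. The sketch below illustrates this. It assumes the configuration is imported with `source`, which this README does not confirm. The sample names and indexes are taken from the model configuration file in this README, and the helper `list_pairs` is hypothetical.

```shell
#!/usr/bin/env bash
# Write a small illustrative configuration file.
conf_file="$(mktemp)"
cat > "$conf_file" <<'EOF'
export ANSWER_DEMULTIPLEXING=YES
export SAMPLE_ARRAY=(RT1 RT11 RT7 RT13)
export SAMPLE_INDEX_ARRAY=(NNNGGTTNN NNNAACCNN NNNTTAGNN NNNCGGANN)
EOF

# Import the configuration into the current shell (assumed mechanism).
source "$conf_file"
rm -f "$conf_file"

# Sample i is paired with index i, hence the same-order requirement.
list_pairs() {
  local i
  for i in "${!SAMPLE_ARRAY[@]}"; do
    printf '%s\t%s\n' "${SAMPLE_ARRAY[$i]}" "${SAMPLE_INDEX_ARRAY[$i]}"
  done
}

list_pairs
```

Printing the pairs this way is a quick way to spot an index listed in the wrong position before launching a run.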


###Target file (if SARTools is used) :
The target file should include the following columns : label, files, group
* **label** the sample label
* **files** the Ribosome Profiling counts files. Syntax with demultiplexing : SAMPLENAME_htseq.txt (ex : sample MB1 --> MB1_htseq.txt). Syntax without demultiplexing : BASENAME_FASTQFILE_htseq.txt (ex : file MB1.toto.fastq --> MB1.toto_htseq.txt)
  If you use Babel without demultiplexing, you have to name your FASTQ file SAMPLENAME.fastq : the count file produced by HTSeq-Count will be SAMPLENAME_RPcounts.txt, and the matching RNA-seq count file must be named SAMPLENAME_mRNAcounts.txt.
* **group** the group/condition given to a sample
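
The three columns described above can be generated from the sample list instead of being typed by hand. A minimal sketch for the case without demultiplexing: using the FASTQ basename as the label and taking the group from an array named `CONDITION_ARRAY` are assumptions made for illustration, and `make_target` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Example values, in the same order for both arrays.
SAMPLE_ARRAY=(RT1.fastq RT2.fastq RD1.fastq RD2.fastq)
CONDITION_ARRAY=(RT RT RD RD)

# Print a tab-delimited target file: label, files, group.
make_target() {
  local i base
  printf 'label\tfiles\tgroup\n'
  for i in "${!SAMPLE_ARRAY[@]}"; do
    # Strip the .fastq extension: MB1.toto.fastq gives MB1.toto
    base="${SAMPLE_ARRAY[$i]%.fastq}"
    printf '%s\t%s_htseq.txt\t%s\n' "$base" "$base" "${CONDITION_ARRAY[$i]}"
  done
}

make_target
```

Redirect the output to create the file: `make_target > target.txt`.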

####Model of configuration file for run with Bash script

export PATH_TO_RAW_UNDEMULTIPLEXED_FILE=/import/disir01/bioinfo/RiboPro/Riboprotma_project/2015_240_NoIndex_L008_R1_001.fastq
export PATH_TO_GENOME_INDEX=/import/disir01/bioinfo/RiboPro/IndexAlignement/STAR/yeastGenomeEnsembl
export PATH_TO_rRNA_INDEX=/import/disir01/bioinfo/RiboPro/IndexAlignement/Bowtie1/rRNALevureNCBI
export PATH_TO_ANNOTATION_FILE=/import/rhodos01/shares-net/bioinfo/RiboPro/FichiersLevure/GenomeAnnotations_Ensembl/Saccharomyces_cerevisiae.R64-1-1.75.gtf
export ANSWER_DEMULTIPLEXING=YES
export ANSWER_REMOVE_PCR_DUPLICATES=YES
export ANSWER_RNASEQ_COUNTING=NO
export DIFFERENTIAL_ANALYSIS_PACKAGE=EDGER
export SAMPLE_ARRAY=(RT1 RT11 RT7 RT13)
export SAMPLE_INDEX_ARRAY=(NNNGGTTNN NNNAACCNN NNNTTAGNN NNNCGGANN)
export ADAPTER_SEQUENCE_THREE_PRIME=AGATCGGAAGAGCGGTTCAG
export STRANDED=yes
export AUTHOR=User
export REFERENCE_CONDITION=WildType


####Model of configuration file for run with Docker container

export USER_IDS=2747:100
export PATH_TO_RAW_UNDEMULTIPLEXED_FILE=$(pwd)/2015_240_NoIndex_L008_R1_001.fastq
export PATH_TO_ANNOTATION_FILE=/import/rhodos01/shares-net/bioinfo/RiboPro/FichiersLevure/GenomeAnnotations_Ensembl/Saccharomyces_cerevisiae.R64-1-1.75.gtf
export ANSWER_DEMULTIPLEXING=NO
export ANSWER_REMOVE_PCR_DUPLICATES=YES
export ANSWER_RNASEQ_COUNTING=YES
export SAMPLE_ARRAY=(RT1.fastq RT2.fastq RD1.fastq RD2.fastq)
export CONDITION_ARRAY=(RT RT RD RD)
export SAMPLE_INDEX_ARRAY=(NA)
export ADAPTER_SEQUENCE_THREE_PRIME=AGATCGGAAGAGCGGTTCAG
export STRANDED=yes


####Model of target.txt file

label	files	group
RT1	RT1_htseq.txt	RT
RT2	RT2_htseq.txt	RT
RD1	RD1_htseq.txt	RD
RD2	RD2_htseq.txt	RD


##Workflow :
![Pre-Processing steps](PreProcessingSteps.png)
![Alignment and counting steps](AlignmentAndCountingSteps.png)
![Differential analysis steps](DifferentialAnalysisSteps.png)
