This app bundles Bowtie2, Tophat2, and Cufflinks to map RNA-Seq reads and quantify expression. It produces a Mappings table containing all mapped reads and a table of per-gene expression levels in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). The full Cufflinks output is also provided as a tar file, which includes expression at the per-isoform level.
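To make the FPKM unit concrete, here is a minimal sketch of the naive calculation behind the acronym. The function name and the numbers are made up for illustration. Cufflinks estimates abundances with a statistical model, so the values it reports will generally differ from this simple arithmetic.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments.

    Illustrative only; this is not how Cufflinks computes its estimates.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# in a library of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

Dividing by transcript length and by library size is what makes the values comparable across genes of different lengths and across samples of different sequencing depth.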
This pipeline does not support novel gene or isoform discovery. Reads will only be mapped to transcripts found in the input Genes object. This corresponds to running Tophat with the "-G" and "--transcriptome-index" options. These two options are always passed to Tophat, but the app also accepts further options that modify the Tophat and Cufflinks steps.
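The sketch below shows roughly how a Tophat command line with the two fixed options might be put together. It illustrates the idea and is not the app's actual code: the helper name, the executable name "tophat2", and all file paths are assumptions.

```python
def build_tophat_command(gtf_path, transcriptome_index, bowtie_index, reads,
                         extra_options=None):
    """Assemble an argument list for Tophat (hypothetical helper).

    "-G" and "--transcriptome-index" are always present; user-supplied
    options are placed after them, followed by the positional arguments.
    """
    cmd = ["tophat2", "-G", gtf_path, "--transcriptome-index", transcriptome_index]
    cmd += list(extra_options or [])
    cmd.append(bowtie_index)   # prefix of the Bowtie2 index files
    cmd += list(reads)         # one file for unpaired reads, two for paired
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_tophat_command(
    "genes.gtf", "tx_index/known", "genome_index/genome",
    ["reads_1.fq", "reads_2.fq"],
    extra_options=["--library-type", "fr-unstranded"],
)
print(" ".join(cmd))
```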
Links to the source and information about the underlying programs can be found here:
The input is an array of gtables of type Reads, which contain the RNA-Seq reads. These can be generated from FASTQ or FASTA files by running the Reads Importer app. The pipeline supports both paired and unpaired reads, but all inputs must be of the same type (either all paired or all unpaired).
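The all-paired-or-all-unpaired rule can be checked before a run with something like the following sketch. The dictionary layout used to describe each Reads input (a "name" and a boolean "paired" field) is hypothetical and is not the platform's actual data model.

```python
def check_consistent_pairing(read_sets):
    """Return True if all inputs are paired, False if all are unpaired.

    Raises ValueError when the list is empty or mixes both kinds.
    Each entry is a hypothetical dict such as {"name": "s1", "paired": True}.
    """
    if not read_sets:
        raise ValueError("at least one Reads input is required")
    kinds = {bool(r["paired"]) for r in read_sets}
    if len(kinds) > 1:
        names = ", ".join(r["name"] for r in read_sets)
        raise ValueError(f"inputs mix paired and unpaired reads: {names}")
    return kinds.pop()


inputs = [{"name": "sample_a", "paired": True},
          {"name": "sample_b", "paired": True}]
print(check_consistent_pairing(inputs))  # True
```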
The reference genome for mapping is provided in the form of a ContigSet object, which can be created by importing a FASTA file using the "Genome Importer" app. The genome must be compatible with the gene models supplied. Optionally, a Bowtie2-indexed copy of the reference genome can also be provided. If it is not supplied, an index will be generated, a process that may take up to several hours. The indexed genome will then be included as part of the app output and can be provided in later runs of the app.
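The choice between reusing an index and building one can be pictured as in the sketch below. The helper and file names are hypothetical. The only external detail assumed is the standard "bowtie2-build <reference> <index prefix>" invocation.

```python
def index_build_command(reference_fasta, index_prefix, existing_index=None):
    """Return the bowtie2-build argument list, or None if an index was supplied.

    Hypothetical helper: skipping the build is what saves the (possibly
    multi-hour) indexing step on later runs.
    """
    if existing_index is not None:
        return None  # reuse the supplied index, nothing to build
    return ["bowtie2-build", reference_fasta, index_prefix]


print(index_build_command("genome.fa", "genome"))
print(index_build_command("genome.fa", "genome", existing_index="prebuilt/genome"))
```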
Gene models are provided as a Genes object describing the transcripts to map reads to. A GTF, GFF, or BED file can be imported to a Genes object using an importer app.
This parameter is a string containing all additional options to be passed to Tophat during execution. Tophat uses the Bowtie program to align RNA-Seq reads to the given transcriptome. A guide to Tophat options can be found here. As mentioned above, the "-G" and "--transcriptome-index" options will always be passed to Tophat. The string entered here is passed directly to the Tophat program and must therefore be formatted exactly as it would be on the command line. Contradictory or invalid parameters are not caught in advance and will cause the app to fail.
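Because the string must be formatted as on the command line, it can help to see how a shell-style tokenizer splits it. The sketch below uses Python's standard shlex module. Whether the app tokenizes its input this way is an assumption, and the option values are examples only.

```python
import shlex


def tokenize_options(option_string):
    """Split an option string the way a POSIX shell would (illustrative)."""
    return shlex.split(option_string)


# Example Tophat options: library type and expected inner mate distance.
print(tokenize_options("--library-type fr-unstranded --mate-inner-dist 200"))
# ['--library-type', 'fr-unstranded', '--mate-inner-dist', '200']

# A malformed string, such as an unbalanced quote, fails at tokenization.
try:
    tokenize_options('--library-type "fr-unstranded')
except ValueError as err:
    print("invalid option string:", err)
```

Note that tokenizing only catches formatting problems. Options that are well formed but contradictory or unknown to Tophat would still only fail once the program runs.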
This optional parameter is a string containing additional options to be passed to Cufflinks during execution. Cufflinks takes the mappings generated by Tophat and calculates the level of expression of each gene and transcript. A guide to Cufflinks options can be found here. The pipeline already uses the "-G", "-p", and "-o" options to supply the gene model, use all available processors, and capture the output, so do not include these among the additional options. No additional options are set by default.
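A quick pre-flight check for the reserved Cufflinks options might look like the following sketch. The function name is hypothetical. Treating the long forms ("--GTF", "--num-threads", "--output-dir") as reserved alongside the short ones is an assumption, since only the short forms are named above.

```python
import shlex

# Short forms come from the text above; long forms are assumed equivalents.
RESERVED_CUFFLINKS_OPTIONS = {
    "-G", "--GTF",
    "-p", "--num-threads",
    "-o", "--output-dir",
}


def find_reserved_options(option_string):
    """Return any reserved options present in a user-supplied option string."""
    tokens = shlex.split(option_string)
    # Also catch the "--option=value" spelling by comparing the part before "=".
    return [t for t in tokens if t.split("=", 1)[0] in RESERVED_CUFFLINKS_OPTIONS]


print(find_reserved_options("--multi-read-correct"))       # []
print(find_reserved_options("--multi-read-correct -p 4"))  # ['-p']
```

An empty result means the string contains none of the options the pipeline sets itself. It does not guarantee that Cufflinks will accept the remaining options.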