Snakemake pipeline for RNA-Seq analysis

Overview

This pipeline runs the following steps:

Download mouse reference genome, transcriptome and gene annotation files from GENCODE.
Run FastQC and MultiQC on the FASTQ files listed in the design matrix.
Use Salmon to quantify transcript abundance for each sample.
Use STAR to map reads to the genome, writing a BAM file and counting the number of reads per gene while mapping.
Merge Salmon TPMs, Salmon read counts, and STAR read counts into three tables.
Use DESeq2 to compare transcript abundance between samples.
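
The merge step can be pictured roughly as follows. This is a minimal sketch, not the pipeline's own script. It assumes each sample has a Salmon quant.sf file (tab-separated, with the standard Name, Length, EffectiveLength, TPM, and NumReads columns). The function name merge_salmon_column and the sample-to-path mapping are hypothetical.

```python
import csv


def merge_salmon_column(quant_files, column):
    """Combine one column from several Salmon quant.sf files into a table.

    quant_files: dict mapping sample name -> path to that sample's quant.sf
                 (hypothetical layout; adjust to your results directory).
    column:      "TPM" or "NumReads".
    Returns (header, rows), where each row is [transcript, value per sample].
    """
    samples = list(quant_files)
    values = {}  # transcript name -> {sample: value}
    for sample in samples:
        with open(quant_files[sample], newline="") as handle:
            for record in csv.DictReader(handle, delimiter="\t"):
                values.setdefault(record["Name"], {})[sample] = float(record[column])

    header = ["Name"] + samples
    # Assumption: a transcript missing from a sample is reported as 0.0.
    rows = [
        [name] + [per_sample.get(sample, 0.0) for sample in samples]
        for name, per_sample in values.items()
    ]
    return header, rows
```

Calling the function once with "TPM" and once with "NumReads" would give the two Salmon tables. The STAR counts table would need a separate reader, since STAR's per-gene count file has a different layout.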

Getting started

First, install Snakemake by following the official installation instructions. Next, edit the configuration file to set the reference file versions and FASTQ file locations.
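
Before starting a long run, it can help to confirm that the FASTQ paths you entered actually exist. The sketch below is only an illustration: it assumes the design matrix is a CSV file with FASTQ paths in columns named fastq_1 and fastq_2. Both the column names and the function name missing_fastq_files are hypothetical, so check the files in config/ for the real layout.

```python
import csv
import os


def missing_fastq_files(design_path, fastq_columns=("fastq_1", "fastq_2")):
    """Return FASTQ paths named in the design matrix that do not exist.

    The CSV layout and the column names are hypothetical placeholders;
    replace them with the columns used in your own design matrix.
    """
    missing = []
    with open(design_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for column in fastq_columns:
                path = (row.get(column) or "").strip()
                # Empty cells (e.g. single-end samples) are skipped.
                if path and not os.path.exists(path):
                    missing.append(path)
    return missing
```

An empty list means every listed file was found.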

Now we can simply run the pipeline with:

snakemake -c <num-of-cores> --use-conda

We recommend running the pipeline in tmux or screen, as it can take a long time. Output written to stdout is also saved to .snakemake/log/. Output from the R scripts is saved to logs/.
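
To check on a run from another terminal, you can look up the newest file in .snakemake/log/. The helper below is a small sketch, and the name latest_snakemake_log is hypothetical. It simply picks the most recently modified file in the directory.

```python
from pathlib import Path


def latest_snakemake_log(log_dir=".snakemake/log"):
    """Return the most recently modified file in log_dir, or None if there is none."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    logs = [path for path in directory.iterdir() if path.is_file()]
    return max(logs, key=lambda path: path.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_snakemake_log())
```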

This pipeline will generate the following subdirectories in the results/ directory:

annotation/: GENCODE reference files, Salmon index files, and STAR index files.
qc/: directory for FastQC and MultiQC output.
salmon/: directory for Salmon output.
star/: directory for STAR output.
deseq2/: directory for differential expression analysis results.
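
After a run, a quick way to confirm that each stage produced output is to check for the subdirectories listed above. This is a minimal sketch. The function name missing_result_dirs is hypothetical, and it only tests that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Subdirectory names taken from the list above.
EXPECTED_SUBDIRS = ("annotation", "qc", "salmon", "star", "deseq2")


def missing_result_dirs(results_dir="results"):
    """Return the expected subdirectories that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    absent = missing_result_dirs()
    print("All output directories present" if not absent else f"Missing: {absent}")
```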

UGA-CSBL/rna-seq-snakemake