MetaKraken

Snakemake pipeline for profiling composition of microbial communities from metagenomic shotgun sequencing data using Kraken2 and Bracken.

Overview

Input:

Quality processed paired-end fastq files from shotgun metagenome sequencing.

Output:

Table of microbial species and their relative abundance for each sample, output/merged_abundance_table.txt
Heatmap of abundance results, output/abundance_heatmap_species.png
Profile plot pdfs generated per sample or a user defined metadata varaible

Pipeline summary

Steps

Kraken2 is used to generate profiles of microbial clades and their abundances.
Bracken is used to re-estimate reads assigned to species.
Kreprot2mpa is used to generate profiles in mpa format which is similar to one used by MetaphlAn2
Normalized mpa files to generate percentages of each clade
Merge all sampel profiles into one table.
Generate a heatmap with N most abundant species.
Generate barplots with taxonomic information

Installation

To use this pipeline, navigate to your project directory and clone this repository into that directory using the following command:

git clone https://github.com/SycuroLab/metakraken.git metakraken

Note: you need to have conda and snakemake installed in order to run this. To install conda, see the instructions here.

To install snakemake using conda, run the following line:

conda install -c bioconda -c conda-forge snakemake

See the snakemake installation webpage for further details.

Config file

All the parameters required to run this pipeline are specified in a config file, written in yaml. See/modify the provided example file with your custom parameters, called config.yaml.

Important parameters

General

list_files: a file containing the sample names
path: path to trimmed fastq files
for: Suffix for forward reads
rev: suffix for reverse reads

Kraken + Bracken related

kraken_db: Location of the Kraken database to use
level: Level for bracken taxa (defualt 'S', Option:'D', 'P', 'C', 'O', 'F', 'G', 'S')
redreadlen: Read length required for Bracken
threshold: specifies the minimum number of reads required for a classification at the specified rank.

Barplots related (used by plot_profile.R)

variableX: X-axis variable to make barplots at different taxa level
variableFacet: Facet variable to make barplots at different taxa level
topN: Top N most abundant families/genra/species to be plotted by the R script
metadata: comma delimited (csv) metadata file with Sample Names same as used in list_files file

Running Instructions

Test the pipeline by running snakemake -np. This command prints out the commands to be run without actually running them.

To run the pipeline on the Synergy compute cluster, enter the following command from the project directory:

snakemake --cluster-config cluster.json --cluster 'bsub -n {cluster.n} -R {cluster.resources} -W {cluster.walllim} -We {cluster.time} -M {cluster.maxmem} -oo {cluster.output} -e {cluster.error}' --jobs 500 --use-conda

The above command submits jobs to Synergy, one for each sample and step of the QC pipeline. Note: the file cluster.json contains the parameters for the LSF job submission system that Synergy uses. In most cases, this file should not be modified.

Results and log files

Snakemake will create a directory for the results of the pipeline (default: output) as well as a directory called logs for all the log files.

Addition information

Information about Kraken2 can be found on https://ccb.jhu.edu/software/kraken2/index.shtml?t=manual Information about Bracken can be found on https://ccb.jhu.edu/software/bracken/

How to build Databases:

Kraken:

At the moment standard Kraken2 db is downloaded and prepared to be used with this snakemake file. It is saved as kraken2db_2020_02_26 in /gpfs/snyder_work/shared/lsycuro_labshare/dbs/kraken2db_2020_02_26

The command used to build it is

kraken2-build --standard --threads THREADS --db DBNAME

You can also build custom databases. Please read the kraken2 manual for more information.

Bracken:

Bracken files are built using Kraken2 db. It requres Kmer length (defualt=35) and read length of the paried-end reads. You might have to re-run bracke-build if your read length has not already been build

bracken-build -d KRAKEN_DB -t THREADS -k KMER_LEN -l READ_LEN

Note: Kraken2 is working with nullarbor conda file but not with kraken2 conda file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

MetaKraken

Overview

Pipeline summary

Steps

Installation

Config file

Important parameters

General

Kraken + Bracken related

Barplots related (used by plot_profile.R)

Running Instructions

Results and log files

Addition information

How to build Databases:

Kraken:

Bracken:

Files

README.md

Latest commit

History

README.md

File metadata and controls

MetaKraken

Overview

Pipeline summary

Steps

Installation

Config file

Important parameters

General

Kraken + Bracken related

Barplots related (used by plot_profile.R)

Running Instructions

Results and log files

Addition information

How to build Databases:

Kraken:

Bracken: