New: the metaWRAP publication preprint is available at bioRxiv
MetaWRAP aims to be an easy-to-use wrapper suite that accomplishes the core tasks of metagenomic analysis: read QC, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation. Unlike similar pipelines before it, metaWRAP goes further in bin extraction and analysis, with dedicated modules for bin refinement, reassembly, quantification, and classification (see module overview below). While there is no single best approach for processing metagenomic data, metaWRAP is meant to be a fast and simple first-pass program before you delve deeper into parameterization of your approach. Each individual module of metaWRAP is also a standalone program. For example, if you are interested only in Read_qc because you want to remove human reads from your data, or Quant_bins because you have bins you want to accurately quantify across samples, you are welcome to use only those modules.
In addition to being a tool wrapper, MetaWRAP offers a powerful hybrid approach for extracting high-quality draft genomes (bins) from metagenomic data. It combines the output of several binning programs (metaBAT2, CONCOCT, and MaxBin2, for example, since they are already wrapped into the Binning module), utilizing their individual strengths and minimizing their weaknesses. MetaWRAP's bin refinement module outperforms not only individual binning approaches, but also other bin consolidation programs (Binning_refiner, DAS_Tool) in both synthetic and real datasets. Because this module is a standalone component, I encourage you to use your favorite binning software for the 3 initial predictions (they do not have to come from metaBAT2, CONCOCT and MaxBin2). These predictions can also come from different parameters of the same software.
MetaWRAP also includes a novel bin reassembly module, which can drastically improve the quality of a set of bins by extracting the reads belonging to each bin and reassembling the bins with a more permissive, non-metagenomic assembler. In addition to improving the N50 of the bins, this modestly increases the completion of the bins and drastically reduces contamination. I recommend you run the reassembly on the final bin set from the Bin_refinement module, but it can be run on any bin set.
1) Read_QC: read trimming and human read removal
2) Assembly: metagenomic assembly and QC with metaSPAdes or MegaHit
3) Kraken: taxonomy profiling and visualization of reads or contigs
1) Binning: initial bin extraction with MaxBin2, metaBAT2, and/or CONCOCT
2) Bin_refinement: consolidation of multiple binning predictions into a superior bin set
3) Reassemble_bins: reassemble bins to improve completion and N50, and reduce contamination
4) Quant_bins: estimate bin abundance across samples
5) Blobology: visualize the community and extracted bins with blobplots
6) Classify_bins: conservative but accurate taxonomy prediction for bins
7) Annotate_bins: functionally annotate genes in a set of bins
For more details, please consult the metaWRAP module descriptions and the publication preprint.
The resource requirements for this pipeline will vary greatly based on the amount of data being processed, but due to the large memory requirements of many of the programs used (KRAKEN and metaSPAdes, to name a few), I would advise against attempting to run it on anything less than 10 cores and 100GB RAM. MetaWRAP officially supports only Linux x64 systems, but may be manually installed on others.
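As a rough illustration of the guideline above, the following sketch compares the current machine against the suggested 10 cores and 100GB RAM. It assumes a Linux system with nproc and /proc/meminfo available; meets_minimum is a hypothetical helper name, not part of metaWRAP.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of metaWRAP): succeeds when the given
# core count and RAM in GB meet the suggested minimums from the text.
meets_minimum() {
    local cores="$1" mem_gb="$2"
    [ "$cores" -ge 10 ] && [ "$mem_gb" -ge 100 ]
}

# Read the core count and total memory; fall back to 0 when unavailable.
cores=$(nproc 2>/dev/null || echo 0)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}
mem_gb=$(( mem_kb / 1024 / 1024 ))

if meets_minimum "$cores" "$mem_gb"; then
    echo "OK: ${cores} cores, ${mem_gb} GB RAM"
else
    echo "Below the suggested minimum: ${cores} cores, ${mem_gb} GB RAM"
fi
```

Smaller datasets may still run on less, so treat a failed check as a warning rather than a hard stop.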
To start, download miniconda2 and install it. Then add channels to your conda environment, and install metaWRAP (supports Linux64):
# ORDER IS IMPORTANT!!!
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -c ursky metawrap-mg
The conda installation of metaWRAP will install over 140 software dependencies, which may cause some conflicts with your currently installed packages. If you already use conda, it is strongly recommended to set up a custom conda environment and install metaWRAP only in there. That way your current conda environment and metaWRAP's environment do not conflict.
conda create -n metawrap-env python=2.7
source activate metawrap-env
# ORDER IS IMPORTANT!!!
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -c ursky metawrap-mg
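After either installation route, a quick check that the executables are on your PATH can save confusion later. The sketch below relies only on the shell builtin command -v; require_cmd is a hypothetical helper name, and metawrap and config-metawrap are the two executables mentioned in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is on the PATH.
# Returns nonzero if at least one command is missing.
require_cmd() {
    local status=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok:      $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

require_cmd metawrap config-metawrap \
    || echo "Activate the metaWRAP environment or check your PATH."
```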
You may want to manually install metaWRAP if you want better control over your environment, if you are installing on a system other than Linux64, or if you just really dislike conda. In any case, you will need to manually install the relevant prerequisite programs. When you are ready, download or clone this repository, carefully configure the metaWRAP/bin/config-metawrap file, and add the metaWRAP/bin/ directory to your $PATH. That's it!
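For the manual route, adding the bin directory to the PATH might look like the sketch below. The clone location $HOME/metaWRAP is an assumption for illustration; substitute wherever you placed the repository.

```shell
#!/usr/bin/env bash
# Assumption: the repository was downloaded or cloned to $HOME/metaWRAP.
METAWRAP_BIN="$HOME/metaWRAP/bin"

# Prepend the metaWRAP bin directory so its scripts are found first.
export PATH="$METAWRAP_BIN:$PATH"

# To make this permanent, add the export line to your shell startup
# file (for example ~/.bashrc).
```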
In addition to the Conda installation, you will need to configure the paths to some databases that you downloaded onto your system. Use your favorite text editor to configure these paths in /some/path/miniconda2/bin/config-metawrap and make sure everything looks correct. If you are unsure where this config file is, run:
which config-metawrap
This is very important if you want to use any functions requiring databases, but depending on what you plan to do, the databases are not mandatory for metaWRAP (see Database section below). Follow this guide for download and configuration instructions.
Database | Size | Used in module |
---|---|---|
Checkm_DB | 1.4GB | binning, bin_refinement, reassemble_bins |
KRAKEN standard database | 161GB | kraken |
NCBI_nt | 71GB | blobology, classify_bins |
NCBI_tax | 283MB | blobology, classify_bins |
Indexed hg38 | 20GB | read_qc |
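Before starting a long run, it can be worth confirming that every database directory you entered in config-metawrap actually exists. The following is a minimal sketch: check_db_paths is a hypothetical helper and the paths passed to it are placeholders, not the real variable names or locations used by the config file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status of each directory given as an
# argument and return nonzero if any of them is missing.
check_db_paths() {
    local missing=0 path
    for path in "$@"; do
        if [ -d "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder paths; replace them with the ones from your config-metawrap.
check_db_paths /path/to/checkm_db /path/to/kraken_db \
    || echo "Fix the paths in config-metawrap before running these modules."
```

Only the databases for the modules you plan to run need to pass the check.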
Please look at the MetaWRAP usage tutorial for detailed run instructions and examples.
Once all the dependencies are in place, running metaWRAP is relatively simple. The main metaWRAP script wraps around all of its individual modules, which you can call independently.
metawrap -h
Usage: metawrap [module] --help
Options:
read_qc Raw read QC module
assembly Assembly module
binning Binning module
bin_refinement Refinement of bins from binning module
reassemble_bins Reassemble bins using metagenomic reads
quant_bins Quantify the abundance of each bin across samples
blobology Blobology module
kraken KRAKEN module
Each module is run separately. For example, to run the assembly module:
metawrap assembly -h
Usage: metawrap assembly [options] -1 reads_1.fastq -2 reads_2.fastq -o output_dir
Options:
-1 STR forward fastq reads
-2 STR reverse fastq reads
-o STR output directory
-m INT memory in GB (default=10)
-t INT number of threads (default=1)
--use-megahit assemble with megahit (default)
--use-metaspades assemble with metaspades instead of megahit
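Putting the options above together, a dry-run sketch for assembling several samples could look like this. The sample names, the file naming pattern, and the ASSEMBLY_ output prefix are assumptions for illustration; the flags are the ones listed in the usage message, with memory and threads set to the suggested 100GB and 10 cores. build_assembly_cmd is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the metawrap assembly command for one sample.
# Assumes paired reads are named <sample>_1.fastq and <sample>_2.fastq.
build_assembly_cmd() {
    local sample="$1"
    echo "metawrap assembly -1 ${sample}_1.fastq -2 ${sample}_2.fastq -m 100 -t 10 --use-metaspades -o ASSEMBLY_${sample}"
}

# Dry run: print one command per sample without executing anything.
for sample in sampleA sampleB; do
    build_assembly_cmd "$sample"
done

# To run the commands for real, pipe this script's output to bash.
```

Printing the commands first makes it easy to review file names and resource settings before committing hours of compute.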
While the metaWRAP manuscript is in review for publication, please cite the bioRxiv preprint: MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
Author of pipeline: Gherman Uritskiy.
Principal Investigators: James Taylor and Jocelyne DiRuggiero
Institution: Johns Hopkins, Department of Cell, Molecular, Developmental Biology, and Biophysics
All feedback is welcome! For errors and bugs, please open a new Issue thread on this github page, and I will try to get things patched as quickly as possible. Please include the version of metaWRAP you are using (run metawrap -v). For general questions, suggestions and other feedback, you can contact me at [email protected].