metashot/mag-illumina is a workflow for the assembly and binning of Illumina sequences from metagenomic samples.
- Input: single-end, paired-end (also interleaved) Illumina sequences (gzip and bzip2 compressed FASTQ also supported);
- Histogram text files (for each input sample) of base frequency, quality scores, GC content, average quality and length are generated from input reads and clean reads using BBDuk;
- Adapter trimming, contaminant filtering, quality filtering/trimming and length filtering using BBDuk;
- Assembly with SPAdes or MEGAHIT;
- Assembly statistics using BBTools;
- Binning with MetaBAT2;
- Optionally, assemble plasmids with metaplasmidSPAdes and verify them using viralVerify.
- Install Docker (or Singularity) and Nextflow (see Dependences);
- Start running the analysis:
nextflow run metashot/mag-illumina \
--reads '*_R{1,2}.fastq.gz' \
--outdir results
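Before launching a paired-end run, it can help to confirm that every forward file has a mate. The following is a minimal sketch, not part of the workflow: the function name `missing_mates` is hypothetical, and it assumes the `*_R1.fastq.gz` / `*_R2.fastq.gz` naming implied by the `--reads` pattern above.

```shell
#!/usr/bin/env bash
# Sketch: list R1 files in a directory that lack a matching R2 mate.
# Assumption: files follow the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming.
missing_mates() {
    local dir r1 r2 status
    dir="${1:-.}"
    status=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # derive the expected mate name
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1"
            status=1
        fi
    done
    return "$status"
}
```

Calling `missing_mates .` in the directory that holds the reads prints one line per unpaired file and returns a non-zero status if any are found.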
See the file nextflow.config for the complete list of parameters.
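As an illustration of passing optional switches, the sketch below prints (but does not run) a command line with extra options appended. The helper name `mag_illumina_cmd` is hypothetical. The switches used in the usage note are the ones named in this README; treating them as boolean options that are enabled by passing them without a value is an assumption, so check nextflow.config for the actual defaults.

```shell
#!/usr/bin/env bash
# Sketch: print a nextflow command line for review before running it.
# $1: reads pattern, $2: output directory, remaining arguments: extra switches.
# Assumption: the extra switches are boolean and need no value.
mag_illumina_cmd() {
    local reads outdir flag
    reads="$1"
    outdir="$2"
    shift 2
    printf "nextflow run metashot/mag-illumina --reads '%s' --outdir %s" "$reads" "$outdir"
    for flag in "$@"; do
        printf ' %s' "$flag"
    done
    printf '\n'
}
```

For example, `mag_illumina_cmd '*_R{1,2}.fastq.gz' results --run_metaplasmidspades --save_clean` prints the quick-start command with both switches added, ready to be reviewed and pasted into a terminal.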
The files and directories listed below will be created in the results directory after the pipeline has finished.
- scaffolds for each input sample;
- bins: genome bins;
- unbinned: unbinned contigs;
- stats_scaffolds.tsv: scaffold statistics;
- verified_plasmids: verified plasmids (if --run_metaplasmidspades is set).

- base frequency, quality scores, GC content, average quality and length for each input sample;
- clean_reads_stats: same as above, but for the reads after the quality control;
- clean_reads: clean reads (if --save_clean is set);
- qc: adapter trimming and contaminant filtering statistics;
- metaspades: complete assembler output for each sample (if --save_assembler_output is set);
- scaffolds_plasmids: candidate plasmids (if --run_metaplasmidspades is set);
- viralverify: viralVerify output (if --run_metaplasmidspades is set);
- metabat2: MetaBAT2 log and the depth of coverage for each assembly.
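After a run, a quick check of the output directory can confirm that the expected entries are in place. This is a minimal sketch with a hypothetical function name, `check_outputs`. It looks only for names from the list above that are not tied to an optional switch, and it assumes those entries are always written, which may not hold for every run (for example, if no bins are produced).

```shell
#!/usr/bin/env bash
# Sketch: report documented outputs that are absent from a results directory.
# Only entries not conditioned on an optional switch are checked.
# Assumption: these entries are created on every successful run.
check_outputs() {
    local dir name status
    dir="${1:-results}"
    status=0
    for name in bins unbinned stats_scaffolds.tsv clean_reads_stats qc metabat2; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_outputs results` prints one line per absent entry and returns a non-zero status if anything is missing, which makes it easy to use in a wrapper script.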
Please refer to System requirements for the complete list of system requirements options.