metashot/mag-illumina is a workflow for the assembly and binning of Illumina sequences from metagenomic samples.
- Input: single-end or paired-end (including interleaved) Illumina reads in FASTQ format, optionally gzip- or bzip2-compressed;
- Histogram text files (for each input sample) of base frequency, quality scores, GC content, average quality and length are generated from input reads and clean reads using bbduk;
- Adapter trimming, contaminant filtering, quality filtering/trimming and length filtering using bbduk;
- Assembly with metaSPAdes or MEGAHIT;
- Assembly statistics using BBTools;
- Binning with MetaBAT 2;
- Optionally, assembly of plasmids with metaplasmidSPAdes and their verification with viralVerify.
- Install Docker (or Singularity) and Nextflow (see Dependences);
- Start running the analysis:
nextflow run metashot/mag-illumina \
--reads '*_R{1,2}.fastq.gz' \
--outdir results
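Before launching, it can help to confirm that every sample matching the '*_R{1,2}.fastq.gz' pattern has both mates present. The function below is a hypothetical pre-flight check, not part of the pipeline. It assumes paired files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz in a single directory, and only roughly mirrors how Nextflow groups files by the part matched by `*`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of metashot/mag-illumina).
# Assumes paired files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# Prints each complete sample ID on stdout; warns on stderr when R2 is missing.
list_pairs() {
  local dir="${1:-.}" r1 sample r2
  for r1 in "$dir"/*_R1.fastq.gz; do
    [[ -e "$r1" ]] || continue            # the glob matched nothing
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="$dir/${sample}_R2.fastq.gz"
    if [[ -e "$r2" ]]; then
      echo "$sample"                      # complete pair
    else
      echo "WARNING: ${sample} has no R2 file" >&2
    fi
  done
}
```

For example, `list_pairs /path/to/reads` prints one sample ID per complete pair and a warning for each sample whose R2 file is missing.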
See the file nextflow.config for the complete list of parameters.
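Instead of passing every parameter on the command line, Nextflow can read parameters from a YAML or JSON file given with its -params-file option. The snippet below is a sketch that writes such a file. The file name params.yaml is an arbitrary choice, the keys are the options mentioned in this README, and treating the optional ones as booleans is an assumption; check nextflow.config for the exact names and defaults.

```shell
#!/usr/bin/env bash
# Write a Nextflow params file (the name params.yaml is arbitrary).
# Keys mirror the command-line options used in this README; the boolean
# values for the optional switches are an assumption.
cat > params.yaml <<'EOF'
reads: '*_R{1,2}.fastq.gz'
outdir: results
save_clean: true
save_assembler_output: true
run_metaplasmidspades: true
EOF
```

The run would then be started with `nextflow run metashot/mag-illumina -params-file params.yaml`. The quoted heredoc delimiter keeps the shell from expanding the glob, and the YAML quotes keep the leading `*` from being read as an alias.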
The files and directories listed below will be created in the results directory after the pipeline has finished.
- scaffolds: scaffolds for each input sample;
- bins: genome bins;
- unbinned: unbinned contigs;
- stats_scaffolds.tsv: scaffold statistics;
- verified_plasmids: verified plasmids (if --run_metaplasmidspades is set).
- raw_reads_stats: base frequency, quality scores, GC content, average quality and length for each input sample;
- clean_reads_stats: same as above, but for the reads after quality control;
- clean_reads: clean reads (if --save_clean is set);
- qc: adapter trimming and contaminant filtering statistics;
- metaspades, metaplasmidspades and megahit: complete assembler output for each sample (if --save_assembler_output is set);
- scaffolds_plasmids: candidate plasmids (if --run_metaplasmidspades is set);
- viralverify: viralVerify output (if --run_metaplasmidspades is set);
- metabat2: MetaBAT 2 log and the depth of coverage for each assembly.
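After a run, a quick way to confirm that the main outputs (scaffolds, bins, unbinned and stats_scaffolds.tsv) were produced is to test for each of them. The function below is a hypothetical helper, not part of the pipeline. It assumes every entry sits directly under the results directory, and it only expects verified_plasmids when asked to, since that directory exists only with --run_metaplasmidspades.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check (not part of metashot/mag-illumina): report which
# main outputs exist under the results directory. Pass "plasmids" as the second
# argument to also expect verified_plasmids (runs with --run_metaplasmidspades).
check_outputs() {
  local outdir="${1:-results}" entry missing=0
  local expected=(scaffolds bins unbinned stats_scaffolds.tsv)
  [[ "${2:-}" == "plasmids" ]] && expected+=(verified_plasmids)
  for entry in "${expected[@]}"; do
    if [[ -e "$outdir/$entry" ]]; then
      echo "found:   $entry"
    else
      echo "missing: $entry"
      missing=$((missing + 1))
    fi
  done
  [[ "$missing" -eq 0 ]]   # exit status 0 only if nothing is missing
}
```

For example, `check_outputs results plasmids` lists each expected entry and returns a non-zero status if any of them is absent, which makes it usable in a follow-up script.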
Please refer to System requirements for the complete list of options that control system requirements.