This document describes the output produced by the pipeline.
The pipeline is built using Nextflow and processes data using the following steps:
- Fastp - read quality control
- Decontamination - host DNA removal
- Kraken2 - read classification
- Bracken - taxonomic profiling
- MetaPhlAn2 - taxonomic profiling
- StrainPhlAn - strain profiling
- HUMAnN2 - pathway profiling
- SRST2 - resistome profiling
Fastp removes adapters and trims low-quality bases. Its output is streamed directly into the next step, decontamination.
Output directory: pipeline_output/decont/
sample.html and sample.json
- Fastp quality control reports (HTML and JSON)
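The JSON report is convenient for collecting QC numbers across many samples. A minimal sketch, assuming the summary/before_filtering and summary/after_filtering layout that fastp's JSON report normally uses; the function name and the script invocation are hypothetical:

```python
import json
import sys


def fastp_read_summary(report: dict) -> dict:
    """Read counts before/after filtering from a parsed fastp JSON report."""
    # Assumes fastp's usual layout: report["summary"]["before_filtering" | "after_filtering"].
    summary = report["summary"]
    before = summary["before_filtering"]["total_reads"]
    after = summary["after_filtering"]["total_reads"]
    return {
        "reads_before": before,
        "reads_after": after,
        # Fraction of reads that survived adapter/quality trimming and filtering.
        "fraction_kept": after / before if before else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python fastp_summary.py pipeline_output/decont/sample.json
    with open(sys.argv[1]) as handle:
        print(fastp_read_summary(json.load(handle)))
```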
This step maps the Fastp output reads to the provided host reference genome and keeps only the reads that do not map to it.
Output directory: pipeline_output/decont/
sample_fastpdecont_1.fastq.gz and sample_fastpdecont_2.fastq.gz
- Paired-end reads (read 1 and read 2) that did not map to the host genome
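Because the two files are written as a pair, a quick sanity check is that they hold the same number of records with matching read IDs. A minimal sketch, assuming plain four-line FASTQ records and optional /1 and /2 mate suffixes; the helper names are hypothetical:

```python
import gzip
import sys
from itertools import zip_longest


def read_ids(lines):
    """Yield read IDs from FASTQ text lines (assumes plain 4-line records)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # the header line of each record starts with "@"
            read_id = line[1:].split()[0]
            # Drop an optional /1 or /2 mate suffix so mates compare equal.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_pairs(lines_1, lines_2) -> int:
    """Return the number of read pairs, raising if the two files fall out of sync."""
    n_pairs = 0
    for id_1, id_2 in zip_longest(read_ids(lines_1), read_ids(lines_2)):
        if id_1 != id_2:
            raise ValueError(f"mate mismatch at pair {n_pairs}: {id_1!r} vs {id_2!r}")
        n_pairs += 1
    return n_pairs


if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g. python check_pairs.py sample_fastpdecont_1.fastq.gz sample_fastpdecont_2.fastq.gz
    with gzip.open(sys.argv[1], "rt") as r1, gzip.open(sys.argv[2], "rt") as r2:
        print(check_pairs(r1, r2), "read pairs")
```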
Kraken2 assigns taxonomy to reads (or read pairs) based on their k-mer profiles.
Output directory: pipeline_output/kraken2_out
sample.kraken2.report
- Standard Kraken2 report in plain text
sample.kraken2.tax
- MetaPhlAn-like taxonomic profile in plain text (values are read counts)
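The standard Kraken2 report is a six-column, tab-separated table: percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and the indented scientific name. A minimal parsing sketch, assuming that default layout (reports generated with --report-minimizer-data carry extra columns and would need adjusting); the function names are hypothetical:

```python
import sys
from typing import Iterable, List, NamedTuple


class ReportRow(NamedTuple):
    percent: float     # % of reads in the clade rooted at this taxon
    clade_reads: int   # reads assigned to this taxon or any descendant
    direct_reads: int  # reads assigned directly to this taxon
    rank: str          # rank code: U, R, D, K, P, C, O, F, G, S (or S1, G2, ...)
    taxid: int
    name: str          # scientific name with the indentation stripped


def parse_kraken2_report(lines: Iterable[str]) -> List[ReportRow]:
    rows = []
    for line in lines:
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.rstrip("\n").split("\t")[:6]
        rows.append(ReportRow(float(pct), int(clade), int(direct), rank.strip(), int(taxid), name.strip()))
    return rows


def top_species(rows: List[ReportRow], n: int = 10) -> List[ReportRow]:
    """Species-level rows (rank code exactly "S"), most clade reads first."""
    species = [row for row in rows if row.rank == "S"]
    return sorted(species, key=lambda row: row.clade_reads, reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python top_species.py pipeline_output/kraken2_out/sample.kraken2.report
    with open(sys.argv[1]) as handle:
        for row in top_species(parse_kraken2_report(handle)):
            print(f"{row.percent:6.2f}%  {row.clade_reads:>10}  {row.name}")
```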
Profiles split by taxonomic level: pipeline_output/split_kraken2_out
sample.[dpcofgs].tsv
- Plain text files for taxonomic profile at domain, phylum, class, order, family, genus, species, respectively
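Splitting a MetaPhlAn-like profile by level amounts to grouping rows on the rank prefix of their last clade. A minimal sketch, assuming the profile has two tab-separated columns (a |-delimited lineage such as d__Bacteria|p__Firmicutes, then a value); this is an illustration, not the pipeline's own splitting code, and the prefix mapping is an assumption:

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable

# Assumed mapping from clade prefixes to the letters in sample.[dpcofgs].tsv.
# MetaPhlAn2 profiles use "k__" for the top level; add "k": "d" for those.
PREFIX_TO_LEVEL = {"d": "d", "p": "p", "c": "c", "o": "o", "f": "f", "g": "g", "s": "s"}


def split_by_level(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Group rows of a MetaPhlAn-style profile by the rank of their last clade."""
    levels: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        lineage, value = line.rstrip("\n").split("\t")[:2]
        prefix, _, name = lineage.split("|")[-1].partition("__")  # e.g. "g__Bacteroides"
        level = PREFIX_TO_LEVEL.get(prefix)
        if level is not None:  # rows with unmapped prefixes are skipped
            levels[level][name] = float(value)
    return dict(levels)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python split_levels.py pipeline_output/kraken2_out/sample.kraken2.tax
    with open(sys.argv[1]) as handle:
        for level, taxa in sorted(split_by_level(handle).items()):
            print(level, len(taxa), "taxa")
```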
Bracken estimates relative abundances of taxa based on a Kraken2 report.
Output directory: pipeline_output/bracken_out
sample.bracken.g.tsv
- Tab-delimited text file of genus-level relative abundances
sample.bracken.s.tsv
- Tab-delimited text file of species-level relative abundances
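A common first look at these tables is the list of most abundant taxa. A minimal sketch, assuming the files keep Bracken's standard output columns (name, taxonomy_id, taxonomy_lvl, kraken_assigned_reads, added_reads, new_est_reads, fraction_total_reads); if the pipeline reformats these tables, the column names would need adjusting. The function name is hypothetical:

```python
import csv
import sys
from typing import Iterable, List, Tuple


def top_taxa(lines: Iterable[str], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n most abundant taxa from a Bracken abundance table."""
    reader = csv.DictReader(lines, delimiter="\t")
    # Assumed columns: "name" and "fraction_total_reads" (Bracken's usual header).
    rows = [(row["name"], float(row["fraction_total_reads"])) for row in reader]
    rows.sort(key=lambda item: item[1], reverse=True)
    return rows[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python top_taxa.py pipeline_output/bracken_out/sample.bracken.g.tsv
    with open(sys.argv[1], newline="") as handle:
        for name, fraction in top_taxa(handle):
            print(f"{fraction:8.4%}  {name}")
```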
StrainPhlAn performs strain-level analysis based on single-nucleotide polymorphisms (SNPs) in clade-specific marker genes.
Output directory: pipeline_output/strainphlan_out
sample.metaphlan2.markers
- Marker files for each sample
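The idea behind SNP-based strain comparison can be illustrated with a toy pairwise SNP distance between aligned marker sequences. This is a simplified sketch for intuition only, not StrainPhlAn's implementation (which, among other steps, builds multiple sequence alignments and phylogenetic trees); the function names and the choice to skip gaps and N positions are assumptions:

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in ignore and b not in ignore and a != b
    )


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every pair of samples in {sample: aligned_sequence}."""
    return {
        (s1, s2): snp_distance(alignment[s1], alignment[s2])
        for s1, s2 in combinations(sorted(alignment), 2)
    }
```

For instance, snp_distance("ACGT-A", "ACCTNA") counts one SNP, because the gap/N position is skipped.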
MetaPhlAn2 estimates relative abundances of taxa by mapping reads to clade-specific marker genes.
Output directory: pipeline_output/metaphlan2_out
sample.metaphlan2.tax
- Tab-delimited text file for the full taxonomic profile
Profiles split by taxonomic level: pipeline_output/split_metaphlan2_out
sample.[dpcofgs].tsv
- Plain text files for taxonomic profile at domain, phylum, class, order, family, genus, species, respectively
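Since MetaPhlAn2 reports relative abundances in percent, the values at each rank are expected to add up to roughly 100, which makes a cheap sanity check. A minimal sketch, assuming a two-column profile (|-delimited lineage, abundance) with "#" header lines; strain-level (t__) rows need not sum to 100, and the tolerance is an arbitrary placeholder. Function names are hypothetical:

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable


def totals_per_rank(lines: Iterable[str]) -> Dict[str, float]:
    """Sum relative abundances by the rank prefix of each row's last clade (k, p, ..., s, t)."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "#SampleID"-style headers
        lineage, value = line.rstrip("\n").split("\t")[:2]
        rank = lineage.split("|")[-1].split("__")[0]
        totals[rank] += float(value)
    return dict(totals)


def off_target_ranks(totals: Dict[str, float], expected: float = 100.0,
                     tolerance: float = 0.5) -> Dict[str, float]:
    """Ranks whose total differs from `expected` by more than `tolerance` (placeholder values)."""
    return {rank: total for rank, total in totals.items() if abs(total - expected) > tolerance}


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_totals.py pipeline_output/metaphlan2_out/sample.metaphlan2.tax
    with open(sys.argv[1]) as handle:
        print(off_target_ranks(totals_per_rank(handle)))
```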
HUMAnN2 estimates gene family and pathway abundances.
Output directory: pipeline_output/humann2_out
sample.humann2_genefamilies.tsv and sample.humann2_genefamilies.relab.tsv
- Tab-delimited text files with the raw and relative-abundance-normalized gene family abundances, respectively
sample.humann2_pathabundance.tsv and sample.humann2_pathabundance.relab.tsv
- Tab-delimited text files with the raw and relative-abundance-normalized pathway abundances, respectively
sample.humann2_pathcoverage.tsv
- Tab-delimited text file for the pathway coverage
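HUMAnN2 tables mix community-level rows with stratified rows of the form feature|organism, so downstream analyses usually separate the two. A minimal sketch, assuming a single-sample table with one "#" header line and tab-separated feature/value columns; the function name is hypothetical:

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_stratified(lines: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Split a single-sample HUMAnN2 table into community totals and per-organism values."""
    community: Dict[str, float] = {}
    by_organism: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # header line, e.g. one starting with "# Pathway" or "# Gene Family"
        feature, value = line.rstrip("\n").split("\t")[:2]
        if "|" in feature:
            # Stratified row: "<feature>|<organism>", e.g. "...|g__Bacteroides.s__Bacteroides_dorei"
            name, organism = feature.split("|", 1)
            by_organism[name][organism] = float(value)
        else:
            community[feature] = float(value)  # unstratified (community-level) row
    return community, dict(by_organism)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python split_humann2.py pipeline_output/humann2_out/sample.humann2_pathabundance.tsv
    with open(sys.argv[1]) as handle:
        community, by_organism = split_stratified(handle)
    print(len(community), "community-level features;", len(by_organism), "with per-organism rows")
```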
SRST2 reports the presence of antibiotic resistance genes, detected here against the ARG-ANNOT database (ARGannot.r3).
Output directory: pipeline_output/srst2_out
sample__fullgenes__ARGannot.r3__results.txt
- Tab-delimited text file for the full report
sample__genes__ARGannot.r3__results.txt
- Tab-delimited text file for the simplified report
sample__ARGannot.r3__results.sorted.bam
- BAM file for alignments
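The full report lists per-allele alignment statistics, which can be used to keep only well-supported hits. A minimal sketch, assuming the header contains the columns gene, allele, coverage (percent of the allele covered) and depth (mean read depth), as in typical SRST2 fullgenes output; the threshold values are arbitrary placeholders, not SRST2 defaults, and the function name is hypothetical:

```python
import csv
import sys
from typing import Iterable, List, Tuple


def confident_genes(lines: Iterable[str], min_coverage: float = 90.0,
                    min_depth: float = 5.0) -> List[Tuple[str, str]]:
    """(gene, allele) pairs from an SRST2 fullgenes table passing placeholder thresholds."""
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        # Assumed columns: "coverage" = % of the allele covered, "depth" = mean read depth.
        if float(row["coverage"]) >= min_coverage and float(row["depth"]) >= min_depth:
            hits.append((row["gene"], row["allele"]))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python srst2_hits.py pipeline_output/srst2_out/sample__fullgenes__ARGannot.r3__results.txt
    with open(sys.argv[1], newline="") as handle:
        for gene, allele in confident_genes(handle):
            print(gene, allele, sep="\t")
```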