Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies
Some example results are available at the homepage:
FMAP provides a more sensible reference protein sequence database based on UniRef.
Identification of differentially-abundant genes (KEGG Orthology)
Mapping differentially-abundant genes to pathways and modules (KEGG Pathway and KEGG Module)
Mapping differentially-abundant genes to operons (ODB (v3))
Perl - scripting language
R - statistical computing
Statistics::R - Perl interface with the R statistical program
- Use CPAN to install the module
perl -MCPAN -e 'install Statistics::R'
- or download the source and compile manually
wget ''
tar zxf Statistics-R-0.33.tar.gz
cd Statistics-R-0.33
perl Makefile.PL
make
make test
make install
Mapping program providing BLASTX search of sequencing reads: DIAMOND or USEARCH
Linux commands:
Bio::DB::Taxonomy - Access to a taxonomy database (required only if you want to build a custom database)
XML::LibXML - Perl binding for libxml2 (required only if you want to download genome sequences)
- Process
- Input
- UniRef sequence identity (50, 90, or 100)
- (optional) NCBI taxonomy IDs (integer)
- Require Bio::DB::Taxonomy.
- The following data files will be downloaded over an FTP connection. If the FTP connection fails, download the files by another method and copy them into the "FMAP_data" directory before executing the "" command.
or - Require HTTP connection for KEGG API.
Usage: perl [options] 50|90|100 [NCBI_TaxID [...]]
Options: -h display this help message
-s switch database
-r redownload data
Usage: perl [options]
Options: -h display this help message
-r redownload data
-m FILE executable file path of mapping program, "diamond" or "usearch" [diamond]
-k download prebuilt KEGG files
- Process
- Input
- Prefix of output files
- De novo assembled sequences in FASTA format
- A FASTA file can be generated by metagenome assemblers such as SPAdes and MetaVelvet.
- A FASTA file containing target genome sequences can be input instead.
- Whole metagenomic/metatranscriptomic shotgun sequencing reads in FASTQ or FASTA format
- Multiple read files can be specified.
- Paired-end read files must be specified comma-separated like "input.R1.fastq,input.R2.fastq".
- The read files can be compressed by gzip.
- Output
- Prefix.region.abundance.txt (abundances of ORF regions mapping to KEGG orthologies)
- Prefix.abundance.txt (abundances of KEGG orthologies)
Usage: perl [options] output.prefix assembly.fasta [input.fastq|input.R1.fastq,input.R2.fastq [...]] > summary.txt
Options: -h display this help message
-A STR prepared assembly prefix
-B input indexed sorted BAM file instead of FASTQ file
-m FILE executable file path of mapping program, "diamond" or "usearch" [diamond]
-p INT number of threads [1]
-e FLOAT maximum e-value to report alignments [10]
-t DIR directory for temporary files [$TMPDIR or /tmp]
-a FLOAT search acceleration for ublast [0.5]
-C STR codon and translation e.g. ATG=M [NCBI genetic code 11 (Bacterial, Archaeal and Plant Plastid)]
-S STR comma-separated start codons [GTG,ATG,CTG,TTG,ATA,ATC,ATT]
-T STR comma-separated termination codons [TAG,TAA,TGA]
-l INT minimum translation length [10]
-c FLOAT minimum coverage [0.8]
-q INT minimum mapping quality [0]
-s STR strand specificity, "f" or "r"
-P STR contig prefix used for abundance estimation
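The -S, -T, and -l options above describe how open reading frames are delimited on the assembled sequences. The following is a minimal sketch of that idea, not FMAP's actual implementation: it scans only the forward strand, treats the minimum translation length as a codon count, and the function name `find_orfs` is hypothetical. The codon sets are the defaults listed above.

```python
# Defaults copied from the option list above (-S and -T).
START_CODONS = {"GTG", "ATG", "CTG", "TTG", "ATA", "ATC", "ATT"}
STOP_CODONS = {"TAG", "TAA", "TGA"}


def find_orfs(sequence, min_codons=10):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open. Each ORF runs from the first
    start codon seen in a frame to the next in-frame termination codon,
    and the termination codon is included in the span.
    Assumption: the minimum length counts codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if orf_start is None and codon in START_CODONS:
                orf_start = pos
            elif orf_start is not None and codon in STOP_CODONS:
                n_codons = (pos - orf_start) // 3  # excludes the stop codon
                if n_codons >= min_codons:
                    orfs.append((orf_start, pos + 3, frame))
                orf_start = None
    return orfs
```

A reverse-strand scan would apply the same function to the reverse complement of the sequence and convert the coordinates back.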
- Require Centrifuge.
- Input
- FMAP_assembly.region.txt (ORF regions mapping to KEGG orthologies generated by FMAP_assembly)
- De novo assembled sequences in FASTA format
- Centrifuge index filename prefix (minus trailing
- Output: FMAP_assembly.region.taxon.txt (FMAP_assembly.region.txt including a column of NCBI taxonomy IDs (integer))
Usage: perl [options] FMAP_assembly.region.txt assembly.fasta centrifuge.index
Options: -h display this help message
-p INT number of threads [1]
- Require Bio::DB::Taxonomy.
- Input: FMAP_assembly.abundance.txt (abundances generated by FMAP_assembly)
- Output: HTML format of abundance heatmap table
Usage: perl [options] [name=]FMAP_assembly.abundance.txt [...] > FMAP_assembly_heatmap.html
Options: -h display this help message
-c FILE comparison output file including orthology and filter columns
-f INT HTML font size
-w INT HTML table cell width
- Input: FMAP_assembly.region.txt (ORF regions mapping to KEGG orthologies generated by FMAP_assembly)
- Output: FMAP_assembly_operon.txt (ODB (v3) known operons consisting of orthologies located together on an assembled contig/scaffold/transcript)
Usage: perl [options] FMAP_assembly.region.txt > FMAP_assembly_operon.txt
Options: -h display this help message
-a print single-gene operons as well
- Input: NCBI taxonomy IDs (integer)
- Output: FASTA file containing genome sequences
- Require XML::LibXML.
Usage: perl [options] NCBI_TaxID [...] > genome.fasta
Options: -h display this help message
-a assembly instead of genome
Usage: perl [options]
Options: -h display this help message
-m FILE executable file path of mapping program, "diamond" or "usearch" [diamond]
-k download prebuilt KEGG files
-x download only KEGG files
- Input: whole metagenomic (or metatranscriptomic) shotgun sequencing reads in FASTQ or FASTA format
- Output: best-match hits in NCBI BLAST -m8 (= NCBI BLAST+ -outfmt 6) format
Usage: perl [options] input1.fastq|input1.fasta [input2.fastq|input2.fasta [...]] > blastx_hits.txt
Options: -h display this help message
-m FILE executable file path of mapping program, "diamond" or "usearch" [diamond]
-p INT number of threads [1]
-e FLOAT maximum e-value to report alignments [10]
-t DIR directory for temporary files [$TMPDIR or /tmp]
-a FLOAT search acceleration for ublast [0.5]
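The output described above is the standard 12-column BLAST tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). As an illustration of what "best-match hits" could mean, here is a minimal sketch that keeps the highest-bit-score alignment per read. The function name `best_hits` and the tie-breaking rule (first seen wins) are assumptions, not taken from FMAP.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=10.0):
    """Keep the highest-bit-score alignment per query from BLAST tabular text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity = float(row[2])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue > max_evalue:
            continue  # mirrors the -e cutoff described above
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "subject": subject,
                "identity": identity,
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best
```

For example, given two alignments for one read, only the one with the larger bit score is retained, and a read whose only alignment exceeds the e-value cutoff is dropped entirely.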
- Input: output of ""
- Output: abundances (RPKM) of KEGG orthologies
- Output columns: KEGG Orthology ID, orthology definition, abundance (RPKM)
Usage: perl [options] blast_hits1.txt [blast_hits2.txt [...]] > abundance.txt
Options: -h display this help message
-c use CPM values instead of RPKM values
-i FLOAT minimum percent identity [80]
-l FILE tab-delimited text file with the first column having protein names and the second column having the sequence lengths
-o FILE tab-delimited text file with the first column having protein names and the second column having the orthology names
-d FILE tab-delimited text file with the first column having orthology names and the second column having the definitions
-w FILE tab-delimited text file with the first column having read names and the second column having the weights
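To make the RPKM and CPM units concrete, here is a minimal sketch of the standard definitions applied to filtered hits. It is an illustration under stated assumptions rather than FMAP's code: protein lengths are taken in amino acids and multiplied by 3 to get base pairs, the total read count is the number of hits that survive the identity filter, and all function names are hypothetical.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def cpm(read_count, total_mapped_reads):
    """Counts per million mapped reads."""
    return read_count * 1e6 / total_mapped_reads


def orthology_abundance(hits, protein_length_aa, protein_to_orthology,
                        min_identity=80.0, use_cpm=False):
    """Sum per-protein abundances into per-orthology abundances.

    hits: iterable of (protein name, percent identity), one per mapped read.
    """
    kept = [protein for protein, identity in hits
            if identity >= min_identity and protein in protein_to_orthology]
    total = len(kept)  # assumption: normalize by the filtered hit count
    counts = {}
    for protein in kept:
        counts[protein] = counts.get(protein, 0) + 1
    abundance = {}
    for protein, count in counts.items():
        orthology = protein_to_orthology[protein]
        if use_cpm:
            value = cpm(count, total)
        else:
            value = rpkm(count, 3 * protein_length_aa[protein], total)
        abundance[orthology] = abundance.get(orthology, 0.0) + value
    return abundance
```

The three dictionaries play the roles of the files passed with -l (lengths) and -o (orthology names), and `min_identity` corresponds to -i.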
- Input: outputs of ""
- Output: abundance table
- Output columns: KEGG Orthology ID, orthology definition, abundance of sample1, abundance of sample2, ...
Usage: perl [options] [name1=]abundance1.txt [[name2=]abundance2.txt [...]] > abundance_table.txt
Options: -h display this help message
-c use raw read counts (readCount|count) instead of RPKM values
-d use normalized mean depths (meanDepth/genome) instead of RPKM values
-f use fractions
-n do not print definitions
-r print ORF regions
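The table step above combines per-sample abundance files into one matrix. A minimal sketch of that merge follows, assuming that an orthology absent from a sample is reported as 0 and that the -f option means dividing each value by its sample total. Both are assumptions, and `build_abundance_table` is a hypothetical name.

```python
def build_abundance_table(samples, fractions=False):
    """Merge per-sample abundances into a header and a list of rows.

    samples: dict mapping sample name -> {orthology ID: abundance}.
    Rows are sorted by orthology ID; missing values are filled with 0.0.
    """
    names = list(samples)
    orthologies = sorted(set().union(*(s.keys() for s in samples.values())))
    totals = {name: sum(samples[name].values()) for name in names}
    header = ["orthology"] + names
    rows = []
    for orthology in orthologies:
        row = [orthology]
        for name in names:
            value = samples[name].get(orthology, 0.0)
            if fractions and totals[name] > 0:
                value = value / totals[name]
            row.append(value)
        rows.append(row)
    return header, rows
```

The dictionary keys correspond to the optional `name=` prefixes in the usage line; definitions are omitted here for brevity.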
- Input: output of "", sample group information
- Output: comparison test statistics for orthologies
- Output columns: KEGG Orthology ID, orthology definition, log2 fold change, p-value, FDR-adjusted p-value, filter (pass or fail)
Usage: perl [options] abundance_table.txt control1[,control2[...]] case1[,case2[...]] [...] > orthology_test_stat.txt
Options: -h display this help message
-t STR statistical test for comparing sample groups, "kruskal", "anova", "poisson", "quasipoisson", "metagenomeSeq" [kruskal]
-f FLOAT fold change cutoff [2]
-p FLOAT p-value cutoff [0.05]
-a FLOAT FDR-adjusted p-value cutoff [1]
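The output columns above combine a log2 fold change, a p-value, an FDR-adjusted p-value, and a pass/fail filter. The sketch below shows one way these pieces fit together, given p-values that were already computed by the chosen test. It assumes the Benjamini-Hochberg procedure for the FDR adjustment, a pseudocount of 1 in the fold change, and inclusive cutoffs; the text does not state any of these, and the function names are hypothetical.

```python
import math


def log2_fold_change(control, case, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_control = sum(control) / len(control)
    mean_case = sum(case) / len(case)
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def passes_filter(log2fc, p_value, fdr,
                  fold_cutoff=2.0, p_cutoff=0.05, fdr_cutoff=1.0):
    """Apply the -f, -p, and -a cutoffs (defaults as listed above)."""
    return (abs(log2fc) >= math.log2(fold_cutoff)
            and p_value <= p_cutoff
            and fdr <= fdr_cutoff)
```

With the default FDR cutoff of 1, the adjusted p-value never excludes an orthology, so the filter is driven by the fold change and the raw p-value unless -a is lowered.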
- Input: output of ""
- Output: pathways enriched in filter-passed orthologies
- Output columns: KEGG Pathway ID, pathway definition, orthology count, coverage, p-value, KEGG Orthology IDs with colors
- KEGG Orthology IDs with colors: input for KEGG Pathway mapping
Usage: perl [options] orthology_test_stat.txt > pathway.txt
Options: -h display this help message
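The text does not say which test produces the pathway p-value. A common choice for this kind of enrichment is a one-sided hypergeometric test, sketched below as an assumption rather than as FMAP's method. Coverage is likewise assumed to be the fraction of a pathway's orthologies that passed the filter, and both function names are hypothetical. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def pathway_enrichment(pathway_members, passed, background):
    """Return (orthology count, coverage, p-value) for one pathway.

    pathway_members: orthology IDs annotated to the pathway.
    passed: orthology IDs that passed the comparison filter.
    background: all orthology IDs that were tested.
    """
    background = set(background)
    members = set(pathway_members) & background
    selected = set(passed) & background
    if not members:
        raise ValueError("pathway has no orthologies in the background")
    hits = members & selected
    coverage = len(hits) / len(members)
    p_value = hypergeom_upper_tail(len(hits), len(members),
                                   len(selected), len(background))
    return len(hits), coverage, p_value
```

For instance, with 10 tested orthologies, a 3-member pathway, and 4 filter-passed orthologies of which 2 fall in the pathway, the upper-tail probability is (3·21 + 1·7) / 210 = 1/3.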
- Input: output of ""
- Output: modules enriched in filter-passed orthologies
- Output columns: KEGG Module ID, module definition, orthology count, coverage, p-value, KEGG Orthology IDs with colors
- KEGG Orthology IDs with colors: input for KEGG Pathway mapping
Usage: perl [options] orthology_test_stat.txt > module.txt
Options: -h display this help message
- Input: output of ""
- Output: operons consisting of filter-passed orthologies
- Output columns: ODB (v3) known operon IDs, operon definition, log2 fold change, KEGG Orthology IDs, KEGG Pathway IDs
Usage: perl [options] orthology_test_stat.txt > operon.txt
Options: -h display this help message
-a print single-gene operons as well
- Input: output of "", "", or ""
- Output: PNG format image file of p-value plot
Usage: perl [options] pathway.txt|module.txt|operon.txt plot.pdf
Options: -h display this help message
-w INT plot width [12]
-h INT plot height [8]
-l FLOAT plot left margin [20]
-p FLOAT p-value cutoff [0.05]
-c FLOAT coverage cutoff [0 for pathway, 1 for module and operons]
-d do not print definition
- Input: configuration table file
- Input columns: group (control, ...), sample name, input file of ""
- Output: script file including all FMAP commands, all FMAP outputs
Usage: perl [options] input.config [output_prefix]
Options: -h display this help message
-s generate a script, but not execute it
-m FILE mapping: executable file path of mapping program, "diamond" or "usearch" [diamond]
-t INT mapping: number of threads [1]
-c STR comparison: statistical test for comparing sample groups, "kruskal", "anova", "poisson", "quasipoisson", "metagenomeSeq" [kruskal]
-f FLOAT comparison: fold change cutoff [2]
-p FLOAT comparison: p-value cutoff [0.05]
-a FLOAT comparison: FDR-adjusted p-value cutoff [1]
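The configuration table above has three columns: group, sample name, and input file. As an illustration, the sketch below reads such a table and derives the comma-separated sample groups that the comparison step expects, with the control group first. It assumes the table is tab-delimited and that the control group is literally named "control"; the function names are hypothetical.

```python
import csv
import io


def read_config(text):
    """Parse the configuration table into {group: [(sample, input file), ...]}."""
    groups = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        group, sample, path = (field.strip() for field in row[:3])
        groups.setdefault(group, []).append((sample, path))
    return groups


def group_arguments(groups, control="control"):
    """Return one comma-separated sample list per group, control group first."""
    ordered = [control] + [g for g in groups if g != control]
    return [",".join(sample for sample, _ in groups[g])
            for g in ordered if g in groups]
```

For a table with two control samples `c1`, `c2` and one case sample `k1`, `group_arguments` yields `["c1,c2", "k1"]`, matching the `control1[,control2[...]] case1[,case2[...]]` pattern in the comparison usage line.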
Use the prebuilt database (UniRef90 and bacteria/archaea/fungi)
Use a custom database (you can choose the UniRef sequence identity and the taxonomy)
Kim J, Kim MS, Koh AY, Xie Y, Zhan X. "FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies" BMC Bioinformatics. 2016 Oct 10;17(1):420. PMID: 27724866