metashot/prok-pan is a workflow for pan genome analysis of closely related prokaryotic genomes, mitochondria, and viruses.
- Input: prokaryotic genomes in FASTA format or annotated genomes in GFF3 format;
- Rapid prokaryotic genome annotation using Prokka;
- Pan genome analysis and visualization using Roary;
- Optional phylogenetic tree inference from the core genome using RAxML.
- Install Docker (or Singularity) and Nextflow (see Dependencies);
- Start running the analysis:
nextflow run metashot/prok-pan \
--genomes "data/*.fa" \
--outdir results
See the file nextflow.config for the complete list of parameters.
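Before launching the pipeline, it can help to confirm that the --genomes pattern actually matches your input files. The following is a minimal bash sketch, assuming FASTA files under data/ with the .fa extension used in the example command; the function name count_fasta_records is hypothetical and is not part of metashot/prok-pan.

```bash
# Hypothetical helper (not part of metashot/prok-pan): list every file matched
# by a glob pattern together with its number of FASTA records (lines starting with ">").
count_fasta_records() {
  local pattern="$1" found=0 f
  # nullglob: a pattern with no matches expands to nothing instead of itself
  # (this sketch assumes nullglob was off and switches it off again afterwards).
  shopt -s nullglob
  # $pattern is deliberately unquoted so that the glob is expanded here.
  for f in $pattern; do
    found=1
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
  shopt -u nullglob
  if [ "$found" -eq 0 ]; then
    echo "no files match: $pattern" >&2
    return 1
  fi
}
```

Usage: count_fasta_records "data/*.fa". Pass the pattern quoted, just as --genomes is quoted in the nextflow command so that Nextflow, not your shell, expands the glob.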
The files and directories listed below will be created in the results directory after the pipeline has finished:
- roary: Roary output files. This folder includes summary_statistics.txt (number of genes in the core and accessory genome), gene_presence_absence.csv and the pan genome plots (pangenome_*.png);
- prokka: the Prokka output for each input sample (if --annotate = true);
- raxml: RAxML output (when --skip_core_tree = false).
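To pull the core and accessory gene counts out of summary_statistics.txt, a short awk call is enough. This is a sketch assuming Roary's tab-separated layout (category, frequency range, count, e.g. a "Core genes" line and a "Total genes" line); the function name core_accessory is hypothetical. Here every gene outside the core (soft core, shell and cloud) is counted as accessory; this is one convention, and some analyses treat soft core genes as core instead.

```bash
# Hypothetical helper: report core and accessory gene counts from Roary's
# summary_statistics.txt, assuming lines of the form
#   <category>\t<frequency range>\t<count>
# Accessory = Total genes - Core genes (soft core + shell + cloud).
core_accessory() {
  awk -F'\t' '
    $1 == "Core genes"  { core = $3 }
    $1 == "Total genes" { total = $3 }
    END {
      # Fail loudly if either expected line is missing.
      if (core == "" || total == "") { print "unexpected format" > "/dev/stderr"; exit 1 }
      printf "core\t%d\naccessory\t%d\n", core, total - core
    }' "$1"
}
```

Usage: core_accessory results/roary/summary_statistics.txt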
If --skip_core_tree = false, the phylogenetic tree is inferred from the core genome alignment using RAxML. Two modes are available:
- default mode: construct a maximum likelihood (ML) tree. This mode runs the default RAxML tree search algorithm [1] and performs multiple searches for the best tree (by default 10 searches, each starting from a distinct randomized maximum parsimony (MP) tree; see the parameter --raxml_nsearch). The following RAxML parameters are used: -f d -m GTRCAT -N [RAXML_NSEARCH] -p 42
- rbs mode: assess the robustness of the inference and construct an ML tree. This mode runs the full rapid bootstrapping analysis [2]. The number of bootstrap replicates or the bootstrap convergence criterion can be specified with the parameter --raxml_nboot. The following parameters are used: -f a -m GTRCAT -N [RAXML_NBOOT] -p 42 -x 43
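The two modes differ only in the -f algorithm switch, the meaning of -N and the extra bootstrap seed. The bash sketch below assembles both argument lists side by side; it is not the pipeline's actual code, the function name raxml_args is hypothetical, and the -s (input alignment) and -n (run name) values are placeholders the caller supplies. In RAxML, -p sets the parsimony random seed and -x the rapid bootstrapping seed; in rbs mode -N accepts either a number of replicates or a convergence criterion such as autoMRE.

```bash
# Hypothetical sketch (not the pipeline's actual code): build the RAxML
# argument string for the "default" and "rbs" modes.
# Arguments: mode, N value, alignment file (-s), run name (-n).
raxml_args() {
  local mode="$1" n="$2" aln="$3" name="$4"
  case "$mode" in
    default)
      # -f d: rapid hill-climbing ML search; -N: number of independent searches,
      # each from a randomized parsimony starting tree (parsimony seed -p 42)
      printf '%s\n' "-f d -m GTRCAT -N $n -p 42 -s $aln -n $name" ;;
    rbs)
      # -f a: rapid bootstrap analysis plus best-scoring ML tree search in one run;
      # -N: replicate count or convergence criterion; -x 43: bootstrap seed
      printf '%s\n' "-f a -m GTRCAT -N $n -p 42 -x 43 -s $aln -n $name" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

Usage: raxmlHPC $(raxml_args rbs autoMRE core_gene_alignment.aln core_tree). raxmlHPC is one of the standard RAxML binary names and core_gene_alignment.aln is the core alignment Roary writes when run with -e; whether the pipeline uses exactly this binary and file is an assumption.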
Please refer to System requirements for the complete list of options related to system requirements.
[1] Stamatakis A., Blagojevic F., Nikolopoulos D.S. et al. Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell. J VLSI Sign Process Syst Sign Im 48, 271–286 (2007).
[2] Stamatakis A., Hoover P., Rougemont J. A Rapid Bootstrap Algorithm for the RAxML Web Servers. Systematic Biology 57(5), 758–771 (2008).