metashot/prok-pan is a workflow for pan genome analysis of closely related prokaryotic genomes, mitochondria, and viruses.
- Input: prokaryotic genomes in FASTA format;
- Rapid prokaryotic genome annotation using Prokka;
- Pan genome analysis and visualization using Roary;
- Phylogenetic tree inference (core genome) using RAxML, optional.
- Install Docker (or Singularity) and Nextflow (see Dependences);
- Start running the analysis:
    nextflow run metashot/prok-pan \
      --genomes "data/*.fa" \
      --outdir results
See the file nextflow.config for the complete list of parameters.
The files and directories listed below will be created in the results
directory after the pipeline has finished.
- roary: Roary output files. This folder includes summary_statistics.txt (number of genes in the core and accessory genomes), gene_presence_absence.csv and the pangenome plots (pangenome_*.png);
- prokka: the Prokka output for each input sample;
- raxml: RAxML output (when --skip_core_tree = false).
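Once a run completes, a quick check that the expected outputs are in place can be sketched in shell as follows. The helper name `check_outputs` is hypothetical (it is not part of the pipeline), and the paths assume the layout described above with `results` as the output directory. A missing `raxml` entry is expected when the core tree step is skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of metashot/prok-pan): reports which of the
# outputs described above are present in a given output directory.
check_outputs() {
    outdir="$1"
    for path in \
        roary/summary_statistics.txt \
        roary/gene_presence_absence.csv \
        prokka \
        raxml
    do
        if [ -e "${outdir}/${path}" ]; then
            echo "found: ${path}"
        else
            echo "missing: ${path}"
        fi
    done
}

# Example: check the directory that was passed to --outdir.
check_outputs results
```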
If --skip_core_tree = false, the phylogenetic tree is inferred from the core
genome alignment using the default RAxML tree search algorithm [1].
The following RAxML parameters will be used:
-f d -m GTRCAT -N [RAXML_NSEARCH]
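For orientation, these parameters correspond to a standalone RAxML call roughly like the sketch below. Only `-f d -m GTRCAT -N` come from the pipeline. The binary name `raxmlHPC`, the number of searches, the random seed (`-p`), the run name (`-n`) and the alignment file name (Roary's `core_gene_alignment.aln`) are assumptions for illustration, and the pipeline's actual invocation may differ.

```shell
#!/bin/sh
# Illustrative values (assumptions, not pipeline defaults).
RAXML_NSEARCH=10
ALIGNMENT="core_gene_alignment.aln"
SEED=12345
RUN_NAME="core_tree"

# -f d       default rapid hill-climbing tree search
# -m GTRCAT  GTR substitution model with the CAT rate approximation
# -N         number of independent searches from distinct starting trees
RAXML_CMD="raxmlHPC -f d -m GTRCAT -N ${RAXML_NSEARCH} -p ${SEED} -s ${ALIGNMENT} -n ${RUN_NAME}"

# Print the command for review before running it.
echo "$RAXML_CMD"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without RAxML installed.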
Please refer to System requirements for the complete list of system requirement options.
1: Stamatakis A., Blagojevic F., Nikolopoulos D.S. et al. Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell. J VLSI Sign Process Syst Sign Im 48, 271–286 (2007).