docker run --rm --user $(id -u):$(id -g) -it --mount type=bind,source=/your/favorite/poseidon/folder,target=/home/output lmfaber/poseidon:latest /bin/bash
Usage: poseidon --input /absolute/input.fa --output /absolute/output/path
Use poseidon --example for a test run.
-o, --output=NAME Output directory. (Absolute path)
-i, --input=NAME Input multiple fasta file. (Absolute path)
-t, --title=NAME Project title. (No spaces)
--root-species=NAME Comma separated list of root species. E.g. Escherichia_coli,
-r, --reference-species=NAME Reference species.
--kh kh option?
--timestamp=NAME Timestamp
--example Run a test example.
-h, --help Prints this help
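The help layout above resembles the output of Ruby's standard `OptionParser`. Here is a minimal sketch of a front end that reads these options and enforces the absolute-path and no-spaces requirements. It assumes each option behaves as its help line describes, and that `--kh` is the switch for KH-insignificant breakpoints described further below. The method name `parse_poseidon_options`, the hash keys and the checks are illustrative only and are not taken from the PoSeiDon sources.

```ruby
require "optparse"
require "pathname"

# Hypothetical sketch: option names mirror the help text; hash keys,
# defaults and validation rules are illustrative, not PoSeiDon's code.
def parse_poseidon_options(argv)
  options = { kh: false, example: false, root_species: [] }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: poseidon --input /absolute/input.fa --output /absolute/output/path"
    opts.on("-o", "--output=NAME", "Output directory. (Absolute path)") { |v| options[:output] = v }
    opts.on("-i", "--input=NAME", "Input multiple fasta file. (Absolute path)") { |v| options[:input] = v }
    opts.on("-t", "--title=NAME", "Project title. (No spaces)") { |v| options[:title] = v }
    opts.on("--root-species=NAME", "Comma separated list of root species.") do |v|
      options[:root_species] = v.split(",").map(&:strip).reject(&:empty?)
    end
    opts.on("-r", "--reference-species=NAME", "Reference species.") { |v| options[:reference_species] = v }
    # Assumption: --kh enables the use of KH-insignificant breakpoints.
    opts.on("--kh", "Also use KH-insignificant breakpoints.") { options[:kh] = true }
    opts.on("--timestamp=NAME", "Timestamp") { |v| options[:timestamp] = v }
    opts.on("--example", "Run a test example.") { options[:example] = true }
  end
  parser.parse!(argv)

  # The project title must not contain spaces.
  if options[:title] && options[:title].include?(" ")
    raise ArgumentError, "--title must not contain spaces"
  end

  # Input and output are required absolute paths unless --example is given.
  unless options[:example]
    [:input, :output].each do |key|
      path = options[key]
      raise ArgumentError, "--#{key} is required" if path.nil?
      raise ArgumentError, "--#{key} must be an absolute path" unless Pathname.new(path).absolute?
    end
  end
  options
end
```

For instance, `parse_poseidon_options(%w[-i /data/mx1.fa -o /data/out --kh])` returns a hash with `:kh` set to `true`. A relative input path raises an `ArgumentError` before any tool is started.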
Here we present PoSeiDon, a pipeline to detect significant positively selected sites and possible recombination events in an alignment of multiple coding sequences. Sites that undergo positive selection can give you insights into the evolutionary history of your sequences, for example by revealing important mutation hot spots that accumulated as a result of virus-host arms races during evolution.
We provide all ruby scripts needed to run the PoSeiDon pipeline.
Please note: we aimed for these scripts to run the pipeline out of the box; however, PoSeiDon relies on a variety of third-party tools (see below). Binaries for most of these tools are included in this repository (tools) and PoSeiDon assumes them to be located in this folder. The larger HyPhy software package can be downloaded here directly and needs to be added and extracted manually to the tools folder.
Furthermore, you will need Inkscape, pdflatex, Ruby (tested with v2.4.2) and some Ruby gems (packages), as well as mpirun (Open MPI; tested with v2.0.2). If any of these are missing, you can try the following on a Debian- or Ubuntu-based Linux system:
apt-get install ruby
gem install bio
gem install mail
gem install encrypted_strings
apt-get install inkscape
apt-get install texlive-latex-base
apt-get install openmpi-bin
apt-get install hyphy-mpi
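Before a first run, it can help to check which of these programs and gems are actually available. The following is a minimal sketch. It assumes the executables are named `ruby`, `inkscape`, `pdflatex` and `mpirun` and can be found on `PATH`. The helper names are hypothetical, and PoSeiDon's own tool lookup, such as its use of the binaries in the tools folder, may work differently.

```ruby
require "rubygems"

# Hypothetical pre-flight check; the names are assumptions based on the
# dependency list above, not a list read from PoSeiDon itself.
REQUIRED_EXECUTABLES = %w[ruby inkscape pdflatex mpirun].freeze
REQUIRED_GEMS = %w[bio mail encrypted_strings].freeze

# Return the full path of an executable found in the given PATH string, or nil.
def find_executable(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Names of executables that could not be found on the search path.
def missing_executables(names = REQUIRED_EXECUTABLES, search_path = ENV.fetch("PATH", ""))
  names.reject { |name| find_executable(name, search_path) }
end

# Names of gems that RubyGems does not know about.
def missing_gems(names = REQUIRED_GEMS)
  names.reject { |gem_name| Gem::Specification.find_all_by_name(gem_name).any? }
end
```

Calling `missing_executables` and `missing_gems` without arguments returns the names that could not be located. An empty array means everything was found.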
We strongly recommend using our Docker image, which can be run without installing any tools manually:
docker run mhoelzer/poseidon <TODO>
The PoSeiDon pipeline comprises in-frame alignment of homologous protein-coding sequences, detection of putative recombination events and evolutionary breakpoints, phylogenetic reconstructions and detection of positively selected sites in the full alignment and all possible fragments. Finally, all results are combined and visualized in a user-friendly and clear HTML web page. The resulting alignment fragments are indicated with colored bars in the HTML output.
An example output of the pipeline (Fuchs et al., 2017, Journal of Virology) can be found here.
- TranslatorX (v1.1), Abascal et al. (2010); 20435676
- Muscle (v3.8.31), Edgar (2004); 15034147
- RAxML (v8.0.25), Stamatakis (2014); 24451623
- Newick Utilities (v1.6), Junier and Zdobnov (2010); 20472542
- MODELTEST, Posada and Crandall (1998); 9918953
- HyPhy (v2.2), Pond et al. (2005); 15509596
- GARD, Pond et al. (2006); 17110367
- PAML/CodeML (v4.8), Yang (2007); 17483113
- Ruby (v2.3.1)
- Inkscape (v0.48.5)
- pdfTeX (v3.14)
Most PoSeiDon parameters are optional; they are explained in detail below.
Input (-i, --input). Mandatory. Your input FASTA file must follow this format:
>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...
>Myotis_davidii Mx1 Gene
ATGGCGGTCGAGATAAGATACGTT...
All sequences must be in a correct open reading frame, may contain only the nucleotide characters A, C, G and T, and must not contain internal stop codons.
Sequence IDs, i.e. the header text up to the first space, must be unique.
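These input rules can be checked before starting a run. The sketch below is hypothetical and not part of PoSeiDon. It treats a "correct open reading frame" as a sequence length divisible by three, assumes the standard-code stop codons TAA, TAG and TGA, and accepts a single terminal stop codon while rejecting internal ones.

```ruby
# Hypothetical validator for the input rules above; not PoSeiDon's code.
STOP_CODONS = %w[TAA TAG TGA].freeze # assumption: standard genetic code

# Parse a multiple FASTA string into [[id, sequence], ...].
# The ID is the header text up to the first space.
def parse_fasta(text)
  records = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if line.start_with?(">")
      records << [line[1..-1].split(" ", 2).first.to_s, String.new("")]
    elsif records.empty?
      raise ArgumentError, "sequence data before the first header"
    else
      records.last[1] << line.upcase
    end
  end
  records
end

# Return a list of human-readable problems; an empty list means the input passed.
def validate_records(records)
  errors = []
  ids = records.map(&:first)
  ids.select { |id| ids.count(id) > 1 }.uniq.each { |id| errors << "duplicate ID: #{id}" }
  records.each do |id, seq|
    errors << "#{id}: only A, C, G, T allowed" unless seq =~ /\A[ACGT]+\z/
    errors << "#{id}: length #{seq.length} is not a multiple of 3" unless (seq.length % 3).zero?
    # Every codon except the last one must not be a stop codon.
    internal = seq.scan(/.../)[0...-1] || []
    errors << "#{id}: internal stop codon" if internal.any? { |c| STOP_CODONS.include?(c) }
  end
  errors
end
```

Running `validate_records(parse_fasta(File.read(path)))` on your input file lists every violation at once, which is usually quicker than fixing them one pipeline failure at a time.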
Reference species (-r, --reference-species). Optional. Default: the first sequence ID is used as reference. You can define one species ID from your multiple FASTA file as the reference species. Positively selected sites and the corresponding amino acids will be drawn with respect to this species. The ID must match the FASTA header up to the first space. For example, if you want Myotis lucifugus as your reference species and your FASTA file contains:
>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...
use
Myotis_lucifugus
as the parameter value (-r Myotis_lucifugus) to set the reference species. If no reference species is given, the first ID in the multiple FASTA file is used.
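The ID matching rule can be illustrated with a small sketch. It extracts the IDs (header text before the first space) and falls back to the first ID when no reference species is requested. The helper names `fasta_ids` and `resolve_reference` are hypothetical, and the "not found" error is an assumption about sensible behaviour, not a documented PoSeiDon message.

```ruby
# Hypothetical helpers illustrating how a reference species ID is resolved.

# Collect IDs: the FASTA header text after ">" up to the first space.
def fasta_ids(text)
  text.each_line
      .select { |line| line.start_with?(">") }
      .map { |line| line[1..-1].split(" ", 2).first.to_s }
end

# Default to the first ID; otherwise the requested ID must match exactly.
def resolve_reference(ids, requested = nil)
  raise ArgumentError, "no sequences in input" if ids.empty?
  return ids.first if requested.nil?
  raise ArgumentError, "reference species #{requested} not found" unless ids.include?(requested)
  requested
end
```

Note that `"Myotis lucifugus"` (with a space) would not match, because only `Myotis_lucifugus` is part of the ID.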
Root species (--root-species). Optional. Default: trees are unrooted. You can define one or multiple (comma-separated) species IDs as the outgroup. All phylogenetic trees will be rooted with respect to these species. For example, if your multiple FASTA file contains
>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...
>Myotis_davidii Mx1 Gene
ATGGCGGTCGAGATAAGATACGTT...
>Pteropus_vampyrus Mx1 Gene
ATGGCCGTAGAGATTAGATACTTT...
>Eidolon_helvum Mx1 Gene
ATGCCCGTAGAGAATAGATACTTT...
you can define:
Pteropus_vampyrus,Eidolon_helvum
to root all trees with respect to these two species.
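As an illustration of the comma-separated format, here is a hypothetical helper that splits a root species value and checks every name against the FASTA IDs. An empty value keeps the default of unrooted trees. The extra check that at least one non-outgroup sequence remains is an assumption added for the sketch, not a documented PoSeiDon rule.

```ruby
# Hypothetical helper: parse a value such as "Pteropus_vampyrus,Eidolon_helvum"
# and validate it against the IDs from the multiple FASTA file.
def parse_root_species(value, ids)
  return [] if value.nil? || value.strip.empty? # default: unrooted trees
  roots = value.split(",").map(&:strip).reject(&:empty?).uniq
  unknown = roots - ids
  raise ArgumentError, "unknown root species: #{unknown.join(', ')}" unless unknown.empty?
  # Assumption: rooting only makes sense if at least one ingroup sequence remains.
  raise ArgumentError, "outgroup covers all sequences" if (ids - roots).empty?
  roots
end
```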
KH-insignificant breakpoints (--kh). Optional. Default: false. With this parameter you can decide whether insignificant breakpoints should be taken into account. All breakpoints are tested for significant topological incongruence using a Kishino-Hasegawa (KH) test [Kishino, H. and Hasegawa, M. (1989)]. KH-insignificant breakpoints most frequently arise from variation in branch lengths between segments. Nevertheless, taking KH-insignificant breakpoints into account can be interesting, because we have observed putative positively selected sites in fragments without any significant topological incongruence. KH-insignificant fragments are marked in the final output, as they may not result from real recombination events.
By default, only significant breakpoints are used for further calculations.
Please also keep in mind that including insignificant breakpoints can extend the run time of PoSeiDon from minutes to hours, depending on the number of detected breakpoints.
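To illustrate why this switch affects run time, here is a hypothetical sketch. Each breakpoint carries a KH-test p-value, and by default only breakpoints below a significance level are kept. The level of 0.05 is an assumption here. The alignment is then cut into consecutive fragments at the kept positions. This is not PoSeiDon's implementation, and the pipeline's own fragment enumeration may differ. The sketch only shows that every additional breakpoint adds fragments that must be analyzed.

```ruby
# Hypothetical model: a breakpoint is an alignment position plus its KH-test p-value.
Breakpoint = Struct.new(:position, :kh_p_value)

# Keep only KH-significant breakpoints unless insignificant ones are requested.
# alpha = 0.05 is an assumed significance level, not a documented PoSeiDon setting.
def usable_breakpoints(breakpoints, use_insignificant: false, alpha: 0.05)
  selected = use_insignificant ? breakpoints : breakpoints.select { |b| b.kh_p_value < alpha }
  selected.map(&:position).sort.uniq
end

# Cut positions 1..alignment_length into consecutive fragments; a breakpoint at p
# ends one fragment at p and starts the next at p + 1.
def fragments(alignment_length, positions)
  inner = positions.select { |p| p > 0 && p < alignment_length }.sort.uniq
  bounds = [0] + inner + [alignment_length]
  bounds.each_cons(2).map { |from, to| (from + 1)..to }
end
```

With one significant breakpoint at position 300 and one insignificant breakpoint at 600, a 900-column alignment yields two fragments by default and three when insignificant breakpoints are included.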
Currently, we don't provide full access to the parameters used within PoSeiDon through the web interface [the web service is currently under maintenance due to web page changes]. In a future release, we will provide a local version of the pipeline for download, including full access to the parameter settings of all executed tools. If you want to change parameters now (e.g. for RAxML), just run the pipeline: PoSeiDon also generates a 'Parameters' sub-page (like this) in the final output that lists all executed commands. Using these commands and the provided output files, individual parts of the pipeline can be rerun locally.