PoSeiDon

Docker command:

docker run --rm --user $(id -u):$(id -g) -it --mount type=bind,source=/your/favorite/poseidon/folder,target=/home/output lmfaber/poseidon:latest /bin/bash

Usage:

Usage: poseidon --input /absolute/input.fa --output /absolute/output/path
Use poseidon --example for a test run.
    -o, --output=NAME                Output directory. (Absolute path)
    -i, --input=NAME                 Input multiple fasta file. (Absolute path)
    -t, --title=NAME                 Project title. (No spaces)
        --root-species=NAME          Comma separated list of root species. E.g. Escherichia_coli,
    -r, --reference-species=NAME     Reference species.
        --kh                         kh option?
        --timestamp=NAME             Timestamp
        --example                    Run a test example.
    -h, --help                       Prints this help

Here we present PoSeiDon, a pipeline to detect significant positively selected sites and possible recombination events in an alignment of multiple coding sequences. Sites that undergo positive selection can give you insights into the evolutionary history of your sequences, for example by revealing important mutation hot spots that accumulated as a result of virus-host arms races during evolution.

We provide all ruby scripts needed to run the PoSeiDon pipeline.

Please note: we designed these scripts so that the pipeline can be run out of the box. However, PoSeiDon relies on a variety of third-party tools (see below). Binaries for most tools are included in this repository (tools), and PoSeiDon assumes that they are located in this folder. The larger HyPhy software package can be downloaded here directly and must be added to the tools folder and extracted manually:

Furthermore, you will need inkscape, pdflatex, ruby (tested with v2.4.2) and some ruby gems (packages), as well as mpirun (Open MPI; tested with v2.0.2). If none of these are installed, you can try the following on a Linux system:

apt-get install ruby
gem install bio
gem install mail
gem install encrypted_strings

apt-get install inkscape

apt-get install texlive-latex-base

apt-get install openmpi-bin

apt-get install hyphy-mpi

We strongly recommend using our Docker image, which can be run without installing the tools manually:

docker run mhoelzer/poseidon <TODO>

Workflow of the PoSeiDon pipeline and example output

PoSeiDon workflow

The PoSeiDon pipeline comprises in-frame alignment of homologous protein-coding sequences, detection of putative recombination events and evolutionary breakpoints, phylogenetic reconstructions and detection of positively selected sites in the full alignment and all possible fragments. Finally, all results are combined and visualized in a user-friendly and clear HTML web page. The resulting alignment fragments are indicated with colored bars in the HTML output.

Please find an example output of the pipeline here. (Fuchs et al., 2017, Journal of Virology)

The PoSeiDon pipeline is based on the following tools and scripts:

  • TranslatorX (v1.1), Abascal et al. (2010); 20435676
  • Muscle (v3.8.31), Edgar (2004); 15034147
  • RAxML (v8.0.25), Stamatakis (2014); 24451623
  • Newick Utilities (v1.6), Junier and Zdobnov (2010); 20472542
  • MODELTEST, Posada and Crandall (1998); 9918953
  • HyPhy (v2.2), Pond et al. (2005); 15509596
  • GARD, Pond et al. (2006); 17110367
  • PaML/CodeML (v4.8), Yang (2007); 17483113
  • Ruby (v2.3.1)
  • Inkscape (v0.48.5)
  • pdfTeX (v3.14)

Parameters

Most of the PoSeiDon parameters are optional and are explained here in detail.

Input

Mandatory. Your input FASTA file must follow the format:

>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...
>Myotis_davidii Mx1 Gene
ATGGCGGTCGAGATAAGATACGTT...

All sequences must have a correct open reading frame, may contain only the nucleotide characters [A|C|G|T], and must not contain an internal stop codon.

Sequence IDs must be unique up to the first occurrence of a space.
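The input rules above can be checked before starting a run. The following is a minimal sketch and not part of PoSeiDon: the helper names `parse_fasta` and `validate_records` are hypothetical, "correct open reading frame" is assumed to mean a length that is a multiple of three, and a stop codon in the last position is assumed to be acceptable.

```ruby
# Hypothetical pre-flight check, not part of the PoSeiDon scripts.
STOP_CODONS = %w[TAA TAG TGA].freeze

# Parse FASTA text into [id, sequence] pairs.
# The ID is the header text up to the first whitespace character.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      records << [line[1..-1].split(/\s/, 2).first.to_s, String.new]
    elsif records.any?
      records.last[1] << line.upcase
    end
  end
  records
end

# Return a list of human-readable problems; an empty list means the
# records pass the checks under the assumptions stated above.
def validate_records(records)
  errors = []
  seen = {}
  records.each do |id, seq|
    errors << "#{id}: duplicate sequence ID" if seen[id]
    seen[id] = true
    errors << "#{id}: characters other than A, C, G, T" unless seq.match?(/\A[ACGT]*\z/)
    errors << "#{id}: length is not a multiple of three" unless (seq.length % 3).zero?
    # Every codon except the last one counts as internal (assumption).
    internal = seq.scan(/.{3}/)[0...-1] || []
    errors << "#{id}: internal stop codon" if internal.any? { |c| STOP_CODONS.include?(c) }
  end
  errors
end
```

Calling `validate_records(parse_fasta(File.read(path)))` on an input file returns an empty array if no problem was found.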

Reference

Optional. Default: use the first sequence ID as reference. You can define one species ID from your multiple FASTA file as a reference species. Positively selected sites and corresponding amino acids will be drawn with respect to this species. The ID must match the FASTA header up to the first occurrence of a space. For example, if you want Myotis lucifugus as your reference species and your FASTA file contains:

>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...

use

Myotis_lucifugus

as the parameter to set the reference species. By default, the first ID occurring in the multiple FASTA file will be used.
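The matching rule for the reference can be illustrated as follows. This is only a sketch of the described behaviour, not PoSeiDon's actual code; `header_ids` and `resolve_reference` are hypothetical names.

```ruby
# Hypothetical illustration of the reference rule, not PoSeiDon code.

# Collect the sequence IDs: header text up to the first whitespace.
def header_ids(fasta_text)
  fasta_text.each_line
            .select { |line| line.start_with?('>') }
            .map { |line| line[1..-1].strip.split(/\s/, 2).first.to_s }
end

# Fall back to the first ID when no reference is requested;
# otherwise require an exact match against one of the IDs.
def resolve_reference(ids, requested = nil)
  return ids.first if requested.nil? || requested.empty?
  unless ids.include?(requested)
    raise ArgumentError, "reference '#{requested}' not found in FASTA headers"
  end
  requested
end
```

Because the comparison is exact, `Myotis lucifugus` (with a space) would be rejected in this sketch, while `Myotis_lucifugus` matches the header shown above.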

Outgroup

Optional. Default: trees are unrooted. You can define one or multiple (comma-separated) species IDs as outgroup. All phylogenetic trees will be rooted according to these species. For example, if your multiple FASTA file contains

>Myotis_lucifugus Mx1 Gene
ATGGCGATCGAGATACGATACGTA...
>Myotis_davidii Mx1 Gene
ATGGCGGTCGAGATAAGATACGTT...
>Pteropus_vampyrus Mx1 Gene
ATGGCCGTAGAGATTAGATACTTT...
>Eidolon_helvum Mx1 Gene
ATGCCCGTAGAGAATAGATACTTT...

you can define:

Pteropus_vampyrus,Eidolon_helvum

to root all trees in relation to these two species.
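A comma-separated outgroup value like the one above can be split and checked against the sequence IDs before the run. The sketch below is an assumption about how such a check could look, not the pipeline's implementation; `parse_outgroup` is a hypothetical name, and ignoring empty entries (for example from a trailing comma) is a choice made here.

```ruby
# Hypothetical illustration, not PoSeiDon code.
# value: the comma-separated outgroup string, or nil for unrooted trees.
# ids:   the sequence IDs found in the multiple FASTA file.
def parse_outgroup(value, ids)
  return [] if value.nil? || value.strip.empty?
  # Split on commas and drop empty entries such as a trailing comma.
  species = value.split(',').map(&:strip).reject(&:empty?)
  missing = species - ids
  unless missing.empty?
    raise ArgumentError, "unknown outgroup IDs: #{missing.join(', ')}"
  end
  species
end
```

An empty result corresponds to the default of unrooted trees.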

Use also insignificant breakpoints

Optional. Default: false. With this parameter you can decide whether insignificant breakpoints should be taken into account. All breakpoints are tested for significant topological incongruence using a Kishino-Hasegawa (KH) test [Kishino, H. and Hasegawa, M. (1989)]. KH-insignificant breakpoints most frequently arise from variation in branch lengths between segments. Nevertheless, taking KH-insignificant breakpoints into account can be interesting, because we have already observed putative positively selected sites in fragments without any significant topological incongruence. KH-insignificant fragments are marked in the final output, as they might not arise from real recombination events.

By default, only significant breakpoints are used for further calculations.

Please also keep in mind that including insignificant breakpoints can extend the run time of PoSeiDon from minutes to hours, depending on the number of detected breakpoints.

Use your own parameters

Currently, we don't provide full access to the parameters used within PoSeiDon through the web interface [the web service is currently under maintenance due to web page changes]. In a future release, we will provide a local version of the pipeline for download, including full access to the parameter settings of all executed tools. If you want to change parameters (e.g. for RAxML) now, just run the pipeline: PoSeiDon will also generate a 'Parameters' subpage (like this) in the final output, giving access to all executed commands. With this, certain parts of the pipeline can be rerun locally using the provided commands and output files.