SATIVA (Semi-Automatic Taxonomy Improvement and Validation Algorithm) is a pipeline that uses the Evolutionary Placement Algorithm (EPA, [1]) to identify taxonomically mislabeled sequences and suggest corrections. Internally, SATIVA relies on RAxML [2] for likelihood computations and on the ETE library [3] for tree topology manipulations in Python.
Currently, only Linux and OSX (Mac) systems are supported.
- Make sure Python 2.6+ is installed (Python 3 is not supported!)
- Make sure you have a recent C compiler (we recommend GCC 4.6+ / clang 3.3+ for AVX support). If you have an up-to-date OS distribution (Ubuntu 12.04+, OSX 10.8+ etc.), there is nothing to worry about. In a cluster environment, you might need to select an appropriate compiler version, e.g.:
module load gcc/4.7.0
(please refer to your cluster documentation for details)
- Run the installation script:
./install.sh
If you get compilation errors, try disabling AVX:
./install.sh --no-avx
SATIVA requires two files as input: an alignment (FASTA or PHYLIP) and a text file with taxonomic annotations (matched by sequence name). Furthermore, you must choose the nomenclature code via the -x option (e.g., BAC(teriological) for Bacteria and Archaea).
Sample command line to run SATIVA with 2 threads:
cd example
../sativa.py -s test.phy -t test.tax -x BAC -T 2
The output is a text file that contains a list of identified mislabels, along with the corresponding confidence scores and proposed taxonomic corrections.
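Because annotations are matched to the alignment by sequence name, it can help to check that both input files agree on names before starting a run. The following is only a minimal sketch, not part of SATIVA: it assumes a FASTA alignment in which the name is the first whitespace-delimited token of each header, and an annotation file with one record per line and the sequence name in the first tab-separated column. Neither layout is specified in this README, so adjust the parsing to your files. The helper names and sample records are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return names


def annotation_names(lines, sep="\t"):
    """Collect sequence names from annotation lines.

    Assumption: one record per line, name in the first `sep`-separated column.
    """
    names = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            names.add(line.split(sep)[0])
    return names


def compare_names(alignment, annotations):
    """Return (names only in the alignment, names only in the annotations)."""
    return sorted(alignment - annotations), sorted(annotations - alignment)


# Hypothetical in-memory data; in practice, pass open file objects instead.
alignment_lines = [">seq1", "ACGT", ">seq2", "AC-T"]
annotation_lines = ["seq1\tBacteria;Firmicutes\n", "seq3\tBacteria;Chlamydiae\n"]

missing_tax, missing_seq = compare_names(
    fasta_names(alignment_lines), annotation_names(annotation_lines)
)
print("Sequences without annotation:", missing_tax)
print("Annotations without sequence:", missing_seq)
```

Both functions accept any iterable of lines, so they work on open files as well as on lists of strings.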
Parallelization note: If you omit the -T parameter, SATIVA will start one thread per logical CPU in your system. Although this is usually what you want, it might lead to a major slowdown if some of the CPUs are already in use by other running programs (e.g., if you run SATIVA on a shared server). If you encounter this problem, try reducing the number of threads with -T.
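On a shared machine, one way to choose a -T value is to leave some logical CPUs free for other jobs. The sketch below is a hypothetical helper, not part of SATIVA: it computes a thread count and assembles the sample command shown earlier. How many CPUs to reserve is an assumption you have to make for your own system.

```python
import multiprocessing


def pick_thread_count(reserved=0, logical_cpus=None):
    """Return a -T value that leaves `reserved` logical CPUs free (minimum 1)."""
    if logical_cpus is None:
        logical_cpus = multiprocessing.cpu_count()
    return max(1, logical_cpus - reserved)


def build_command(threads):
    """Assemble the sample command from this README with an explicit -T value."""
    return ["../sativa.py", "-s", "test.phy", "-t", "test.tax",
            "-x", "BAC", "-T", str(threads)]


if __name__ == "__main__":
    # Reserve two logical CPUs for other users (an arbitrary example value).
    print(" ".join(build_command(pick_thread_count(reserved=2))))
```

The printed command can be pasted into a shell from within the example directory.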
Handling non-preferred synonyms: You can use the -Y parameter to specify a file with a list of equivalent name groups (synonyms), e.g.:
cd example
../sativa.py -s test.phy -t test.tax -x BAC -T 2 -n syntest -Y synonym.txt
The first name of each group is considered the preferred synonym (primary name) and will be used in place of all other (synonymous) names in the group. An example synonym definition can be found in the synonym.txt file.
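The replacement rule can be illustrated with a short sketch. This is not SATIVA's own code: the groups are given as in-memory lists because the on-disk layout of the synonym file is not described here (see synonym.txt for the actual format), and all names are made up.

```python
def build_synonym_map(groups):
    """Map every name in each group to that group's first (preferred) name."""
    mapping = {}
    for group in groups:
        if not group:
            continue  # skip empty groups
        preferred = group[0]
        for name in group:
            mapping[name] = preferred
    return mapping


def normalize(name, mapping):
    """Return the preferred synonym, or the name itself if it is in no group."""
    return mapping.get(name, name)


# Hypothetical synonym groups; the first entry of each is the primary name.
groups = [
    ["Preferred_A", "Old_A", "Alt_A"],
    ["Preferred_B", "Old_B"],
]
mapping = build_synonym_map(groups)

print(normalize("Old_A", mapping))      # Preferred_A
print(normalize("Unrelated", mapping))  # Unrelated
```

Names that appear in no group pass through unchanged, so the mapping can be applied to every name without special cases.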
For additional options, please refer to the online help:
./sativa.py -h
SATIVA is integrated with the most recent (unstable) version of the ARB software.
Development builds: ftp://ftp.arb-silva.de/ARB/builds/
Source: http://svn.mikro.biologie.tu-muenchen.de/readonly/trunk/
For the time being, please direct your questions to the RAxML google group:
https://groups.google.com/forum/?hl=en#!forum/raxml
Alexey M. Kozlov, Jiajie Zhang, Pelin Yilmaz, Frank Oliver Glöckner and Alexandros Stamatakis (2016) Phylogeny-aware Identification and Correction of Taxonomically Mislabeled Sequences. Nucleic Acids Research open access
[1] Berger, S. A., Krompass, D., and Stamatakis, A. (2011) Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Systematic Biology, 60(3), 291–302. doi:10.1093/sysbio/syr010
[2] Stamatakis, A. (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313. doi:10.1093/bioinformatics/btu033
[3] Huerta-Cepas, J., Dopazo, J., and Gabaldón, T. (2010) ETE: a Python Environment for Tree Exploration. BMC Bioinformatics, 11(1), 24. doi:10.1186/1471-2105-11-24