This is a manual for the implementation of calculating the Pagel likelihood ratios and running hierarchical clustering.
-
Pagel's Software: Bayestraits. Download the appropriate version for your system, and move the executable application 'BayestraitsV3' to the same folder as python scripts.
-
Python libraries:
- DendroPy Phylogenetic Computing Library
- Numpy, Pandas, Scipy
-
A csv file containing phylogenetic profiles (see below)
-
A phylogenetic tree file (see below).
A csv file ("," seperated) containing phylogenetic profiles is required. Each row is a profile for one gene across given genomes.
genome1 | genome2 | genome3 | genome4 | ... | |
---|---|---|---|---|---|
gene1 | 0 | 1 | 0 | 0 | ... |
gene2 | 0 | 0 | 0 | 1 | ... |
gene3 | 1 | 0 | 0 | 1 | ... |
gene4 | 1 | 1 | 0 | 1 | ... |
gene5 | 0 | 1 | 0 | 1 | ... |
... | ... | ... | ... | ... | ... |
A phylogenetic tree file (nexus format) is required. The tree format needs to satify the requirements of Pagel's Bayestraits software, which is that the taxa names must not be included in the descriptionof the tree but should be linked to a number in the translate section of the tree file. Details can be found in the Bayestraits mannual.
A BayestreeConverter, which is also provided by Pagel, can be used to generate the right tree format.
The command files are used to contain the commands of Bayestraits to run. Two default command files are included in the directory. commandfile0.txt is for independent evolution model and commandfile1.txt is for dependent evolution model.Details can be found in the Bayestrait mannual. The command files must be located in the same directory as python scripts.
-
arguments:
- -h : help
- -t : a phylogenetic tree
- -m : phylogenetic profile matrix
-
e.g.
python CalculateStatistics.py -t ./test/tree.trees -m ./test/profile.csv
-
output: LR_outputs.txt with columns: gene1, gene2, indepdent, dependent and likelihood ratio.
The LR_outputs.txt must be under the same directory.
-
arguments:
- h : help
- v : a numeric value - tHe therreshold to apply when forming clusterings.
- c : the criterion to use in forming clusters. See Details in the document for scipy.cluster.hierarchy.fcluster.
-
e.g.
python RunClustering.py -v 7 -c distance
-
output: Clustering_outputs.txt. Two columns in this file: the first column is the list of input genes; the second column is the list of corresponding cluster labels.
This project is licensed under the MIT License - see the LICENSE.md file for details