A Python library for calculating the delta score (Holland et al. 2002) and Q-Residual (Gray et al. 2010) for phylogenetic data.
Installation is only a pip install away:
pip install phylogemetric
Basic usage:
> phylogemetric
usage: phylogemetric [-h] method filename
Calculate the delta score for the file example.nex:
> phylogemetric delta example.nex
taxon1 0.2453
taxon2 0.2404
taxon3 0.2954
...
Calculate the Q-Residual score for the file example.nex:
> phylogemetric qresidual example.nex
taxon1 0.0030
taxon2 0.0037
taxon3 0.0063
...
Note: to save the results to a file, use shell redirection, e.g.:
> phylogemetric qresidual example.nex > qresidual.txt
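Once the results are saved, they can be read back for further analysis. The following is a minimal sketch, assuming the output keeps the two-column "taxon score" layout shown above; `parse_scores` is a hypothetical helper and not part of phylogemetric.

```python
def parse_scores(text):
    """Parse lines of the form '<taxon> <score>' into a dict of floats.

    Assumes whitespace-separated columns with the score last, as in the
    example output above. Lines that do not fit this layout are skipped.
    """
    scores = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank line or a single token
        taxon, value = parts
        try:
            scores[taxon] = float(value)
        except ValueError:
            continue  # last column is not a number
    return scores


# Sample text copied from the example output above; in practice this
# would come from open("qresidual.txt").read().
sample = "taxon1 0.0030\ntaxon2 0.0037\ntaxon3 0.0063\n"
scores = parse_scores(sample)

# Taxon with the highest score, i.e. the least tree-like signal.
worst = max(scores, key=scores.get)
```

Here `worst` is `taxon3`, the taxon with the largest Q-Residual in the sample.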
You can tell phylogemetric to use multiple cores with the -w/--workers argument:
> phylogemetric -w 4 qresidual example.nex
Calculate scores from Python code:
from nexus import NexusReader
from phylogemetric import DeltaScoreMetric
from phylogemetric import QResidualMetric
# load data from a nexus file:
nex = NexusReader("filename.nex")
qres = QResidualMetric(nex.data.matrix)
# Or construct a data matrix directly:
matrix = {
'A': [
'1', '1', '1', '1', '0', '0', '1', '1', '1', '0', '1', '1',
'1', '1', '0', '0', '1', '1', '1', '0'
],
'B': [
'1', '1', '1', '1', '0', '0', '0', '1', '1', '1', '1', '1',
'1', '1', '1', '0', '0', '1', '1', '1'
],
'C': [
'1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '0',
'0', '0', '0', '1', '0', '1', '1', '1'
],
'D': [
'1', '0', '0', '0', '0', '1', '0', '1', '1', '1', '1', '0',
'0', '0', '0', '1', '0', '1', '1', '1'
],
'E': [
'1', '0', '0', '0', '0', '1', '0', '1', '0', '1', '1', '0',
'0', '0', '0', '1', '1', '1', '1', '1'
],
}
delta = DeltaScoreMetric(matrix)
Class Methods:
m = DeltaScoreMetric(matrix)
# calculates the number of quartets in the data:
m.nquartets()
# returns the distance between two sequences:
m.dist(['1', '1', '0'], ['0', '1', '0'])
# gets a dictionary of metric scores:
m.score()
m.score(workers=4) # with multiple processes.
# pretty prints the metric scores:
m.pprint()
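To make clear what the metric measures, here is a standalone sketch of the delta score as defined by Holland et al. (2002), written in plain Python. It is an illustration and not phylogemetric's own code: the function names are hypothetical, and it assumes a simple normalized Hamming distance, whereas the library's dist method may treat missing data differently.

```python
from itertools import combinations


def hamming(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def quartet_delta(d, x, y, u, v):
    """Delta for one quartet: (largest - middle) / (largest - smallest)
    of the three pairwise distance sums; 0 when all sums are equal."""
    m1, m2, m3 = sorted(
        [d[x, y] + d[u, v], d[x, u] + d[y, v], d[x, v] + d[y, u]],
        reverse=True,
    )
    if m1 == m3:
        return 0.0
    return (m1 - m2) / (m1 - m3)


def delta_scores(matrix):
    """Mean quartet delta for each taxon over all quartets containing it."""
    taxa = sorted(matrix)
    if len(taxa) < 4:
        raise ValueError("need at least four taxa to form a quartet")
    # Precompute the symmetric pairwise distance table.
    d = {}
    for a, b in combinations(taxa, 2):
        d[a, b] = d[b, a] = hamming(matrix[a], matrix[b])
    totals = {t: 0.0 for t in taxa}
    counts = {t: 0 for t in taxa}
    for quartet in combinations(taxa, 4):
        delta = quartet_delta(d, *quartet)
        for t in quartet:
            totals[t] += delta
            counts[t] += 1
    return {t: totals[t] / counts[t] for t in taxa}


# A perfectly tree-like quartet: A and B are identical, C and D are identical.
tree_like = {
    "A": ["1", "1", "0"],
    "B": ["1", "1", "0"],
    "C": ["0", "0", "1"],
    "D": ["0", "0", "1"],
}
print(delta_scores(tree_like))  # {'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': 0.0}
```

A score of 0 indicates perfectly tree-like data, and values closer to 1 indicate more conflicting signal.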
Requirements:
- python-nexus >= 1.1
Currently phylogemetric is implemented in Python, and the Delta/Q-Residual algorithms are O(n^4) in the number of taxa, because every quartet of taxa is evaluated. This means that performance is not optimal, and it may take a while to calculate these metrics for datasets with more than about 100 taxa. To speed this up, use multiple processes, either with the -w/--workers argument at the command line or by passing workers=n to the score method.
I hope to improve performance in the near future, but in the meantime, if this is an issue for you then try using the implementations available in SplitsTree.
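To get a feel for how the workload grows, note that n taxa contain C(n, 4) distinct quartets, which grows roughly as n^4. The sketch below computes that count with the standard library; `n_quartets` is a hypothetical helper, and treating it as equivalent to the library's nquartets() is an assumption.

```python
from math import comb  # available in Python 3.8+


def n_quartets(n_taxa):
    """Number of distinct sets of four taxa among n_taxa."""
    return comb(n_taxa, 4)


for n in (10, 50, 100, 200):
    print(n, n_quartets(n))
```

Going from 100 to 200 taxa raises the count from 3,921,225 to 64,684,950 quartets, so doubling the number of taxa makes the calculation roughly 16 times more expensive.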
If you use phylogemetric, please cite:
Greenhill, SJ. 2016. Phylogemetric: A Python library for calculating phylogenetic network metrics. Journal of Open Source Software.
http://dx.doi.org/10.21105/joss.00028
- 1.2.2: performance improvements
- 1.2.1: bug fix
- 1.2.0: performance improvements
- 1.1.0:
  - Added support for multiple processes.
  - Removed Python 2 support.
- Thanks to David Bryant for clarifying the Q-Residual code.
- Thanks to Kristian Rother for code quality suggestions.