Skip to content

Commit

Permalink
Extensive changes aimed at speeding up tree loading & parameter
Browse files Browse the repository at this point in the history
optimisation (~2x) & tree search (~2x) by "serialising" their
memory access patterns to improve spatial and temporal locality
of reference.

1. PhyloTree alignment summaries may now be "borrowed" from another tree
   that has the same alignment (the relevant member is isSummaryBorrowed;
   if it is true, this instance doesn't own the summary, it is only
   "Borrowing" a reference to a summary owned by another tree).
2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an
   extra parameter indicating whether the copy is to "borrow" a copy of the
   alignment summary of the original (if it has one).  This matters a lot
   for ratefree.cpp and +R free rate models, and modelmixture.cpp!
   The temporary copies of the phylo tree that are used during parameter
   Optimization can now re-use the AlignmentSummary of the original; which
   means they can "linearise" their memory access to sites, when they are
   Optimising branch lengths (see changes listed below, e.g. #4, #5, #6, #7).
3. PhyloTree::setAlignment does its "name check" a different way (rather
   than finding each sequence by name by scanning the tree, if asks
   MTree::getMapOfTaxonNameToNode for a name to leaf node map, and checks
   the array of sequence names against the map (updating the id on the
   node for each hit).  The new approach is equivalent but is much faster,
   O(n.ln(n)) rather than O(n^2).  This speeds up tree loads markedly
   (particularly for large trees), but it matters most for free rate
   parameter optimization (on middling inputs this was a significant
   factor: about ~10% of parameter optimization time).  This can be turned
   off by changing the FAST_NAME_CHECK symbol.
4. IQTree::optimizeModelParameters now calls prepareToComputeDistances()
   (So that AlignmentSummary's matrix of converted sequences) will be
   available (to be borrowed, via calls to PhyloTree::copyPhyloTree (see
   change # 2 above, and changes #5 through #7 below).
   Likewise IQTree::doNNISearch (so changes #5, #8 help tree searches too).
5. AlignmentPairwise::computeFunction and  AlignmentPairwise::computeFuncDerv(
   can now make use of AlignmentSummary's "Matrix of converted sequences"
   (if it is available) via PhyloTree's accessor methods,
   e.g. PhyloTree::getConvertedSequenceByNumber().
   For this to work as expected, it's necessary for callers to ask
   AlignmentSummary to construct that matrix *including* even sites where
   there is no variety at all (added the keepBoringSites parameter on the
   AlignmentSummary constructor for this).
6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv
   likewise. And RateMeyerDiscrete::normalizeRates can make use of the "flat"
   frequency array exposed by PhyloTree::getConvertedSequenceFrequencies() too.
7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h)
   makes use of the matrix of converted sequences (if one is available), in
   about six (!) different places.  In terms of actual effect, this is the
   most important change in this commit, but it needs changes #1, #2, and #4
   committed too, if it is to have any effect.  This change speeds up both
   parameter optimisation and tree searching significantly.
8. As well as inv_eigenvectors, there is now an iv_eigenvectors_transposed
   (Using the transpose makes for some faster multiplications; see change #9
   listed below). ModelMarkov::calculateSquareMatrixTranspose is used to
   calculate the transpose of the inverse eigen vectors.  Unpleasant
   consequence: ModelMarkov::update_eigen_pointers has to take an extra
   parameter.  Keeping this additional member set correctly is the only
   Thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp,
   and modelsubst.h.
9. ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv
   now use (a) calculateExponentOfScalarMultiply and
   (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices
   (This is quite a bit faster than the Eigen code, since it doesn't
   bother to construct the diagonal matrix B.asDiagonal()...).
   (a) and (b) and the supporting functions, calculateHadamardProduct
   And dotProduct, are (for now) members of ModelMarkov.
10.Minor tweaks to vector processing code in phylokernelnew.h:
   (a) dotProductVec hand-unrolled treatment of the V array;
   (b) dotProductPairAdd treated the last item (in A and B) as the
       special case, when handling an odd number of items.
       Possibly the treatment of the AD and BD arrays should be
       hand-unrolled here, too, but I haven't tried that yet.
   (c) dotProductTriple (checking for odd uses & rather than %) (faster!)
11.The aligned_free free function (from phylotree.h ?!) does the "pointer
   Null?" check itself, and (because it takes a T*& rather than a T*),
   can itself set the pointer to nullptr.  This means that client code that
   used to go... if (x) { aligned_free(x); x=NULL; } ... can now be
   simplified to just... aligned_free(x);
12.Next to it (in phylotree.h), there is now an ensure_aligned_allocated
   method.  That lets you replace code like ... this:
        if (!eigenvalues)
            eigenvalues = aligned_alloc<double>(num_states);
   With:
        ensure_aligned_allocated(eigenvalues,  num_states);
   which is,
   I reckon, more readable.
13.In many places where there was code of the form... if (x) { delete x; }
   I have replaced it with delete x (likewise delete [] x).  delete always
   checks for null (it's required to, that's in the C++ standards), and
   "Rolling your own check" merely devalues the check that delete will
   later do! I've made similar "don't bother to check for null" changes
   in some other files, that I haven't included in this commit (since there
   aren't any *material* changes to anything in those files).
  • Loading branch information
JamesBarbetti committed Jul 20, 2020
1 parent bb98482 commit 2a598de
Show file tree
Hide file tree
Showing 21 changed files with 1,130 additions and 614 deletions.
128 changes: 105 additions & 23 deletions alignment/alignmentpairwise.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -97,19 +97,22 @@ void AlignmentPairwise::setSequenceNumbers(int seq1, int seq2) {
memset(pair_freq, 0, sizeof(double)*total_size);
if (tree->hasMatrixOfConvertedSequences()
&& rate->getPtnCat(0) < 0 ) {
auto sequence1 = tree->getConvertedSequenceByNumber(seq1);
auto sequence2 = tree->getConvertedSequenceByNumber(seq2);
auto frequencies = tree->getConvertedSequenceFrequencies();
auto sequence1 = tree->getConvertedSequenceByNumber(seq1);
auto sequence2 = tree->getConvertedSequenceByNumber(seq2);
auto frequencies = tree->getConvertedSequenceFrequencies();
size_t sequenceLength = tree->getConvertedSequenceLength();
for (size_t i=0; i<sequenceLength; ++i) {
auto state1 = sequence1[i];
auto state2 = sequence2[i];
if ( state1 < num_states && state2 < num_states ) {
if ( state1 != STATE_UNKNOWN && state2 != STATE_UNKNOWN ) {
pair_freq[state1*num_states + state2] += frequencies[i];
//Todo: Would it be worth storing a multiplication table?!
// and using that instead of multiplying by num_states?
}
int state1 = sequence1[i];
if (num_states<=state1) {
continue;
}
auto pairRow = pair_freq + state1*num_states;
int state2 = sequence2[i];
if (num_states<=state2) {
continue;
}
if ( state1 != STATE_UNKNOWN && state2 != STATE_UNKNOWN ) {
pairRow[state2] += frequencies[i];
}
}
//Add back the cumulative frequencies for any sites
Expand Down Expand Up @@ -194,6 +197,36 @@ double AlignmentPairwise::computeFunction(double value) {
int nptn = tree->aln->getNPattern();
double lh = 0.0;

if (tree->hasMatrixOfConvertedSequences()) {
auto sequence1 = tree->getConvertedSequenceByNumber(seq_id1);
auto sequence2 = tree->getConvertedSequenceByNumber(seq_id2);
auto frequencies = tree->getConvertedSequenceFrequencies();
size_t sequenceLength = tree->getConvertedSequenceLength();

if (site_rate->isSiteSpecificRate()) {
for (int i = 0; i < sequenceLength; i++) {
int state1 = sequence1[i];
int state2 = sequence2[i];
if (state1 >= num_states || state2 >= num_states) {
continue;
}
double trans = tree->getModelFactory()->computeTrans(value * site_rate->getPtnRate(i), state1, state2);
lh -= log(trans) * frequencies[i];
}
return lh;
} else if (tree->getModel()->isSiteSpecificModel()) {
for (int i = 0; i < nptn; i++) {
int state1 = sequence1[i];
int state2 = sequence2[i];
if (state1 >= num_states || state2 >= num_states) {
continue;
}
double trans = tree->getModelFactory()->computeTrans(value * site_rate->getPtnRate(i), state1, state2);
lh -= log(trans) * frequencies[i];
}
return lh;
}
}
// site-specific rates
if (site_rate->isSiteSpecificRate()) {
for (int i = 0; i < nptn; i++) {
Expand All @@ -205,7 +238,6 @@ double AlignmentPairwise::computeFunction(double value) {
}
return lh;
}

if (tree->getModel()->isSiteSpecificModel()) {
for (int i = 0; i < nptn; i++) {
int state1 = tree->aln->at(i)[seq_id1];
Expand Down Expand Up @@ -259,35 +291,86 @@ void AlignmentPairwise::computeFuncDerv(double value, double &df, double &ddf) {
int nptn = tree->aln->getNPattern();
df = 0.0;
ddf = 0.0;

auto sequence1 = tree->getConvertedSequenceByNumber(seq_id1);
auto sequence2 = tree->getConvertedSequenceByNumber(seq_id2);
auto frequencies = tree->getConvertedSequenceFrequencies();
size_t sequenceLength = tree->getConvertedSequenceLength();
if (sequenceLength!=nptn) {
sequence1 = sequence2 = nullptr;
frequencies = nullptr;
}

if (site_rate->isSiteSpecificRate()) {
for (int i = 0; i < nptn; i++) {
int state1 = tree->aln->at(i)[seq_id1];
int state2 = tree->aln->at(i)[seq_id2];
if (state1 >= num_states || state2 >= num_states) continue;
int state1;
if (sequence1!=nullptr) {
state1 = sequence1[i];
} else {
state1 = tree->aln->at(i)[seq_id1];
}
if (num_states<=state1) {
continue;
}
int state2;
if (sequence2!=nullptr) {
state2 = sequence2[i];
} else {
state2 = tree->aln->at(i)[seq_id2];
}
if (num_states<=state2) {
continue;
}
double rate_val = site_rate->getPtnRate(i);
double rate_sqr = rate_val * rate_val;
double derv1, derv2;
double trans = tree->getModelFactory()->computeTrans(value * rate_val, state1, state2, derv1, derv2);
double d1 = derv1 / trans;
df -= rate_val * d1 * tree->aln->at(i).frequency;
ddf -= rate_sqr * (derv2/trans - d1*d1) * tree->aln->at(i).frequency;
double freq;
if (frequencies!=nullptr) {
freq = frequencies[i];
} else {
freq = tree->aln->at(i).frequency;
}
df -= rate_val * d1 * freq;
ddf -= rate_sqr * (derv2/trans - d1*d1) * freq;
}
return;
}

if (tree->getModel()->isSiteSpecificModel()) {
for (int i = 0; i < nptn; i++) {
int state1 = tree->aln->at(i)[seq_id1];
int state2 = tree->aln->at(i)[seq_id2];
if (state1 >= num_states || state2 >= num_states) continue;
int state1;
if (sequence1!=nullptr) {
state1 = sequence1[i]; }
else {
state1 = tree->aln->at(i)[seq_id1];
}
if (num_states<=state1) {
continue;
}
int state2;
if (sequence2!=nullptr) {
state2 = sequence2[i];
} else {
state2 = tree->aln->at(i)[seq_id2];
}
if (num_states<=state2) {
continue;
}
double rate_val = site_rate->getPtnRate(i);
double rate_sqr = rate_val * rate_val;
double derv1, derv2;
double trans = tree->getModel()->computeTrans(value * rate_val,model->getPtnModelID(i), state1, state2, derv1, derv2);
double d1 = derv1 / trans;
df -= rate_val * d1 * tree->aln->at(i).frequency;
ddf -= rate_sqr * (derv2/trans - d1*d1) * tree->aln->at(i).frequency;
double freq;
if (frequencies!=nullptr) {
freq = frequencies[i];
} else {
freq = tree->aln->at(i).frequency;
}
df -= rate_val * d1 * freq;
ddf -= rate_sqr * (derv2/trans - d1*d1) * freq;
}
return;
}
Expand Down Expand Up @@ -386,7 +469,6 @@ double AlignmentPairwise::recomputeDist
( int seq1, int seq2, double initial_dist, double &d2l ) {
//Only called when -experimental has been passed
if (initial_dist == 0.0) {
//ZORK
if (tree->hasMatrixOfConvertedSequences()) {
int distance = 0;
int denominator = 0;
Expand Down
8 changes: 5 additions & 3 deletions alignment/alignmentsummary.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@
#include "alignment.h"
#include "alignmentsummary.h"

AlignmentSummary::AlignmentSummary(const Alignment* a, bool keepConstSites) {
AlignmentSummary::AlignmentSummary(const Alignment* a
, bool keepConstSites
, bool keepBoringSites) {
alignment = a;
sequenceMatrix = nullptr;
sequenceCount = a->getNSeq();
Expand Down Expand Up @@ -56,14 +58,14 @@ AlignmentSummary::AlignmentSummary(const Alignment* a, bool keepConstSites) {
s.maxState = maxStateForSite;
}
sequenceLength = 0; //Number sites where there's some variability
std::map<int, int> map = stateToSumOfConstantSiteFrequencies;
std::map<int, int> & map = stateToSumOfConstantSiteFrequencies;
for (size_t site=0; site<siteCount; ++site) {
SiteSummary &s = sites[site];
bool alreadyCounted = false;
totalFrequency += s.frequency;
totalFrequencyOfNonConstSites += s.isConst ? 0 : s.frequency;
if ( keepConstSites || !s.isConst) {
if ( 0 < s.frequency && s.minState < s.maxState) {
if ( keepBoringSites || ( 0 < s.frequency && s.minState < s.maxState) ) {
alreadyCounted = true;
if (sequenceLength==0) {
minState = s.minState;
Expand Down
2 changes: 1 addition & 1 deletion alignment/alignmentsummary.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ class Alignment;
struct AlignmentSummary
{
public:
AlignmentSummary(const Alignment* a, bool keepConstSites);
AlignmentSummary(const Alignment* a, bool keepConstSites, bool keepBoringSites);
~AlignmentSummary();
const Alignment* alignment;
std::vector<int> siteNumbers; //of sites with variation
Expand Down
4 changes: 2 additions & 2 deletions main/phyloanalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1637,7 +1637,7 @@ void computeMLDist ( Params& params, IQTree& iqtree
longest_dist = iqtree.computeDist(params, iqtree.aln, ml_dist, ml_var);
cout << "Computing ML distances took "
<< (getRealTime() - begin_wallclock_time) << " sec (of wall-clock time) "
<< (getCPUTime() - begin_cpu_time) << " sec(of CPU time)" << endl;
<< (getCPUTime() - begin_cpu_time) << " sec (of CPU time)" << endl;
size_t n = iqtree.aln->getNSeq();
size_t nSquared = n*n;
if ( iqtree.dist_matrix == nullptr ) {
Expand Down Expand Up @@ -2226,7 +2226,7 @@ void runTreeReconstruction(Params &params, IQTree* &iqtree) {
iqtree_new->initializePLL(params);
}
iqtree_new->setParams(&params);
iqtree_new->copyPhyloTree(iqtree);
iqtree_new->copyPhyloTree(iqtree, false);

// replace iqtree object
delete iqtree;
Expand Down
5 changes: 1 addition & 4 deletions model/modelfactorymixlen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -125,14 +125,11 @@ string ModelFactoryMixlen::sortClassesByTreeLength() {
((ModelMarkov*)model->getMixtureClass(m))->setEigenvalues(&model->getEigenvalues()[m*num_states]);
((ModelMarkov*)model->getMixtureClass(m))->setEigenvectors(&model->getEigenvectors()[m*num_states*num_states]);
((ModelMarkov*)model->getMixtureClass(m))->setInverseEigenvectors(&model->getInverseEigenvectors()[m*num_states*num_states]);
((ModelMarkov*)model->getMixtureClass(m))->setInverseEigenvectorsTransposed(&model->getInverseEigenvectorsTransposed()[m*num_states*num_states]);
}
model->decomposeRateMatrix();

// model->writeInfo(cout);
site_rate->writeInfo(cout);

}

tree->clearAllPartialLH();
ASSERT(fabs(score - tree->computeLikelihood()) < 0.1);
}
Expand Down
Loading

0 comments on commit 2a598de

Please sign in to comment.