From 2a598de11a3cf18e2f9e43d3bbb849e30003d9fa Mon Sep 17 00:00:00 2001 From: James Barbetti Date: Tue, 21 Jul 2020 09:09:38 +1000 Subject: [PATCH] Extensive changes aimed at speeding up tree loading & parameter optimisation (~2x) & tree search (~2x) by "serialising" their memory access patterns to improve spatial and temporal locality of reference. 1. PhyloTree alignment summaries may now be "borrowed" from another tree that has the same alignment (the relevant member is isSummaryBorrowed; if it is true, this instance doesn't own the summary, it is only "Borrowing" a reference to a summary owned by another tree). 2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an extra parameter indicating whether the copy is to "borrow" a copy of the alignment summary of the original (if it has one). This matters a lot for ratefree.cpp and +R free rate models, and modelmixture.cpp! The temporary copies of the phylo tree that are used during parameter Optimization can now re-use the AlignmentSummary of the original; which means they can "linearise" their memory access to sites, when they are Optimising branch lengths (see changes listed below, e.g. #4, #5, #6, #7). 3. PhyloTree::setAlignment does its "name check" a different way (rather than finding each sequence by name by scanning the tree, if asks MTree::getMapOfTaxonNameToNode for a name to leaf node map, and checks the array of sequence names against the map (updating the id on the node for each hit). The new approach is equivalent but is much faster, O(n.ln(n)) rather than O(n^2). This speeds up tree loads markedly (particularly for large trees), but it matters most for free rate parameter optimization (on middling inputs this was a significant factor: about ~10% of parameter optimization time). This can be turned off by changing the FAST_NAME_CHECK symbol. 4. IQTree::optimizeModelParameters now calls prepareToComputeDistances() (So that AlignmentSummary's matrix of converted sequences) will be available (to be borrowed, via calls to PhyloTree::copyPhyloTree (see change # 2 above, and changes #5 through #7 below). Likewise IQTree::doNNISearch (so changes #5, #8 help tree searches too). 5. AlignmentPairwise::computeFunction and AlignmentPairwise::computeFuncDerv( can now make use of AlignmentSummary's "Matrix of converted sequences" (if it is available) via PhyloTree's accessor methods, e.g. PhyloTree::getConvertedSequenceByNumber(). For this to work as expected, it's necessary for callers to ask AlignmentSummary to construct that matrix *including* even sites where there is no variety at all (added the keepBoringSites parameter on the AlignmentSummary constructor for this). 6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv likewise. And RateMeyerDiscrete::normalizeRates can make use of the "flat" frequency array exposed by PhyloTree::getConvertedSequenceFrequencies() too. 7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h) makes use of the matrix of converted sequences (if one is available), in about six (!) different places. In terms of actual effect, this is the most important change in this commit, but it needs changes #1, #2, and #4 committed too, if it is to have any effect. This change speeds up both parameter optimisation and tree searching significantly. 8. As well as inv_eigenvectors, there is now an iv_eigenvectors_transposed (Using the transpose makes for some faster multiplications; see change #9 listed below). ModelMarkov::calculateSquareMatrixTranspose is used to calculate the transpose of the inverse eigen vectors. Unpleasant consequence: ModelMarkov::update_eigen_pointers has to take an extra parameter. Keeping this additional member set correctly is the only Thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp, and modelsubst.h. 9. ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv now use (a) calculateExponentOfScalarMultiply and (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices (This is quite a bit faster than the Eigen code, since it doesn't bother to construct the diagonal matrix B.asDiagonal()...). (a) and (b) and the supporting functions, calculateHadamardProduct And dotProduct, are (for now) members of ModelMarkov. 10.Minor tweaks to vector processing code in phylokernelnew.h: (a) dotProductVec hand-unrolled treatment of the V array; (b) dotProductPairAdd treated the last item (in A and B) as the special case, when handling an odd number of items. Possibly the treatment of the AD and BD arrays should be hand-unrolled here, too, but I haven't tried that yet. (c) dotProductTriple (checking for odd uses & rather than %) (faster!) 11.The aligned_free free function (from phylotree.h ?!) does the "pointer Null?" check itself, and (because it takes a T*& rather than a T*), can itself set the pointer to nullptr. This means that client code that used to go... if (x) { aligned_free(x); x=NULL; } ... can now be simplified to just... aligned_free(x); 12.Next to it (in phylotree.h), there is now an ensure_aligned_allocated method. That lets you replace code like ... this: if (!eigenvalues) eigenvalues = aligned_alloc(num_states); With: ensure_aligned_allocated(eigenvalues, num_states); which is, I reckon, more readable. 13.In many places where there was code of the form... if (x) { delete x; } I have replaced it with delete x (likewise delete [] x). delete always checks for null (it's required to, that's in the C++ standards), and "Rolling your own check" merely devalues the check that delete will later do! I've made similar "don't bother to check for null" changes in some other files, that I haven't included in this commit (since there aren't any *material* changes to anything in those files). --- alignment/alignmentpairwise.cpp | 128 +++++++-- alignment/alignmentsummary.cpp | 8 +- alignment/alignmentsummary.h | 2 +- main/phyloanalysis.cpp | 4 +- model/modelfactorymixlen.cpp | 5 +- model/modelmarkov.cpp | 449 ++++++++++++++++++++++---------- model/modelmarkov.h | 31 ++- model/modelmixture.cpp | 137 +++++----- model/modelpomomixture.cpp | 34 ++- model/modelpomomixture.h | 3 +- model/modelset.cpp | 102 ++++---- model/modelsubst.h | 7 +- model/ratefree.cpp | 6 +- model/ratemeyerdiscrete.cpp | 156 +++++++---- tree/iqtree.cpp | 7 +- tree/mtree.cpp | 23 +- tree/mtree.h | 11 + tree/phylokernelnew.h | 332 +++++++++++++++-------- tree/phylotree.cpp | 248 +++++++++--------- tree/phylotree.h | 26 +- tree/phylotreesse.cpp | 25 +- 21 files changed, 1130 insertions(+), 614 deletions(-) diff --git a/alignment/alignmentpairwise.cpp b/alignment/alignmentpairwise.cpp index f3ea23bcb..bfa692475 100644 --- a/alignment/alignmentpairwise.cpp +++ b/alignment/alignmentpairwise.cpp @@ -97,19 +97,22 @@ void AlignmentPairwise::setSequenceNumbers(int seq1, int seq2) { memset(pair_freq, 0, sizeof(double)*total_size); if (tree->hasMatrixOfConvertedSequences() && rate->getPtnCat(0) < 0 ) { - auto sequence1 = tree->getConvertedSequenceByNumber(seq1); - auto sequence2 = tree->getConvertedSequenceByNumber(seq2); - auto frequencies = tree->getConvertedSequenceFrequencies(); + auto sequence1 = tree->getConvertedSequenceByNumber(seq1); + auto sequence2 = tree->getConvertedSequenceByNumber(seq2); + auto frequencies = tree->getConvertedSequenceFrequencies(); size_t sequenceLength = tree->getConvertedSequenceLength(); for (size_t i=0; ialn->getNPattern(); double lh = 0.0; + if (tree->hasMatrixOfConvertedSequences()) { + auto sequence1 = tree->getConvertedSequenceByNumber(seq_id1); + auto sequence2 = tree->getConvertedSequenceByNumber(seq_id2); + auto frequencies = tree->getConvertedSequenceFrequencies(); + size_t sequenceLength = tree->getConvertedSequenceLength(); + + if (site_rate->isSiteSpecificRate()) { + for (int i = 0; i < sequenceLength; i++) { + int state1 = sequence1[i]; + int state2 = sequence2[i]; + if (state1 >= num_states || state2 >= num_states) { + continue; + } + double trans = tree->getModelFactory()->computeTrans(value * site_rate->getPtnRate(i), state1, state2); + lh -= log(trans) * frequencies[i]; + } + return lh; + } else if (tree->getModel()->isSiteSpecificModel()) { + for (int i = 0; i < nptn; i++) { + int state1 = sequence1[i]; + int state2 = sequence2[i]; + if (state1 >= num_states || state2 >= num_states) { + continue; + } + double trans = tree->getModelFactory()->computeTrans(value * site_rate->getPtnRate(i), state1, state2); + lh -= log(trans) * frequencies[i]; + } + return lh; + } + } // site-specific rates if (site_rate->isSiteSpecificRate()) { for (int i = 0; i < nptn; i++) { @@ -205,7 +238,6 @@ double AlignmentPairwise::computeFunction(double value) { } return lh; } - if (tree->getModel()->isSiteSpecificModel()) { for (int i = 0; i < nptn; i++) { int state1 = tree->aln->at(i)[seq_id1]; @@ -259,35 +291,86 @@ void AlignmentPairwise::computeFuncDerv(double value, double &df, double &ddf) { int nptn = tree->aln->getNPattern(); df = 0.0; ddf = 0.0; + + auto sequence1 = tree->getConvertedSequenceByNumber(seq_id1); + auto sequence2 = tree->getConvertedSequenceByNumber(seq_id2); + auto frequencies = tree->getConvertedSequenceFrequencies(); + size_t sequenceLength = tree->getConvertedSequenceLength(); + if (sequenceLength!=nptn) { + sequence1 = sequence2 = nullptr; + frequencies = nullptr; + } if (site_rate->isSiteSpecificRate()) { for (int i = 0; i < nptn; i++) { - int state1 = tree->aln->at(i)[seq_id1]; - int state2 = tree->aln->at(i)[seq_id2]; - if (state1 >= num_states || state2 >= num_states) continue; + int state1; + if (sequence1!=nullptr) { + state1 = sequence1[i]; + } else { + state1 = tree->aln->at(i)[seq_id1]; + } + if (num_states<=state1) { + continue; + } + int state2; + if (sequence2!=nullptr) { + state2 = sequence2[i]; + } else { + state2 = tree->aln->at(i)[seq_id2]; + } + if (num_states<=state2) { + continue; + } double rate_val = site_rate->getPtnRate(i); double rate_sqr = rate_val * rate_val; double derv1, derv2; double trans = tree->getModelFactory()->computeTrans(value * rate_val, state1, state2, derv1, derv2); double d1 = derv1 / trans; - df -= rate_val * d1 * tree->aln->at(i).frequency; - ddf -= rate_sqr * (derv2/trans - d1*d1) * tree->aln->at(i).frequency; + double freq; + if (frequencies!=nullptr) { + freq = frequencies[i]; + } else { + freq = tree->aln->at(i).frequency; + } + df -= rate_val * d1 * freq; + ddf -= rate_sqr * (derv2/trans - d1*d1) * freq; } return; } if (tree->getModel()->isSiteSpecificModel()) { for (int i = 0; i < nptn; i++) { - int state1 = tree->aln->at(i)[seq_id1]; - int state2 = tree->aln->at(i)[seq_id2]; - if (state1 >= num_states || state2 >= num_states) continue; + int state1; + if (sequence1!=nullptr) { + state1 = sequence1[i]; } + else { + state1 = tree->aln->at(i)[seq_id1]; + } + if (num_states<=state1) { + continue; + } + int state2; + if (sequence2!=nullptr) { + state2 = sequence2[i]; + } else { + state2 = tree->aln->at(i)[seq_id2]; + } + if (num_states<=state2) { + continue; + } double rate_val = site_rate->getPtnRate(i); double rate_sqr = rate_val * rate_val; double derv1, derv2; double trans = tree->getModel()->computeTrans(value * rate_val,model->getPtnModelID(i), state1, state2, derv1, derv2); double d1 = derv1 / trans; - df -= rate_val * d1 * tree->aln->at(i).frequency; - ddf -= rate_sqr * (derv2/trans - d1*d1) * tree->aln->at(i).frequency; + double freq; + if (frequencies!=nullptr) { + freq = frequencies[i]; + } else { + freq = tree->aln->at(i).frequency; + } + df -= rate_val * d1 * freq; + ddf -= rate_sqr * (derv2/trans - d1*d1) * freq; } return; } @@ -386,7 +469,6 @@ double AlignmentPairwise::recomputeDist ( int seq1, int seq2, double initial_dist, double &d2l ) { //Only called when -experimental has been passed if (initial_dist == 0.0) { - //ZORK if (tree->hasMatrixOfConvertedSequences()) { int distance = 0; int denominator = 0; diff --git a/alignment/alignmentsummary.cpp b/alignment/alignmentsummary.cpp index ae7b0684a..8136e60c4 100644 --- a/alignment/alignmentsummary.cpp +++ b/alignment/alignmentsummary.cpp @@ -8,7 +8,9 @@ #include "alignment.h" #include "alignmentsummary.h" -AlignmentSummary::AlignmentSummary(const Alignment* a, bool keepConstSites) { +AlignmentSummary::AlignmentSummary(const Alignment* a + , bool keepConstSites + , bool keepBoringSites) { alignment = a; sequenceMatrix = nullptr; sequenceCount = a->getNSeq(); @@ -56,14 +58,14 @@ AlignmentSummary::AlignmentSummary(const Alignment* a, bool keepConstSites) { s.maxState = maxStateForSite; } sequenceLength = 0; //Number sites where there's some variability - std::map map = stateToSumOfConstantSiteFrequencies; + std::map & map = stateToSumOfConstantSiteFrequencies; for (size_t site=0; site siteNumbers; //of sites with variation diff --git a/main/phyloanalysis.cpp b/main/phyloanalysis.cpp index c8cfe3612..4aa67bdfd 100644 --- a/main/phyloanalysis.cpp +++ b/main/phyloanalysis.cpp @@ -1637,7 +1637,7 @@ void computeMLDist ( Params& params, IQTree& iqtree longest_dist = iqtree.computeDist(params, iqtree.aln, ml_dist, ml_var); cout << "Computing ML distances took " << (getRealTime() - begin_wallclock_time) << " sec (of wall-clock time) " - << (getCPUTime() - begin_cpu_time) << " sec(of CPU time)" << endl; + << (getCPUTime() - begin_cpu_time) << " sec (of CPU time)" << endl; size_t n = iqtree.aln->getNSeq(); size_t nSquared = n*n; if ( iqtree.dist_matrix == nullptr ) { @@ -2226,7 +2226,7 @@ void runTreeReconstruction(Params ¶ms, IQTree* &iqtree) { iqtree_new->initializePLL(params); } iqtree_new->setParams(¶ms); - iqtree_new->copyPhyloTree(iqtree); + iqtree_new->copyPhyloTree(iqtree, false); // replace iqtree object delete iqtree; diff --git a/model/modelfactorymixlen.cpp b/model/modelfactorymixlen.cpp index 2350c12eb..799394a19 100644 --- a/model/modelfactorymixlen.cpp +++ b/model/modelfactorymixlen.cpp @@ -125,14 +125,11 @@ string ModelFactoryMixlen::sortClassesByTreeLength() { ((ModelMarkov*)model->getMixtureClass(m))->setEigenvalues(&model->getEigenvalues()[m*num_states]); ((ModelMarkov*)model->getMixtureClass(m))->setEigenvectors(&model->getEigenvectors()[m*num_states*num_states]); ((ModelMarkov*)model->getMixtureClass(m))->setInverseEigenvectors(&model->getInverseEigenvectors()[m*num_states*num_states]); + ((ModelMarkov*)model->getMixtureClass(m))->setInverseEigenvectorsTransposed(&model->getInverseEigenvectorsTransposed()[m*num_states*num_states]); } model->decomposeRateMatrix(); - -// model->writeInfo(cout); site_rate->writeInfo(cout); - } - tree->clearAllPartialLH(); ASSERT(fabs(score - tree->computeLikelihood()) < 0.1); } diff --git a/model/modelmarkov.cpp b/model/modelmarkov.cpp index b3e0b0d42..8e262a2d2 100644 --- a/model/modelmarkov.cpp +++ b/model/modelmarkov.cpp @@ -22,10 +22,13 @@ #include #include "modelliemarkov.h" #include "modelunrest.h" + #include #include using namespace Eigen; +#include +#include /** number of squaring for scaling-squaring technique */ //const int TimeSquare = 10; @@ -44,14 +47,16 @@ ModelMarkov::ModelMarkov(PhyloTree *tree, bool reversible, bool adapt_tree) rates = NULL; // variables for reversible model - eigenvalues = eigenvectors = inv_eigenvectors = NULL; + eigenvalues = nullptr; + eigenvectors = nullptr; + inv_eigenvectors = nullptr; + inv_eigenvectors_transposed = nullptr; highest_freq_state = num_states-1; freq_type = FREQ_UNKNOWN; half_matrix = true; highest_freq_state = num_states-1; // variables for non-reversible model -// model_parameters = NULL; rate_matrix = NULL; eigenvalues_imag = NULL; ceval = cevec = cinv_evec = NULL; @@ -68,9 +73,7 @@ ModelMarkov::ModelMarkov(PhyloTree *tree, bool reversible, bool adapt_tree) } void ModelMarkov::setReversible(bool reversible, bool adapt_tree) { - bool old_reversible = is_reversible; - is_reversible = reversible; if (reversible) { @@ -78,24 +81,23 @@ void ModelMarkov::setReversible(bool reversible, bool adapt_tree) { int i; int nrate = getNumRateEntries(); - if (rates) - delete [] rates; + delete [] rates; rates = new double[nrate]; - for (i=0; i < nrate; i++) + for (i=0; i < nrate; i++) { rates[i] = 1.0; - - if (!eigenvalues) - eigenvalues = aligned_alloc(num_states); - if (!eigenvectors) - eigenvectors = aligned_alloc(num_states*num_states); - if (!inv_eigenvectors) - inv_eigenvectors = aligned_alloc(num_states*num_states); - + } + size_t num_states_squared = num_states * num_states; + ensure_aligned_allocated(eigenvalues, num_states); + ensure_aligned_allocated(eigenvectors, num_states_squared); + ensure_aligned_allocated(inv_eigenvectors, num_states_squared); + ensure_aligned_allocated(inv_eigenvectors_transposed, num_states_squared); + num_params = nrate - 1; if (adapt_tree && phylo_tree && phylo_tree->rooted) { - if (verbose_mode >= VB_MED) + if (verbose_mode >= VB_MED) { cout << "Converting rooted to unrooted tree..." << endl; + } phylo_tree->convertToUnrooted(); } } else { @@ -109,24 +111,27 @@ void ModelMarkov::setReversible(bool reversible, bool adapt_tree) { // reallocate the mem spaces if (rates && old_reversible) { // copy old reversible rates into new non-reversible - int i, j, k; - for (i = 0, k = 0; i < num_states; i++) - for (j = i+1; j < num_states; j++, k++) { + for (int i = 0, k = 0; i < num_states; i++) { + for (int j = i+1; j < num_states; j++, k++) { rate_matrix[i*num_states+j] = rates[k] * state_freq[j]; rate_matrix[j*num_states+i] = rates[k] * state_freq[i]; } + } delete [] rates; rates = new double[num_rates]; - for (i = 0, k = 0; i < num_states; i++) - for (j = 0; j < num_states; j++) + int k = 0; + int pos = 0; + for (int i = 0; i < num_states; i++) { + for (int j = 0; j < num_states; j++, pos++) { if (i!=j) { - rates[k] = rate_matrix[i*num_states+j]; - k++; + rates[k] = rate_matrix[pos]; + ++k; } + } + } ASSERT(k == num_rates); } else { - if (rates) - delete [] rates; + delete [] rates; rates = new double [num_rates]; memset(rates, 0, sizeof(double) * (num_rates)); } @@ -477,16 +482,23 @@ void ModelMarkov::computeTransMatrix(double time, double *trans_matrix, int mixt /* compute P(t) */ double evol_time = time / total_num_subst; - VectorXd eval_exp(num_states); - ArrayXd eval = Map(eigenvalues, num_states); - eval_exp = (eval*evol_time).exp().matrix(); - Map,Aligned> evectors(eigenvectors, num_states, num_states); - Map,Aligned> inv_evectors(inv_eigenvectors, num_states, num_states); - MatrixXd res = evectors * eval_exp.asDiagonal() * inv_evectors; - Map >map_trans(trans_matrix,num_states,num_states); - map_trans = res; - return; - + if (Params::getInstance().experimental) { + double eval_exp[num_states]; + calculateExponentOfScalarMultiply(eigenvalues, num_states, evol_time, eval_exp); + aTimesDiagonalBTimesTransposeOfC( eigenvectors, eval_exp + , inv_eigenvectors_transposed, num_states, trans_matrix); + return; + } else { + VectorXd eval_exp(num_states); + ArrayXd eval = Map(eigenvalues, num_states); + eval_exp = (eval*evol_time).exp().matrix(); + Map,Aligned> evectors(eigenvectors, num_states, num_states); + Map,Aligned> inv_evectors(inv_eigenvectors, num_states, num_states); + MatrixXd res = evectors * eval_exp.asDiagonal() * inv_evectors; + Map >map_trans(trans_matrix,num_states,num_states); + map_trans = res; + return; + } /* double exptime[num_states]; int i, j, k; @@ -543,23 +555,167 @@ double ModelMarkov::computeTrans(double time, int state1, int state2) { } double ModelMarkov::computeTrans(double time, int state1, int state2, double &derv1, double &derv2) { - double evol_time = time / total_num_subst; - int i; + double evol_time = time / total_num_subst; + double trans_prob = 0.0; + derv1 = derv2 = 0.0; + for (int i = 0; i < num_states; i++) { + double trans = eigenvectors[state1*num_states+i] + * inv_eigenvectors[i*num_states+state2] + * exp(evol_time * eigenvalues[i]); + double trans2 = trans * eigenvalues[i]; + trans_prob += trans; + derv1 += trans2; + derv2 += trans2 * eigenvalues[i]; + } + return trans_prob; +} -// double *coeff_entry = eigen_coeff + ((state1*num_states+state2)*num_states); - double trans_prob = 0.0; - derv1 = derv2 = 0.0; - for (i = 0; i < num_states; i++) { - double trans = eigenvectors[state1*num_states+i] * inv_eigenvectors[i*num_states+state2] * exp(evol_time * eigenvalues[i]); - double trans2 = trans * eigenvalues[i]; - trans_prob += trans; - derv1 += trans2; - derv2 += trans2 * eigenvalues[i]; - } - return trans_prob; +void ModelMarkov::calculateExponentOfScalarMultiply(const double* source, int size + , double scalar, double* dest) { + if (size==4) { + Vec4d v; + v.load(source); + exp(v*scalar).store(dest); + return; + } + int remainder = size; + if (4(eigenvalues, num_states); - ArrayXd eval_exp = (eval*evol_time).exp(); - ArrayXd eval_exp_derv1 = eval_exp*eval; - ArrayXd eval_exp_derv2 = eval_exp_derv1*eval; - - Map,Aligned> evectors(eigenvectors, num_states, num_states); - Map,Aligned> inv_evectors(inv_eigenvectors, num_states, num_states); - MatrixXd res = evectors * eval_exp.matrix().asDiagonal() * inv_evectors; - Map >map_trans(trans_matrix,num_states,num_states); - map_trans = res; - - res = evectors * eval_exp_derv1.matrix().asDiagonal() * inv_evectors; - Map >map_derv1(trans_derv1,num_states,num_states); - map_derv1 = res; - - res = evectors * eval_exp_derv2.matrix().asDiagonal() * inv_evectors; - Map >map_derv2(trans_derv2,num_states,num_states); - map_derv2 = res; - + if (Params::getInstance().experimental) { + //James' version + double eval_exp[num_states]; + calculateExponentOfScalarMultiply(eigenvalues, num_states, evol_time, eval_exp); + aTimesDiagonalBTimesTransposeOfC( eigenvectors, eval_exp + , inv_eigenvectors_transposed, num_states, trans_matrix); + + double eval_exp_derv1[num_states]; + calculateHadamardProduct(eigenvalues, eval_exp, num_states, eval_exp_derv1); + aTimesDiagonalBTimesTransposeOfC( eigenvectors, eval_exp_derv1 + , inv_eigenvectors_transposed, num_states, trans_derv1); + + double eval_exp_derv2[num_states]; + calculateHadamardProduct(eigenvalues, eval_exp_derv1, num_states, eval_exp_derv2); + aTimesDiagonalBTimesTransposeOfC( eigenvectors, eval_exp_derv2 + , inv_eigenvectors_transposed, num_states, trans_derv2); + } + else + { + //EIGEN version + ArrayXd eval = Map(eigenvalues, num_states); + ArrayXd eval_exp = (eval*evol_time).exp(); + ArrayXd eval_exp_derv1 = eval_exp*eval; + ArrayXd eval_exp_derv2 = eval_exp_derv1*eval; + Map,Aligned> evectors(eigenvectors, num_states, num_states); + Map,Aligned> inv_evectors(inv_eigenvectors, num_states, num_states); + MatrixXd res = evectors * eval_exp.matrix().asDiagonal() * inv_evectors; + Map >map_trans(trans_matrix,num_states,num_states); + map_trans = res; + + res = evectors * eval_exp_derv1.matrix().asDiagonal() * inv_evectors; + Map >map_derv1(trans_derv1,num_states,num_states); + map_derv1 = res; + + res = evectors * eval_exp_derv2.matrix().asDiagonal() * inv_evectors; + Map >map_derv2(trans_derv2,num_states,num_states); + map_derv2 = res; + } + /* + //Flat version double exptime[num_states]; for (i = 0; i < num_states; i++) @@ -1081,6 +1258,8 @@ void ModelMarkov::decomposeRateMatrixNonrev() { if (phylo_tree->params->matrix_exp_technique == MET_EIGEN_DECOMPOSITION) { eigensystem_nonrev(rate_matrix, state_freq, eigenvalues, eigenvalues_imag, eigenvectors, inv_eigenvectors, num_states); + calculateSquareMatrixTranspose(inv_eigenvectors, num_states + , inv_eigenvectors_transposed); return; } @@ -1168,11 +1347,13 @@ void ModelMarkov::decomposeRateMatrixNonrev() { } } } - + calculateSquareMatrixTranspose(inv_eigenvectors, num_states + , inv_eigenvectors_transposed); + // sanity check -// MatrixXcd eval_diag = eval.asDiagonal(); -// MatrixXd check = (inv_evec * mat * evec - eval_diag).cwiseAbs(); -// ASSERT(check.maxCoeff() < 1e-4); + // MatrixXcd eval_diag = eval.asDiagonal(); + // MatrixXd check = (inv_evec * mat * evec - eval_diag).cwiseAbs(); + // ASSERT(check.maxCoeff() < 1e-4); } void ModelMarkov::decomposeRateMatrix(){ @@ -1210,9 +1391,14 @@ void ModelMarkov::decomposeRateMatrix(){ eigenvectors[i*num_states+i] = state_freq[0]/state_freq[i]; } - for (i = 0; i < num_states; i++) - for (j = 0; j < num_states; j++) - inv_eigenvectors[i*num_states+j] = state_freq[j]*eigenvectors[j*num_states+i]; + for (i = 0; i < num_states; i++) { + for (j = 0; j < num_states; j++) { + inv_eigenvectors[i*num_states+j] + = state_freq[j]*eigenvectors[j*num_states+i]; + } + } + calculateSquareMatrixTranspose(inv_eigenvectors, num_states + , inv_eigenvectors_transposed); writeInfo(cout); // sanity check double *q = new double[num_states*num_states]; @@ -1231,9 +1417,7 @@ void ModelMarkov::decomposeRateMatrix(){ delete [] q; return; } - auto technique = phylo_tree->params->matrix_exp_technique; - if (technique == MET_EIGEN3LIB_DECOMPOSITION) { // Use Eigen3 library for eigen decomposition of symmetric matrix int n = 0; // the number of states where freq is non-zero @@ -1371,40 +1555,42 @@ void ModelMarkov::decomposeRateMatrix(){ } } } + calculateSquareMatrixTranspose(inv_eigenvectors, num_states + , inv_eigenvectors_transposed); return; } decomposeRateMatrixRev(); } void ModelMarkov::decomposeRateMatrixRev() { - - int i, j, k; // general reversible model double **rate_matrix = new double*[num_states]; - for (i = 0; i < num_states; i++) + for (int i = 0; i < num_states; i++) { rate_matrix[i] = new double[num_states]; - + } if (half_matrix) { - for (i = 0, k = 0; i < num_states; i++) { + for (int i = 0, k = 0; i < num_states; i++) { rate_matrix[i][i] = 0.0; - for (j = i+1; j < num_states; j++, k++) { + for (int j = i+1; j < num_states; j++, k++) { rate_matrix[i][j] = (state_freq[i] <= ZERO_FREQ || state_freq[j] <= ZERO_FREQ) ? 0 : rates[k]; rate_matrix[j][i] = rate_matrix[i][j]; } } } else { // full matrix - for (i = 0; i < num_states; i++) { + for (int i = 0; i < num_states; i++) { memcpy(rate_matrix[i], &rates[i*num_states], num_states*sizeof(double)); rate_matrix[i][i] = 0.0; } } /* eigensystem of 1 PAM rate matrix */ eigensystem_sym(rate_matrix, state_freq, eigenvalues, eigenvectors, inv_eigenvectors, num_states); - //eigensystem(rate_matrix, state_freq, eigenvalues, eigenvectors, inv_eigenvectors, num_states); - for (i = num_states-1; i >= 0; i--) + calculateSquareMatrixTranspose(inv_eigenvectors, num_states + , inv_eigenvectors_transposed); + for (int i = num_states-1; i >= 0; i--) { delete [] rate_matrix[i]; + } delete [] rate_matrix; } @@ -1414,7 +1600,9 @@ void ModelMarkov::readRates(istream &in) throw(const char*, string) { in >> str; if (str == "equalrate") { for (int i = 0; i < nrates; i++) + { rates[i] = 1.0; + } } else if (is_reversible ){ // reversible model try { @@ -1624,27 +1812,18 @@ ModelMarkov::~ModelMarkov() { void ModelMarkov::freeMem() { - if (inv_eigenvectors) - aligned_free(inv_eigenvectors); - if (eigenvectors) - aligned_free(eigenvectors); - if (eigenvalues) - aligned_free(eigenvalues); - - if (rates) delete [] rates; - - if (cinv_evec) - aligned_free(cinv_evec); - if (cevec) - aligned_free(cevec); - if (ceval) - aligned_free(ceval); - if (eigenvalues_imag) - aligned_free(eigenvalues_imag); - if (rate_matrix) - aligned_free(rate_matrix); -// if (model_parameters) -// delete [] model_parameters; + aligned_free(inv_eigenvectors); + aligned_free(inv_eigenvectors_transposed); + aligned_free(eigenvectors); + aligned_free(eigenvalues); + + delete [] rates; + + aligned_free(cinv_evec); + aligned_free(cevec); + aligned_free(ceval); + aligned_free(eigenvalues_imag); + aligned_free(rate_matrix); } double *ModelMarkov::getEigenvalues() const @@ -1661,31 +1840,34 @@ double* ModelMarkov::getInverseEigenvectors() const { return inv_eigenvectors; } -//void ModelGTR::setEigenCoeff(double *eigenCoeff) -//{ -// eigen_coeff = eigenCoeff; -//} +double* ModelMarkov::getInverseEigenvectorsTransposed() const { + return inv_eigenvectors_transposed; +} + +void ModelMarkov::setEigenvalues(double *eigenValues) +{ + this->eigenvalues = eigenValues; +} -void ModelMarkov::setEigenvalues(double *eigenvalues) +void ModelMarkov::setEigenvectors(double *eigenVectors) { - this->eigenvalues = eigenvalues; + this->eigenvectors = eigenVectors; } -void ModelMarkov::setEigenvectors(double *eigenvectors) +void ModelMarkov::setInverseEigenvectors(double *eigenV) { - this->eigenvectors = eigenvectors; + this->inv_eigenvectors = eigenV; } -void ModelMarkov::setInverseEigenvectors(double *inv_eigenvectors) +void ModelMarkov::setInverseEigenvectorsTransposed(double *eigenVTranspose) { - this->inv_eigenvectors = inv_eigenvectors; + this->inv_eigenvectors_transposed = eigenVTranspose; } /****************************************************/ /* NON-REVERSIBLE STUFFS */ /****************************************************/ - void ModelMarkov::setRates() { // I don't know the proper C++ way to handle this: got error if I didn't define something here. ASSERT(0 && "setRates should only be called on subclass of ModelMarkov"); @@ -1711,11 +1893,13 @@ int ModelMarkov::get_num_states_total() { return num_states; } -void ModelMarkov::update_eigen_pointers(double *eval, double *evec, double *inv_evec) { - eigenvalues = eval; - eigenvectors = evec; - inv_eigenvectors = inv_evec; - return; +void ModelMarkov::update_eigen_pointers(double *eval, double *evec + , double *inv_evec, double *inv_evec_transposed) { + eigenvalues = eval; + eigenvectors = evec; + inv_eigenvectors = inv_evec; + inv_eigenvectors_transposed = inv_evec_transposed; + return; } void ModelMarkov::computeTransMatrixEigen(double time, double *trans_matrix) { @@ -1723,10 +1907,9 @@ void ModelMarkov::computeTransMatrixEigen(double time, double *trans_matrix) { double evol_time = time / total_num_subst; int nstates_2 = num_states*num_states; double *exptime = new double[nstates_2]; - int i, j, k; memset(exptime, 0, sizeof(double)*nstates_2); - for (i = 0; i < num_states; i++) + for (int i = 0; i < num_states; i++) { if (eigenvalues_imag[i] == 0.0) { exptime[i*num_states+i] = exp(evol_time * eigenvalues[i]); } else { @@ -1739,25 +1922,24 @@ void ModelMarkov::computeTransMatrixEigen(double time, double *trans_matrix) { exptime[i*num_states+i] = exp_eval.real(); exptime[i*num_states+i-1] = -exp_eval.imag(); } - - + } // compute V * exp(L t) - for (i = 0; i < num_states; i++) - for (j = 0; j < num_states; j++) { + for (int i = 0; i < num_states; i++) { + for (int j = 0; j < num_states; j++) { double val = 0; - for (k = 0; k < num_states; k++) + for (int k = 0; k < num_states; k++) val += eigenvectors[i*num_states+k] * exptime[k*num_states+j]; trans_matrix[i*num_states+j] = val; } - + } memcpy(exptime, trans_matrix, sizeof(double)*nstates_2); // then compute V * exp(L t) * V^{-1} - for (i = 0; i < num_states; i++) { + for (int i = 0; i < num_states; i++) { double row_sum = 0.0; - for (j = 0; j < num_states; j++) { + for (int j = 0; j < num_states; j++) { double val = 0; - for (k = 0; k < num_states; k++) + for (int k = 0; k < num_states; k++) val += exptime[i*num_states+k] * inv_eigenvectors[k*num_states+j]; // make sure that trans_matrix are non-negative ASSERT(val >= -0.001); @@ -1767,7 +1949,6 @@ void ModelMarkov::computeTransMatrixEigen(double time, double *trans_matrix) { } ASSERT(fabs(row_sum-1.0) < 1e-4); } - delete [] exptime; } diff --git a/model/modelmarkov.h b/model/modelmarkov.h index f27570d94..937105d65 100644 --- a/model/modelmarkov.h +++ b/model/modelmarkov.h @@ -357,6 +357,7 @@ class ModelMarkov : public ModelSubst, public EigenDecomposition virtual double *getEigenvectors() const; virtual double *getInverseEigenvectors() const; + virtual double *getInverseEigenvectorsTransposed() const; // void setEigenCoeff(double *eigenCoeff); @@ -366,6 +367,23 @@ class ModelMarkov : public ModelSubst, public EigenDecomposition void setInverseEigenvectors(double *inv_eigenvectors); + void setInverseEigenvectorsTransposed(double *inv_eigenvectors); + + static void calculateExponentOfScalarMultiply(const double* source, int size + , double scalar, double* dest); + + static void calculateHadamardProduct(const double* first, const double* second + , int size, double *dest); + + static double dotProduct(const double* first, const double* second, int size); + + static void calculateSquareMatrixTranspose(const double* original, int rank + , double* transpose); + + static void aTimesDiagonalBTimesTransposeOfC + ( const double* matrixA, const double* rowB + , const double* matrixCTranspose, int rank + , double* dest); /** * compute the memory size for the model, can be large for site-specific models * @return memory size required in bytes @@ -407,7 +425,8 @@ class ModelMarkov : public ModelSubst, public EigenDecomposition // need to be updated recursively, if the model is a mixture model. For a // normal Markov model, only the standard pointers are set. This was done in // `ModelMixture::initMem()` before. - virtual void update_eigen_pointers(double *eval, double *evec, double *inv_evec); + virtual void update_eigen_pointers(double *eval, double *evec + , double *inv_evec, double *inv_evec_transposed); /** @@ -487,11 +506,11 @@ class ModelMarkov : public ModelSubst, public EigenDecomposition */ double *inv_eigenvectors; - /** - coefficient cache, served for fast computation of the P(t) matrix - */ -// double *eigen_coeff; - + /** + transpose of the matrix of the inverse eigenvectors of the rate matrix Q + */ + double *inv_eigenvectors_transposed; + /** state with highest frequency, used when optimizing state frequencies +FO */ int highest_freq_state; diff --git a/model/modelmixture.cpp b/model/modelmixture.cpp index 9dcdf29fa..7aa10b567 100644 --- a/model/modelmixture.cpp +++ b/model/modelmixture.cpp @@ -1282,8 +1282,7 @@ void ModelMixture::initMixture(string orig_model_name, string model_name, string full_name += CLOSE_BRACKET; int nmixtures = size(); - if (prop) - aligned_free(prop); + aligned_free(prop); prop = aligned_alloc(nmixtures); double sum = 0.0; @@ -1345,78 +1344,76 @@ void ModelMixture::initMixture(string orig_model_name, string model_name, string delete nxs_freq_optimize; delete nxs_freq_empirical; - } void ModelMixture::initMem() { - - int nmixtures = size(); - - // Calculate the total number of states and take into account that each of the - // models may be a mixture model itself (PoMo rate heterogeneity). - - int num_states_total = 0; - for (iterator it = begin(); it != end(); it++) - num_states_total += (*it)->get_num_states_total(); - - if (eigenvalues) aligned_free(eigenvalues); - if (eigenvectors) aligned_free(eigenvectors); - if (inv_eigenvectors) aligned_free(inv_eigenvectors); - // if (eigen_coeff) aligned_free(eigen_coeff); - - eigenvalues = aligned_alloc(num_states_total*nmixtures); - eigenvectors = aligned_alloc(num_states_total*num_states_total*nmixtures); - inv_eigenvectors = aligned_alloc(num_states_total*num_states_total*nmixtures); - // int ncoeff = num_states_total*num_states_total*num_states_total; - // eigen_coeff = aligned_alloc(ncoeff*nmixtures); - - // assigning memory for individual models - int m = 0; - int count_num_states = 0; - int count_num_states_2 = 0; - for (iterator it = begin(); it != end(); it++, m++) { - int num_states_this_model = (*it)->get_num_states_total(); - int num_states_this_model_2 = num_states_this_model * num_states_this_model; - // first copy memory for eigen stuffs - memcpy(&eigenvalues[count_num_states], (*it)->eigenvalues, - num_states_this_model*sizeof(double)); - memcpy(&eigenvectors[count_num_states_2], (*it)->eigenvectors, - num_states_this_model_2*sizeof(double)); - memcpy(&inv_eigenvectors[count_num_states_2], (*it)->inv_eigenvectors, - num_states_this_model_2*sizeof(double)); - // memcpy(&eigen_coeff[m*ncoeff], (*it)->eigen_coeff, ncoeff*sizeof(double)); - - // then delete - if ((*it)->eigenvalues) aligned_free((*it)->eigenvalues); - if ((*it)->eigenvectors) aligned_free((*it)->eigenvectors); - if ((*it)->inv_eigenvectors) aligned_free((*it)->inv_eigenvectors); - // if ((*it)->eigen_coeff) aligned_free((*it)->eigen_coeff); - - // And assign new memory. Also, recursively, update respective pointers for - // the mixture components of the current model. This is relevant if the - // current model is a mixture model itself. - (*it)->update_eigen_pointers(&eigenvalues[count_num_states], - &eigenvectors[count_num_states_2], - &inv_eigenvectors[count_num_states_2]); - // (*it)->eigen_coeff = &eigen_coeff[m*ncoeff]; - - // Update the state counters, so that the pointers are assigned correctly - // for the next mixture component. - count_num_states += num_states_this_model; - count_num_states_2 += num_states_this_model_2; - } + + int nmixtures = size(); + + // Calculate the total number of states and take into account that each of the + // models may be a mixture model itself (PoMo rate heterogeneity). + + int num_states_total = 0; + for (iterator it = begin(); it != end(); it++) { + num_states_total += (*it)->get_num_states_total(); + } + + aligned_free(eigenvalues); + aligned_free(eigenvectors); + aligned_free(inv_eigenvectors); + aligned_free(inv_eigenvectors_transposed); + ensure_aligned_allocated(eigenvalues, num_states_total*nmixtures); + ensure_aligned_allocated(eigenvectors, num_states_total*num_states_total*nmixtures); + ensure_aligned_allocated(inv_eigenvectors, num_states_total*num_states_total*nmixtures); + ensure_aligned_allocated(inv_eigenvectors_transposed, num_states_total*num_states_total*nmixtures); + + // assigning memory for individual models + int m = 0; + int count_num_states = 0; + int count_num_states_2 = 0; + for (iterator it = begin(); it != end(); it++, m++) { + int num_states_this_model = (*it)->get_num_states_total(); + int num_states_this_model_2 = num_states_this_model * num_states_this_model; + // first copy memory for eigen stuffs + memcpy(&eigenvalues[count_num_states], (*it)->eigenvalues, + num_states_this_model*sizeof(double)); + memcpy(&eigenvectors[count_num_states_2], (*it)->eigenvectors, + num_states_this_model_2*sizeof(double)); + memcpy(&inv_eigenvectors[count_num_states_2], (*it)->inv_eigenvectors, + num_states_this_model_2*sizeof(double)); + memcpy(&inv_eigenvectors_transposed[count_num_states_2], (*it)->inv_eigenvectors_transposed, + num_states_this_model_2*sizeof(double)); + + // then delete + aligned_free((*it)->eigenvalues); + aligned_free((*it)->eigenvectors); + aligned_free((*it)->inv_eigenvectors); + aligned_free((*it)->inv_eigenvectors_transposed); + + // And assign new memory. Also, recursively, update respective pointers for + // the mixture components of the current model. This is relevant if the + // current model is a mixture model itself. + (*it)->update_eigen_pointers(&eigenvalues[count_num_states], + &eigenvectors[count_num_states_2], + &inv_eigenvectors[count_num_states_2], + &inv_eigenvectors_transposed[count_num_states_2]); + + // Update the state counters, so that the pointers are assigned correctly + // for the next mixture component. + count_num_states += num_states_this_model; + count_num_states_2 += num_states_this_model_2; + } } ModelMixture::~ModelMixture() { - if (prop) - aligned_free(prop); - for (reverse_iterator rit = rbegin(); rit != rend(); rit++) { -// (*rit)->eigen_coeff = NULL; - (*rit)->eigenvalues = NULL; - (*rit)->eigenvectors = NULL; - (*rit)->inv_eigenvectors = NULL; - delete (*rit); - } + aligned_free(prop); + for (reverse_iterator rit = rbegin(); rit != rend(); rit++) { + (*rit)->eigenvalues = nullptr; + (*rit)->eigenvectors = nullptr; + (*rit)->inv_eigenvectors = nullptr; + (*rit)->inv_eigenvectors_transposed = nullptr; + delete (*rit); + } } void ModelMixture::setCheckpoint(Checkpoint *checkpoint) { @@ -1652,7 +1649,7 @@ double ModelMixture::optimizeWithEM(double gradient_epsilon) { tree->central_scale_num = phylo_tree->central_scale_num; tree->central_partial_pars = phylo_tree->central_partial_pars; - tree->copyPhyloTree(phylo_tree); + tree->copyPhyloTree(phylo_tree, true); tree->optimize_by_newton = phylo_tree->optimize_by_newton; tree->setParams(phylo_tree->params); tree->setLikelihoodKernel(phylo_tree->sse); @@ -1752,7 +1749,7 @@ double ModelMixture::optimizeWithEM(double gradient_epsilon) { // now optimize model one by one for (c = 0; c < nmix; c++) if (at(c)->getNDim() > 0) { - tree->copyPhyloTreeMixlen(phylo_tree, c); + tree->copyPhyloTreeMixlen(phylo_tree, c, true); ModelMarkov *subst_model; subst_model = at(c); tree->setModel(subst_model); diff --git a/model/modelpomomixture.cpp b/model/modelpomomixture.cpp index 0618baf52..0286f5863 100644 --- a/model/modelpomomixture.cpp +++ b/model/modelpomomixture.cpp @@ -156,6 +156,8 @@ void ModelPoMoMixture::decomposeRateMatrix() { memcpy(eigenvalues+m*num_states, eigenvalues, sizeof(double)*num_states); memcpy(eigenvectors+m*num_states_2, eigenvectors, sizeof(double)*num_states_2); memcpy(inv_eigenvectors+m*num_states_2, inv_eigenvectors, sizeof(double)*num_states_2); + memcpy(inv_eigenvectors_transposed+m*num_states_2 + , inv_eigenvectors_transposed, sizeof(double)*num_states_2); } // restore mutation_rate matrix memcpy(mutation_rate_matrix, saved_mutation_rate_matrix, sizeof(double)*n_alleles*n_alleles); @@ -227,18 +229,26 @@ int ModelPoMoMixture::get_num_states_total() { return num_states * getNMixtures(); } -void ModelPoMoMixture::update_eigen_pointers(double *eval, double *evec, double *inv_evec) { - eigenvalues = eval; - eigenvectors = evec; - inv_eigenvectors = inv_evec; - // We assume that all mixture model components have the same number of states. - int m = 0; - for (iterator it = begin(); it != end(); it++, m++) { - (*it)->update_eigen_pointers(eval + m*num_states, - evec + m*num_states*num_states, - inv_evec + m*num_states*num_states); - } - return; +void ModelPoMoMixture::update_eigen_pointers(double *eval, double *evec + , double *inv_evec, double* inv_evec_transposed) { + eigenvalues = eval; + eigenvectors = evec; + inv_eigenvectors = inv_evec; + inv_eigenvectors_transposed = inv_evec_transposed; + + // We assume that all mixture model components have the same number of states. + size_t rowOffset = 0; + size_t matrixOffset = 0; //into matrices + size_t num_states_squared = num_states * num_states; + + for (iterator it = begin(); it != end(); + it++, rowOffset+=num_states, matrixOffset+=num_states_squared) { + (*it)->update_eigen_pointers(eval + rowOffset, + evec + matrixOffset, + inv_evec + matrixOffset, + inv_evec_transposed + matrixOffset); + } + return; } bool ModelPoMoMixture::isUnstableParameters() { diff --git a/model/modelpomomixture.h b/model/modelpomomixture.h index 51cf5754c..2ec5253fd 100644 --- a/model/modelpomomixture.h +++ b/model/modelpomomixture.h @@ -146,7 +146,8 @@ class ModelPoMoMixture : public ModelPoMo, public ModelMixture { // need to be updated recursively, if the model is a mixture model. For a // normal Markov model, only the standard pointers are set. This was done in // `ModelMixture::initMem()` before. - virtual void update_eigen_pointers(double *eval, double *evec, double *inv_evec); + virtual void update_eigen_pointers(double *eval, double *evec + , double *inv_evec, double *inv_evec_transposed); /** diff --git a/model/modelset.cpp b/model/modelset.cpp index 27e3d7915..8909a6c57 100644 --- a/model/modelset.cpp +++ b/model/modelset.cpp @@ -135,7 +135,6 @@ void ModelSet::decomposeRateMatrix() // rearrange eigen to obey vector_size size_t vsize = phylo_tree->vector_size; size_t states2 = num_states*num_states; - size_t ptn, i, x; size_t max_size = get_safe_upper_limit(size()); @@ -144,35 +143,37 @@ void ModelSet::decomposeRateMatrix() memcpy(&eigenvalues[m*num_states], &eigenvalues[(m-1)*num_states], sizeof(double)*num_states); memcpy(&eigenvectors[m*states2], &eigenvectors[(m-1)*states2], sizeof(double)*states2); memcpy(&inv_eigenvectors[m*states2], &inv_eigenvectors[(m-1)*states2], sizeof(double)*states2); + memcpy(&inv_eigenvectors_transposed[m*states2], &inv_eigenvectors_transposed[(m-1)*states2], sizeof(double)*states2); } double new_eval[num_states*vsize]; double new_evec[states2*vsize]; double new_inv_evec[states2*vsize]; - for (ptn = 0; ptn < size(); ptn += vsize) { - double *eval_ptr = &eigenvalues[ptn*num_states]; - double *evec_ptr = &eigenvectors[ptn*states2]; - double *inv_evec_ptr = &inv_eigenvectors[ptn*states2]; - for (i = 0; i < vsize; i++) { - for (x = 0; x < num_states; x++) - new_eval[x*vsize+i] = eval_ptr[x]; - for (x = 0; x < states2; x++) { - new_evec[x*vsize+i] = evec_ptr[x]; - new_inv_evec[x*vsize+i] = inv_evec_ptr[x]; - } - eval_ptr += num_states; - evec_ptr += states2; - inv_evec_ptr += states2; - } - // copy new values + for (size_t ptn = 0; ptn < size(); ptn += vsize) { + double *eval_ptr = &eigenvalues[ptn*num_states]; + double *evec_ptr = &eigenvectors[ptn*states2]; + double *inv_evec_ptr = &inv_eigenvectors[ptn*states2]; + for (size_t i = 0; i < vsize; i++) { + for (size_t x = 0; x < num_states; x++) + new_eval[x*vsize+i] = eval_ptr[x]; + for (size_t x = 0; x < states2; x++) { + new_evec[x*vsize+i] = evec_ptr[x]; + new_inv_evec[x*vsize+i] = inv_evec_ptr[x]; + } + eval_ptr += num_states; + evec_ptr += states2; + inv_evec_ptr += states2; + } + // copy new values memcpy(&eigenvalues[ptn*num_states], new_eval, sizeof(double)*num_states*vsize); memcpy(&eigenvectors[ptn*states2], new_evec, sizeof(double)*states2*vsize); memcpy(&inv_eigenvectors[ptn*states2], new_inv_evec, sizeof(double)*states2*vsize); - } + calculateSquareMatrixTranspose(new_inv_evec, num_states + , &inv_eigenvectors_transposed[ptn*states2]); + } } - bool ModelSet::getVariables(double* variables) { ASSERT(size()); @@ -188,52 +189,57 @@ void ModelSet::setVariables(double* variables) front()->setVariables(variables); } - ModelSet::~ModelSet() { - for (reverse_iterator rit = rbegin(); rit != rend(); rit++) { - (*rit)->eigenvalues = NULL; - (*rit)->eigenvectors = NULL; - (*rit)->inv_eigenvectors = NULL; - delete (*rit); - } + for (reverse_iterator rit = rbegin(); rit != rend(); rit++) { + (*rit)->eigenvalues = nullptr; + (*rit)->eigenvectors = nullptr; + (*rit)->inv_eigenvectors = nullptr; + (*rit)->inv_eigenvectors_transposed = nullptr; + delete (*rit); + } } void ModelSet::joinEigenMemory() { size_t nmixtures = get_safe_upper_limit(size()); - if (eigenvalues) aligned_free(eigenvalues); - if (eigenvectors) aligned_free(eigenvectors); - if (inv_eigenvectors) aligned_free(inv_eigenvectors); - + aligned_free(eigenvalues); + aligned_free(eigenvectors); + aligned_free(inv_eigenvectors); + aligned_free(inv_eigenvectors_transposed); + size_t states2 = num_states*num_states; - - eigenvalues = aligned_alloc(num_states*nmixtures); - eigenvectors = aligned_alloc(states2*nmixtures); - inv_eigenvectors = aligned_alloc(states2*nmixtures); - - // assigning memory for individual models - size_t m = 0; - for (iterator it = begin(); it != end(); it++, m++) { + + eigenvalues = aligned_alloc(num_states*nmixtures); + eigenvectors = aligned_alloc(states2*nmixtures); + inv_eigenvectors = aligned_alloc(states2*nmixtures); + inv_eigenvectors_transposed = aligned_alloc(states2*nmixtures); + + // assigning memory for individual models + size_t m = 0; + for (iterator it = begin(); it != end(); it++, m++) { // first copy memory for eigen stuffs memcpy(&eigenvalues[m*num_states], (*it)->eigenvalues, num_states*sizeof(double)); memcpy(&eigenvectors[m*states2], (*it)->eigenvectors, states2*sizeof(double)); memcpy(&inv_eigenvectors[m*states2], (*it)->inv_eigenvectors, states2*sizeof(double)); + memcpy(&inv_eigenvectors_transposed[m*states2], (*it)->inv_eigenvectors_transposed, states2*sizeof(double)); // then delete - if ((*it)->eigenvalues) aligned_free((*it)->eigenvalues); - if ((*it)->eigenvectors) aligned_free((*it)->eigenvectors); - if ((*it)->inv_eigenvectors) aligned_free((*it)->inv_eigenvectors); -// if ((*it)->eigen_coeff) aligned_free((*it)->eigen_coeff); - + aligned_free((*it)->eigenvalues); + aligned_free((*it)->eigenvectors); + aligned_free((*it)->inv_eigenvectors); + aligned_free((*it)->inv_eigenvectors_transposed); + // and assign new memory - (*it)->eigenvalues = &eigenvalues[m*num_states]; - (*it)->eigenvectors = &eigenvectors[m*states2]; - (*it)->inv_eigenvectors = &inv_eigenvectors[m*states2]; - } - + (*it)->eigenvalues = &eigenvalues[m*num_states]; + (*it)->eigenvectors = &eigenvectors[m*states2]; + (*it)->inv_eigenvectors = &inv_eigenvectors[m*states2]; + (*it)->inv_eigenvectors_transposed = &inv_eigenvectors_transposed[m*states2]; + } + // copy dummy values for (m = size(); m < nmixtures; m++) { memcpy(&eigenvalues[m*num_states], &eigenvalues[(m-1)*num_states], sizeof(double)*num_states); memcpy(&eigenvectors[m*states2], &eigenvectors[(m-1)*states2], sizeof(double)*states2); memcpy(&inv_eigenvectors[m*states2], &inv_eigenvectors[(m-1)*states2], sizeof(double)*states2); + memcpy(&inv_eigenvectors_transposed[m*states2], &inv_eigenvectors_transposed[(m-1)*states2], sizeof(double)*states2); } } diff --git a/model/modelsubst.h b/model/modelsubst.h index c4e84a329..f4c79eea9 100644 --- a/model/modelsubst.h +++ b/model/modelsubst.h @@ -336,9 +336,14 @@ class ModelSubst: public Optimization, public CheckpointFactory } virtual double *getInverseEigenvectors() const { - return NULL; + return nullptr; } + virtual double *getInverseEigenvectorsTransposed() const { + return nullptr; + } + + /** * compute the memory size for the model, can be large for site-specific models * @return memory size required in bytes diff --git a/model/ratefree.cpp b/model/ratefree.cpp index 67ae2b4a6..4b539fe1d 100644 --- a/model/ratefree.cpp +++ b/model/ratefree.cpp @@ -224,8 +224,6 @@ double RateFree::targetFunk(double x[]) { return -phylo_tree->computeLikelihood(); } - - /** optimize parameters. Default is to optimize gamma shape @return the best likelihood @@ -499,7 +497,7 @@ double RateFree::optimizeWithEM() { // tree->central_scale_num = phylo_tree->central_scale_num; // tree->central_partial_pars = phylo_tree->central_partial_pars; - tree->copyPhyloTree(phylo_tree); + tree->copyPhyloTree(phylo_tree, true); tree->optimize_by_newton = phylo_tree->optimize_by_newton; tree->setParams(phylo_tree->params); tree->setLikelihoodKernel(phylo_tree->sse); @@ -609,7 +607,7 @@ double RateFree::optimizeWithEM() { // now optimize rates one by one double sum = 0.0; for (c = 0; c < nmix; c++) { - tree->copyPhyloTree(phylo_tree); + tree->copyPhyloTree(phylo_tree, true); ModelMarkov *subst_model; if (phylo_tree->getModel()->isMixture() && phylo_tree->getModelFactory()->fused_mix_rate) subst_model = (ModelMarkov*)phylo_tree->getModel()->getMixtureClass(c); diff --git a/model/ratemeyerdiscrete.cpp b/model/ratemeyerdiscrete.cpp index 320aff350..d93841c27 100644 --- a/model/ratemeyerdiscrete.cpp +++ b/model/ratemeyerdiscrete.cpp @@ -209,7 +209,7 @@ RateMeyerDiscrete::RateMeyerDiscrete() { RateMeyerDiscrete::~RateMeyerDiscrete() { - if (rates) delete [] rates; + delete [] rates; } bool RateMeyerDiscrete::isSiteSpecificRate() { @@ -282,28 +282,55 @@ double RateMeyerDiscrete::computeFunction(double value) { double lh = 0.0; int nseq = phylo_tree->leafNum; int nstate = phylo_tree->getModel()->num_states; - int i, j, k, state1, state2; ModelSubst *model = phylo_tree->getModel(); int trans_size = nstate * nstate; double *trans_mat = new double[trans_size]; int *pair_freq = new int[trans_size]; - - for (i = 0; i < nseq-1; i++) - for (j = i+1; j < nseq; j++) { - memset(pair_freq, 0, trans_size * sizeof(int)); - for (k = 0; k < size(); k++) { - if (ptn_cat[k] != optimizing_cat) continue; - Pattern *pat = & phylo_tree->aln->at(k); - if ((state1 = pat->at(i)) < nstate && (state2 = pat->at(j)) < nstate) - pair_freq[state1*nstate + state2] += pat->frequency; - } - model->computeTransMatrix(value * dist_mat[i*nseq + j], trans_mat); - for (k = 0; k < trans_size; k++) if (pair_freq[k]) - lh -= pair_freq[k] * log(trans_mat[k]); - } - delete [] pair_freq; - delete [] trans_mat; - return lh; + + auto frequencies = phylo_tree->getConvertedSequenceFrequencies(); + for (size_t i = 0; i < nseq-1; i++) { + auto eyeSequence = phylo_tree->getConvertedSequenceByNumber(i); + for (size_t j = i + 1; j < nseq; j++) { + auto jaySequence = phylo_tree->getConvertedSequenceByNumber(j); + memset(pair_freq, 0, trans_size * sizeof(int)); + if (0 && jaySequence!=nullptr) { //DISABLED + for (size_t k = 0; k < size(); k++) { + if (ptn_cat[k] != optimizing_cat) { + continue; + } + int state1 = eyeSequence[k]; + auto pairRow = pair_freq + state1*nstate; + if (nstate<=state1) { + continue; + } + int state2 = jaySequence[k]; + if ( state2 < nstate) { + pairRow[state2] += frequencies[k]; + } + } + } else { + for (size_t k = 0; k < size(); k++) { + if (ptn_cat[k] != optimizing_cat) { + continue; + } + Pattern *pat = & phylo_tree->aln->at(k); + int state1 = pat->at(i); + int state2 = pat->at(j); + if ( state1 < nstate && state2 < nstate) { + pair_freq[state1*nstate + state2] += pat->frequency; + } + } + } + model->computeTransMatrix(value * dist_mat[i*nseq + j], trans_mat); + for (size_t k = 0; k < trans_size; k++) + { + lh -= pair_freq[k] * log(trans_mat[k]); + } + } + } +delete [] pair_freq; +delete [] trans_mat; +return lh; } void RateMeyerDiscrete::computeFuncDerv(double value, double &df, double &ddf) { @@ -323,18 +350,36 @@ void RateMeyerDiscrete::computeFuncDerv(double value, double &df, double &ddf) { df = ddf = 0.0; int *pair_freq = new int[trans_size]; - - for (size_t i = 0; i + 1 < nseq; ++i) + auto frequencies = phylo_tree->getConvertedSequenceFrequencies(); + for (size_t i = 0; i + 1 < nseq; ++i) { + auto eyeSequence = phylo_tree->getConvertedSequenceByNumber(i); for (size_t j = i+1; j < nseq; ++j) { + auto jaySequence = phylo_tree->getConvertedSequenceByNumber(j); memset(pair_freq, 0, trans_size * sizeof(int)); - for (size_t k = 0; k < size(); ++k) { - if (ptn_cat[k] != optimizing_cat) continue; - Pattern *pat = & phylo_tree->aln->at(k); - int state1 = pat->at(i); - int state2 = pat->at(j); - if (state1 < nstate && state2 < nstate) - pair_freq[state1*nstate + state2] += pat->frequency; - } + if (eyeSequence!=nullptr && jaySequence!=nullptr) { + for (size_t k = 0; k < size(); ++k) { + if (ptn_cat[k] != optimizing_cat) continue; + int state1 = eyeSequence[k]; + if (nstate<=state1) { + continue; + } + auto pairRow = pair_freq + state1*nstate; + int state2 = jaySequence[k]; + if (nstate<=state2) { + continue; + } + pairRow[state2] += frequencies[k]; + } + } else { + for (size_t k = 0; k < size(); ++k) { + if (ptn_cat[k] != optimizing_cat) continue; + Pattern *pat = & phylo_tree->aln->at(k); + int state1 = pat->at(i); + int state2 = pat->at(j); + if (state1 < nstate && state2 < nstate) + pair_freq[state1*nstate + state2] += pat->frequency; + } + } double dist = dist_mat[i*nseq + j]; double derv1 = 0.0, derv2 = 0.0; model->computeTransDerv(value * dist, trans_mat, trans_derv1, trans_derv2); @@ -352,6 +397,7 @@ void RateMeyerDiscrete::computeFuncDerv(double value, double &df, double &ddf) { df -= derv1 * dist; ddf -= derv2 * dist * dist; } + } delete [] pair_freq; delete [] trans_derv2; delete [] trans_derv1; @@ -414,25 +460,39 @@ double RateMeyerDiscrete::optimizeCatRate(int cat) { } void RateMeyerDiscrete::normalizeRates() { - double sum = 0.0, ok = 0.0; - int nptn = size(); - int i; - - for (i = 0; i < nptn; i++) { - //at(i) = rates[ptn_cat[i]]; - if (getPtnRate(i) < MAX_SITE_RATE) { - sum += getPtnRate(i) * phylo_tree->aln->at(i).frequency; - ok += phylo_tree->aln->at(i).frequency; - } - } - - if (fabs(sum - ok) > 1e-3) { - //cout << "Normalizing rates " << sum << " / " << ok << endl; - double scale_f = ok / sum; - for (i = 0; i < ncategory; i++) - if (rates[i] > 2*MIN_SITE_RATE && rates[i] < MAX_SITE_RATE) - rates[i] *= scale_f; - } + double sum = 0.0; + double ok = 0.0; + size_t nptn = size(); + + auto frequencies = phylo_tree->getConvertedSequenceFrequencies(); + if (frequencies!=nullptr) { + for (size_t i = 0; i < nptn; i++) { + //at(i) = rates[ptn_cat[i]]; + if (getPtnRate(i) < MAX_SITE_RATE) { + double freq = frequencies[i]; + sum += getPtnRate(i) * freq; + ok += freq; + } + } + } else { + for (size_t i = 0; i < nptn; i++) { + //at(i) = rates[ptn_cat[i]]; + if (getPtnRate(i) < MAX_SITE_RATE) { + double freq = phylo_tree->aln->at(i).frequency; + sum += getPtnRate(i) * freq; + ok += freq; + } + } + } + if (fabs(sum - ok) > 1e-3) { + //cout << "Normalizing rates " << sum << " / " << ok << endl; + double scale_f = ok / sum; + for (int i = 0; i < ncategory; i++) { + if (rates[i] > 2*MIN_SITE_RATE && rates[i] < MAX_SITE_RATE) { + rates[i] *= scale_f; + } + } + } } double RateMeyerDiscrete::classifyRatesKMeans() { diff --git a/tree/iqtree.cpp b/tree/iqtree.cpp index 70945d8e1..2e9271ae5 100644 --- a/tree/iqtree.cpp +++ b/tree/iqtree.cpp @@ -2042,6 +2042,7 @@ double IQTree::perturb(int times) { extern pllUFBootData * pllUFBootDataPtr; string IQTree::optimizeModelParameters(bool printInfo, double logl_epsilon) { + prepareToComputeDistances(); if (logl_epsilon == -1) logl_epsilon = params->modelEps; cout << "Estimate model parameters (epsilon = " << logl_epsilon << ")" << endl; @@ -2074,7 +2075,6 @@ string IQTree::optimizeModelParameters(bool printInfo, double logl_epsilon) { } else { modOptScore = getModelFactory()->optimizeParameters(params->fixed_branch_length, printInfo, logl_epsilon); } - if (isSuperTree()) { ((PhyloSuperTree*) this)->computeBranchLengths(); } @@ -2083,7 +2083,6 @@ string IQTree::optimizeModelParameters(bool printInfo, double logl_epsilon) { outWarning("Estimated model parameters are at boundary that can cause numerical instability!"); cout << endl; } - if (modOptScore < curScore - 1.0 && params->partition_type != TOPO_UNLINKED) { cout << " BUG: Tree logl gets worse after model optimization!" << endl; cout << " Old logl: " << curScore << " / " << "new logl: " << modOptScore << endl; @@ -2096,7 +2095,7 @@ string IQTree::optimizeModelParameters(bool printInfo, double logl_epsilon) { if (params->print_trees_site_posterior) computePatternCategories(); } - + doneComputingDistances(); //DISABLED return newTree; } @@ -2912,7 +2911,9 @@ pair IQTree::doNNISearch(bool write_info) { PLL_TRUE, 0, 0, 0, PLL_SUMMARIZE_LH, 0, 0); readTreeString(string(pllInst->tree_string)); } else { + prepareToComputeDistances(); nniInfos = optimizeNNI(Params::getInstance().speednni); + doneComputingDistances(); if (isSuperTree()) { ((PhyloSuperTree*) this)->computeBranchLengths(); } diff --git a/tree/mtree.cpp b/tree/mtree.cpp index 05d3f391d..fe396b294 100644 --- a/tree/mtree.cpp +++ b/tree/mtree.cpp @@ -1278,18 +1278,33 @@ void MTree::getOrderedTaxa(NodeVector &taxa, Node *node, Node *dad) { } void MTree::getTaxaName(vector &taxname, Node *node, Node *dad) { - if (!node) node = root; + if (!node) { + node = root; + } if (node->isLeaf()) { - if (taxname.empty()) taxname.resize(leafNum); + if (taxname.empty()) { + taxname.resize(leafNum); + } taxname[node->id] = node->name; } - //for (NeighborVec::iterator it = node->neighbors.begin(); it != node->neighbors.end(); it++) - //if ((*it)->node != dad) { FOR_NEIGHBOR_IT(node, dad, it) { getTaxaName(taxname, (*it)->node, node); } } +void MTree::getMapOfTaxonNameToNode(Node* node, Node* dad + , map &map) { + if (!node) { + node = root; + } + if (node->isLeaf()) { + map[node->name] = node; + } + FOR_NEIGHBOR_IT(node, dad, it) { + getMapOfTaxonNameToNode((*it)->node, node, map); + } +} + void MTree::getNodeName(set &nodename, Node *node, Node *dad) { if (!node) node = root; if (!node->name.empty()) diff --git a/tree/mtree.h b/tree/mtree.h index cd0733861..2345de02e 100644 --- a/tree/mtree.h +++ b/tree/mtree.h @@ -440,6 +440,17 @@ class MTree { */ void getTaxaName(vector &taxname, Node *node = NULL, Node *dad = NULL); + /** + get the descending taxa names below the node, and map each + name to the corresponding leaf node + @param node the starting node, nullptr to start from the root + @param dad dad of the node, used to direct the search + @param[out] map, from taxon name to corresponding node + */ + + void getMapOfTaxonNameToNode(Node* node, Node* dad + , map &map); + /** get the descending node names below the node @param node the starting node, NULL to start from the root diff --git a/tree/phylokernelnew.h b/tree/phylokernelnew.h index cd34a14bb..82bbf1e9d 100644 --- a/tree/phylokernelnew.h +++ b/tree/phylokernelnew.h @@ -9,6 +9,8 @@ #if !defined(PHYLOKERNELNEW_H_) || !defined(PHYLOKERNELNEW_STATE_H_) +#define JAMES_VERSION 1 + #ifdef KERNEL_FIX_STATES # define PHYLOKERNELNEW_STATE_H_ #else @@ -21,8 +23,6 @@ #include #endif -//#include - using namespace std; /******************************************************* @@ -86,6 +86,45 @@ template inline void dotProductVec(Numeric *A, VectorClass *B, VectorClass &X, size_t N) #endif { +#if JAMES_VERSION + if (N==4) { + X = (A[0]*B[0] + A[1]*B[1]) + (A[2]*B[2] + A[3]*B[3]); + return; + } + size_t cruft = (N & 3); + if (N==cruft) { + X = 0; + //IF cruft==N, N must be 0, 1, 2, or 3 + for (int i=0; i inline void productVecMat(VectorClass *A, Numeric *M, VectorClass *X, size_t N) #endif { + if (N==4) { + + } // quick treatment for small N <= 4 size_t i; switch (N) { @@ -307,7 +350,6 @@ inline void productVecMat(VectorClass *A, Numeric *M, VectorClass *X, size_t N) M += 3; } return; - case 4: for (i = 0; i < 4; i++) { // manual unrolling X[i] = mul_add(A[1],M[1],A[0]*M[0]) + mul_add(A[3],M[3],A[2]*M[2]); @@ -542,7 +584,27 @@ inline void dotProductPairAdd(Numeric *A, Numeric *B, VectorClass *D, Y = mul_add(B[0], D[0], Y); return; } - +#if JAMES_VERSION + VectorClass AD[2], BD[2]; + for (size_t j = 0; j < 2; j++) { + AD[j] = A[j] * D[j]; + BD[j] = B[j] * D[j]; + } + size_t iStop = N - (N&1); + for (size_t i = 2; i < iStop; i+=2) { + for (size_t j = 0; j < 2; j++) { + AD[j] = mul_add(A[i+j], D[i+j], AD[j]); + BD[j] = mul_add(B[i+j], D[i+j], BD[j]); + } + } + if ((N & 1) == 0) { + X += AD[0] + AD[1]; + Y += BD[0] + BD[1]; + } else { + X += mul_add(A[N-1], D[N-1], AD[0] + AD[1]); + Y += mul_add(B[N-1], D[N-1], BD[0] + BD[1]); + } +#else size_t i, j; if (N % 2 == 0) { VectorClass AD[2], BD[2]; @@ -550,12 +612,12 @@ inline void dotProductPairAdd(Numeric *A, Numeric *B, VectorClass *D, AD[j] = A[j] * D[j]; BD[j] = B[j] * D[j]; } - for (i = 2; i < N; i+=2) { + for (i = 2; i < N; i+=2) { for (j = 0; j < 2; j++) { AD[j] = mul_add(A[i+j], D[i+j], AD[j]); BD[j] = mul_add(B[i+j], D[i+j], BD[j]); } - } + } X += AD[0] + AD[1]; Y += BD[0] + BD[1]; } else { @@ -565,15 +627,16 @@ inline void dotProductPairAdd(Numeric *A, Numeric *B, VectorClass *D, AD[j] = A[j] * D[j]; BD[j] = B[j] * D[j]; } - for (i = 2; i < N-1; i+=2) { + for (i = 2; i < N-1; i+=2) { for (j = 0; j < 2; j++) { AD[j] = mul_add(A[i+j], D[i+j], AD[j]); BD[j] = mul_add(B[i+j], D[i+j], BD[j]); } - } + } X += mul_add(A[N-1], D[N-1], AD[0] + AD[1]); Y += mul_add(B[N-1], D[N-1], BD[0] + BD[1]); } +#endif } /** @@ -602,7 +665,7 @@ inline void dotProductTriple(Numeric *A, Numeric *B, Numeric *C, VectorClass *D, #endif { size_t i, j; - if (nstates % 2 == 0) { + if ((nstates & 1) == 0) { VectorClass AD[2], BD[2], CD[2]; for (j = 0; j < 2; j++) { AD[j] = A[j] * D[j]; @@ -860,7 +923,7 @@ void PhyloTree::computePartialInfo(TraversalInfo &info, VectorClass* buffer) { PhyloNeighbor *child = (PhyloNeighbor*)*it; // precompute information buffer if (child->direction == TOWARD_ROOT) { - // tranpose probability matrix + // transpose probability matrix double mat[nstatesqr]; for (c = 0; c < ncat_mix; c++) { double len_child = site_rate->getRate(c%ncat) * child->length; @@ -1341,8 +1404,9 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t double *vec_tip = (double*)&partial_lh_all[block]; for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { - for (i = 0; i < block; i++) + for (i = 0; i < block; i++){ partial_lh_all[i] = 1.0; + } UBYTE *scale_dad = NULL; if (SAFE_NUMERIC) { scale_dad = dad_branch->scale_num + ptn*ncat_mix; @@ -1417,20 +1481,28 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t } else { // non site specific model PhyloNeighbor *child = (PhyloNeighbor*)*it; + auto stateRow = this->getConvertedSequenceByNumber(child->node->id); + auto unknown = aln->STATE_UNKNOWN; UBYTE *scale_child = SAFE_NUMERIC ? child->scale_num + ptn*ncat_mix : NULL; if (child->node->isLeaf()) { // external node // load data for tip for (i = 0; i < VectorClass::size(); i++) { - double *child_lh; - if (ptn+i < orig_nptn) - child_lh = partial_lh_leaf + block*(aln->at(ptn+i))[child->node->id]; - else if (ptn+i < max_orig_nptn) - child_lh = partial_lh_leaf + block*aln->STATE_UNKNOWN; - else if (ptn+i < nptn) - child_lh = partial_lh_leaf + block*model_factory->unobserved_ptns[ptn+i-max_orig_nptn][child->node->id]; - else - child_lh = partial_lh_leaf + block*aln->STATE_UNKNOWN; + int state; + if (ptn+i < orig_nptn) { + if (stateRow!=nullptr) { + state = stateRow[ptn+i]; + } else { + state = (aln->at(ptn+i))[child->node->id]; + } + } else if (ptn+i < max_orig_nptn) { + state = unknown; + } else if (ptn+i < nptn) { + state = model_factory->unobserved_ptns[ptn+i-max_orig_nptn][child->node->id]; + } else { + state = unknown; + } + double *child_lh = partial_lh_leaf + block*state; double *this_vec_tip = vec_tip+i; for (c = 0; c < block; c++) { *this_vec_tip = child_lh[c]; @@ -1562,6 +1634,10 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t double *vec_right = SITE_MODEL ? &vec_left[nstates*VectorClass::size()] : &vec_left[block*VectorClass::size()]; VectorClass *partial_lh_tmp = SITE_MODEL ? (VectorClass*)vec_right+nstates : (VectorClass*)vec_right+block; + auto leftStateRow = this->getConvertedSequenceByNumber(left->node->id); + auto rightStateRow = this->getConvertedSequenceByNumber(right->node->id); + auto unknown = aln->STATE_UNKNOWN; + for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { VectorClass *partial_lh = (VectorClass*)(dad_branch->partial_lh + ptn*block); @@ -1597,26 +1673,37 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t partial_lh += nstates; } // FOR category } else { - VectorClass *vleft = (VectorClass*)vec_left; + VectorClass *vleft = (VectorClass*)vec_left; VectorClass *vright = (VectorClass*)vec_right; // load data for tip for (x = 0; x < VectorClass::size(); x++) { - double *tip_left, *tip_right; + int leftState; + int rightState; if (ptn+x < orig_nptn) { - tip_left = partial_lh_left + block * (aln->at(ptn+x))[left->node->id]; - tip_right = partial_lh_right + block * (aln->at(ptn+x))[right->node->id]; + if (leftStateRow!=nullptr) { + leftState = leftStateRow[ptn+x]; + } else { + leftState = (aln->at(ptn+x))[left->node->id]; + } + if (rightStateRow!=nullptr) { + rightState = rightStateRow[ptn+x]; + } else { + rightState = (aln->at(ptn+x))[right->node->id]; + } } else if (ptn+x < max_orig_nptn) { - tip_left = partial_lh_left + block * aln->STATE_UNKNOWN; - tip_right = partial_lh_right + block * aln->STATE_UNKNOWN; + leftState = unknown; + rightState = unknown; } else if (ptn+x < nptn) { - tip_left = partial_lh_left + block * model_factory->unobserved_ptns[ptn+x-max_orig_nptn][left->node->id]; - tip_right = partial_lh_right + block * model_factory->unobserved_ptns[ptn+x-max_orig_nptn][right->node->id]; + leftState = model_factory->unobserved_ptns[ptn+x-max_orig_nptn][left->node->id]; + rightState = model_factory->unobserved_ptns[ptn+x-max_orig_nptn][right->node->id]; } else { - tip_left = partial_lh_left + block * aln->STATE_UNKNOWN; - tip_right = partial_lh_right + block * aln->STATE_UNKNOWN; + leftState = unknown; + rightState = unknown; } - double *this_vec_left = vec_left+x; - double *this_vec_right = vec_right+x; + double* tip_left = partial_lh_left + block*leftState; + double* tip_right = partial_lh_right + block*rightState; + double* this_vec_left = vec_left+x; + double* this_vec_right = vec_right+x; for (i = 0; i < block; i++) { *this_vec_left = tip_left[i]; *this_vec_right = tip_right[i]; @@ -1665,10 +1752,12 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t double *vec_left = buffer_partial_lh_ptr + thread_buf_size*thread_id; VectorClass *partial_lh_tmp = SITE_MODEL ? (VectorClass*)vec_left+2*nstates : (VectorClass*)vec_left+block; + auto leftStateRow = this->getConvertedSequenceByNumber(left->node->id); + auto unknown = aln->STATE_UNKNOWN; + for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { VectorClass *partial_lh = (VectorClass*)(dad_branch->partial_lh + ptn*block); VectorClass *partial_lh_right = (VectorClass*)(right->partial_lh + ptn*block); -// memset(partial_lh, 0, sizeof(VectorClass)*block); VectorClass lh_max = 0.0; if (SITE_MODEL) { @@ -1721,16 +1810,21 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t VectorClass *vleft = (VectorClass*)vec_left; // load data for tip for (x = 0; x < VectorClass::size(); x++) { - double *tip; + int state; if (ptn+x < orig_nptn) { - tip = partial_lh_left + block*(aln->at(ptn+x))[left->node->id]; + if (leftStateRow!=nullptr) { + state = leftStateRow[ptn+x]; + } else { + state = (aln->at(ptn+x))[left->node->id]; + } } else if (ptn+x < max_orig_nptn) { - tip = partial_lh_left + block*aln->STATE_UNKNOWN; + state = unknown; } else if (ptn+x < nptn) { - tip = partial_lh_left + block*model_factory->unobserved_ptns[ptn+x-max_orig_nptn][left->node->id]; + state = model_factory->unobserved_ptns[ptn+x-max_orig_nptn][left->node->id]; } else { - tip = partial_lh_left + block*aln->STATE_UNKNOWN; + state = unknown; } + double *tip = partial_lh_left + block*state; double *this_vec_left = vec_left+x; for (i = 0; i < block; i++) { *this_vec_left = tip[i]; @@ -1928,8 +2022,7 @@ void PhyloTree::computePartialLikelihoodGenericSIMD(TraversalInfo &info, size_t } if (Params::getInstance().buffer_mem_save) { - if (partial_lh_leaves) - aligned_free(partial_lh_leaves); + aligned_free(partial_lh_leaves); aligned_free(echildren); info.echildren = info.partial_lh_leaves = NULL; } @@ -1986,31 +2079,39 @@ void PhyloTree::computeLikelihoodBufferGenericSIMD(PhyloNeighbor *dad_branch, Ph if (dad->isLeaf()) { // special treatment for TIP-INTERNAL NODE case - double *tip_partial_lh_node = &tip_partial_lh[dad->id * max_orig_nptn*nstates]; - - double *vec_tip = buffer_partial_lh_ptr + tip_block*VectorClass::size()*thread_id; - - for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { - VectorClass *partial_lh_dad = (VectorClass*)(dad_branch->partial_lh + ptn*block); - VectorClass *theta = (VectorClass*)(theta_all + ptn*block); + double *tip_partial_lh_node = &tip_partial_lh[dad->id * max_orig_nptn * nstates]; + double *vec_tip = buffer_partial_lh_ptr + tip_block * VectorClass::size() * thread_id; + auto stateRow = this->getConvertedSequenceByNumber(dad->id); + auto unknown = aln->STATE_UNKNOWN; + + size_t offset = ptn_lower*block; + size_t offsetStep = block*VectorClass::size(); + for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size(), offset+=offsetStep) { + VectorClass *partial_lh_dad = (VectorClass*)(dad_branch->partial_lh + offset); + VectorClass *theta = (VectorClass*)(theta_all + offset); //load tip vector if (!SITE_MODEL) for (i = 0; i < VectorClass::size(); i++) { - double *this_tip_partial_lh; - if (ptn+i < orig_nptn) - this_tip_partial_lh = tip_partial_lh + tip_block*(aln->at(ptn+i))[dad->id]; - else if (ptn+i < max_orig_nptn) - this_tip_partial_lh = tip_partial_lh + tip_block*aln->STATE_UNKNOWN; - else if (ptn+i < nptn) - this_tip_partial_lh = tip_partial_lh + tip_block*model_factory->unobserved_ptns[ptn+i-max_orig_nptn][dad->id]; - else - this_tip_partial_lh = tip_partial_lh + tip_block*aln->STATE_UNKNOWN; + int state; + if (ptn+i < orig_nptn) { + if (stateRow!=nullptr) { + state = stateRow[ptn+i]; + } else { + state = (aln->at(ptn+i))[dad->id]; + } + } else if (ptn+i < max_orig_nptn) { + state = unknown; + } else if (ptn+i < nptn) { + state = model_factory->unobserved_ptns[ptn+i-max_orig_nptn][dad->id]; + } else { + state = unknown; + } + double *this_tip_partial_lh = tip_partial_lh + tip_block*state; double *this_vec_tip = vec_tip+i; for (c = 0; c < tip_block; c++) { *this_vec_tip = this_tip_partial_lh[c]; this_vec_tip += VectorClass::size(); } - } VectorClass *lh_tip; if (SITE_MODEL) @@ -2595,8 +2696,6 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, size_t block = ncat_mix * nstates; size_t tip_block = nstates * model->getNMixtures(); - size_t ptn; // for big data size > 4GB memory required - size_t c, i; size_t orig_nptn = aln->size(); size_t max_orig_nptn = ((orig_nptn+VectorClass::size()-1)/VectorClass::size())*VectorClass::size(); size_t nptn = max_orig_nptn+model_factory->unobserved_ptns.size(); @@ -2611,15 +2710,13 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double *eval = model->getEigenvalues(); ASSERT(eval); -// double *val = aligned_alloc(block); - double *val = NULL; + double *val = nullptr; double *buffer_partial_lh_ptr = buffer_partial_lh; - double cat_length[ncat]; double cat_prop[ncat]; if (SITE_MODEL) { - for (c = 0; c < ncat; c++) { + for (size_t c = 0; c < ncat; c++) { cat_length[c] = site_rate->getRate(c) * dad_branch->length; cat_prop[c] = site_rate->getProp(c); } @@ -2628,7 +2725,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, buffer_partial_lh_ptr += get_safe_upper_limit(block); if (nstates % VectorClass::size() == 0) { size_t loop_size = nstates / VectorClass::size(); - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { size_t mycat = c%ncat; size_t m = c/denom; mix_addr_nstates[c] = m*nstates; @@ -2637,11 +2734,12 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double len = site_rate->getRate(mycat)*dad_branch->getLength(mycat); double prop = site_rate->getProp(mycat) * model->getMixtureWeight(m); VectorClass *this_val = (VectorClass*)(val + c*nstates); - for (i = 0; i < loop_size; i++) + for (size_t i = 0; i < loop_size; i++) { this_val[i] = exp(eval_ptr[i]*len) * prop; + } } } else { - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { size_t mycat = c%ncat; size_t m = c/denom; mix_addr_nstates[c] = m*nstates; @@ -2650,7 +2748,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double len = site_rate->getRate(mycat)*dad_branch->getLength(mycat); double prop = site_rate->getProp(mycat) * model->getMixtureWeight(m); double *this_val = val + c*nstates; - for (i = 0; i < nstates; i++) + for (size_t i = 0; i < nstates; i++) this_val[i] = exp(eval_ptr[i]*len) * prop; } } @@ -2683,9 +2781,9 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double *lh_node = partial_lh_node + state*block; double *lh_tip = tip_partial_lh + state*tip_block; double *vc_val_tmp = val; - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { double *this_lh_tip = lh_tip + mix_addr_nstates[c]; - for (i = 0; i < nstates; i+=VectorClass::size()) { + for (size_t i = 0; i < nstates; i+=VectorClass::size()) { (VectorClass().load_a(&vc_val_tmp[i]) * VectorClass().load_a(&this_lh_tip[i])).store_a(&lh_node[i]); } lh_node += nstates; @@ -2698,9 +2796,9 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double *lh_node = partial_lh_node +state*block; double *val_tmp = val; double *this_tip_partial_lh = tip_partial_lh + state*tip_block; - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { double *lh_tip = this_tip_partial_lh + mix_addr_nstates[c]; - for (i = 0; i < nstates; i++) { + for (size_t i = 0; i < nstates; i++) { lh_node[i] = val_tmp[i] * lh_tip[i]; } lh_node += nstates; @@ -2709,10 +2807,12 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, } } } - + + auto stateRow = this->getConvertedSequenceByNumber(dad->id); + auto unknown = aln->STATE_UNKNOWN; // now do the real computation #ifdef _OPENMP -#pragma omp parallel for private(ptn, i, c) schedule(static, 1) num_threads(num_threads) +#pragma omp parallel for schedule(static, 1) num_threads(num_threads) #endif for (int thread_id = 0; thread_id < num_threads; thread_id++) { @@ -2730,7 +2830,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, double *vec_tip = buffer_partial_lh_ptr + block*VectorClass::size()*thread_id; - for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { + for (size_t ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { VectorClass lh_ptn(0.0); // lh_ptn.load_a(&ptn_invar[ptn]); VectorClass *lh_cat = (VectorClass*)(_pattern_lh_cat + ptn*ncat_mix); @@ -2740,7 +2840,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, if (SITE_MODEL) { // site-specific model VectorClass* eval_ptr = (VectorClass*) &eval[ptn*nstates]; - for (c = 0; c < ncat; c++) { + for (size_t c = 0; c < ncat; c++) { #ifdef KERNEL_FIX_STATES dotProductExp(eval_ptr, lh_node, partial_lh_dad, cat_length[c], lh_cat[c]); #else @@ -2755,26 +2855,31 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, } } else { // normal model //load tip vector - for (i = 0; i < VectorClass::size(); i++) { - double *lh_tip; - if (ptn+i < orig_nptn) - lh_tip = partial_lh_node + block*(aln->at(ptn+i))[dad->id]; - else if (ptn+i < max_orig_nptn) - lh_tip = partial_lh_node + block*aln->STATE_UNKNOWN; + for (size_t i = 0; i < VectorClass::size(); i++) { + int state; + if (ptn+i < orig_nptn) { + if (stateRow!=nullptr) { + state = stateRow[ptn+i]; + } else { + state = (aln->at(ptn+i))[dad->id]; + } + } else if (ptn+i < max_orig_nptn) + state = unknown; else if (ptn+i < nptn) - lh_tip = partial_lh_node + block*model_factory->unobserved_ptns[ptn+i-max_orig_nptn][dad->id]; + state = model_factory->unobserved_ptns[ptn+i-max_orig_nptn][dad->id]; else - lh_tip = partial_lh_node + block*aln->STATE_UNKNOWN; + state = aln->STATE_UNKNOWN; + double *lh_tip = partial_lh_node + block*state; double *this_vec_tip = vec_tip+i; - for (c = 0; c < block; c++) { + for (size_t c = 0; c < block; c++) { *this_vec_tip = lh_tip[c]; this_vec_tip += VectorClass::size(); } } // compute likelihood per category - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { #ifdef KERNEL_FIX_STATES dotProductVec(lh_node, partial_lh_dad, lh_cat[c]); #else @@ -2794,16 +2899,16 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, // numerical scaling per category UBYTE *scale_dad = dad_branch->scale_num + ptn*ncat_mix; UBYTE min_scale; - for (i = 0; i < VectorClass::size(); i++) { + for (size_t i = 0; i < VectorClass::size(); i++) { // scale_dad = dad_branch->scale_num+(ptn+i)*ncat_mix; min_scale = scale_dad[0]; - for (c = 1; c < ncat_mix; c++) + for (size_t c = 1; c < ncat_mix; c++) min_scale = min(min_scale, scale_dad[c]); vc_min_scale_ptr[i] = min_scale; double *this_lh_cat = &_pattern_lh_cat[ptn*ncat_mix + i]; - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { // rescale lh_cat if neccessary if (scale_dad[c] == min_scale+1) { this_lh_cat[c*VectorClass::size()] *= SCALING_THRESHOLD; @@ -2817,7 +2922,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, sumVec(lh_cat, lh_ptn, ncat_mix); } else { - for (i = 0; i < VectorClass::size(); i++) { + for (size_t i = 0; i < VectorClass::size(); i++) { vc_min_scale_ptr[i] = dad_branch->scale_num[ptn+i]; } } @@ -2839,7 +2944,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, if (horizontal_or(vc_min_scale != 0.0)) { // some entries are rescaled double *lh_ptn_dbl = (double*)&lh_ptn; - for (i = 0; i < VectorClass::size(); i++) + for (size_t i = 0; i < VectorClass::size(); i++) if (vc_min_scale_ptr[i] != 0.0) lh_ptn_dbl[i] *= SCALING_THRESHOLD; } @@ -2865,7 +2970,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, //-------- both dad and node are internal nodes -----------/ #ifdef _OPENMP -#pragma omp parallel for private(ptn, i, c) schedule(static, 1) num_threads(num_threads) +#pragma omp parallel for schedule(static, 1) num_threads(num_threads) #endif for (int thread_id = 0; thread_id < num_threads; thread_id++) { @@ -2881,7 +2986,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, for (vector::iterator it = traversal_info.begin(); it != traversal_info.end(); it++) computePartialLikelihood(*it, ptn_lower, ptn_upper, thread_id); - for (ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { + for (size_t ptn = ptn_lower; ptn < ptn_upper; ptn+=VectorClass::size()) { VectorClass lh_ptn(0.0); // lh_ptn.load_a(&ptn_invar[ptn]); VectorClass *lh_cat = (VectorClass*)(_pattern_lh_cat + ptn*ncat_mix); @@ -2891,7 +2996,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, // compute likelihood per category if (SITE_MODEL) { VectorClass* eval_ptr = (VectorClass*) &eval[ptn*nstates]; - for (c = 0; c < ncat; c++) { + for (size_t c = 0; c < ncat; c++) { #ifdef KERNEL_FIX_STATES dotProductExp(eval_ptr, partial_lh_node, partial_lh_dad, cat_length[c], lh_cat[c]); #else @@ -2906,7 +3011,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, } } else { double *val_tmp = val; - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { #ifdef KERNEL_FIX_STATES dotProduct3Vec(val_tmp, partial_lh_node, partial_lh_dad, lh_cat[c]); #else @@ -2930,15 +3035,15 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, UBYTE sum_scale[ncat_mix]; UBYTE min_scale; - for (i = 0; i < VectorClass::size(); i++) { + for (size_t i = 0; i < VectorClass::size(); i++) { min_scale = sum_scale[0] = scale_dad[0] + scale_node[0]; - for (c = 1; c < ncat_mix; c++) { + for (size_t c = 1; c < ncat_mix; c++) { sum_scale[c] = scale_dad[c] + scale_node[c]; min_scale = min(min_scale, sum_scale[c]); } vc_min_scale_ptr[i] = min_scale; double *this_lh_cat = &_pattern_lh_cat[ptn*ncat_mix + i]; - for (c = 0; c < ncat_mix; c++) { + for (size_t c = 0; c < ncat_mix; c++) { if (sum_scale[c] == min_scale+1) { this_lh_cat[c*VectorClass::size()] *= SCALING_THRESHOLD; } else if (sum_scale[c] > min_scale+1) { @@ -2951,7 +3056,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, } sumVec(lh_cat, lh_ptn, ncat_mix); } else { - for (i = 0; i < VectorClass::size(); i++) { + for (size_t i = 0; i < VectorClass::size(); i++) { vc_min_scale_ptr[i] = dad_branch->scale_num[ptn+i] + node_branch->scale_num[ptn+i]; } } // if SAFE_NUMERIC @@ -2974,7 +3079,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, if (horizontal_or(vc_min_scale != 0.0)) { // some entries are rescaled double *lh_ptn_dbl = (double*)&lh_ptn; - for (i = 0; i < VectorClass::size(); i++) + for (size_t i = 0; i < VectorClass::size(); i++) if (vc_min_scale_ptr[i] != 0.0) lh_ptn_dbl[i] *= SCALING_THRESHOLD; } @@ -3010,7 +3115,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, // arbitrarily fix tree_lh if underflown for some sites if (!std::isfinite(tree_lh)) { tree_lh = 0.0; - for (ptn = 0; ptn < orig_nptn; ptn++) { + for (size_t ptn = 0; ptn < orig_nptn; ptn++) { if (!std::isfinite(_pattern_lh[ptn])) { _pattern_lh[ptn] = LOG_SCALING_THRESHOLD*4; // log(2^(-1024)) } @@ -3024,23 +3129,28 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, outError("+ASC not supported with robust phylo"); size_t sites = getAlnNSite(); size_t sites_drop = (int)((1.0-params->robust_phy_keep)*sites); - size_t site; - for (ptn = 0, site = 0; ptn < orig_nptn; ptn++) - for (i = 0; i < ptn_freq[ptn]; i++, site++) + size_t site = 0; + for (size_t ptn = 0; ptn < orig_nptn; ptn++) { + for (size_t i = 0; i < ptn_freq[ptn]; i++, site++) { _site_lh[site] = _pattern_lh[ptn]; + } + } ASSERT(site == sites); nth_element(_site_lh, _site_lh + sites_drop, _site_lh + sites); tree_lh = 0.0; - for (i = sites_drop; i < sites; i++) + for (size_t i = sites_drop; i < sites; i++) { tree_lh += _site_lh[i]; + } } else if (params->robust_median) { if (ASC_Holder || ASC_Lewis) outError("+ASC not supported with robust phylo"); size_t sites = getAlnNSite(); - size_t site; - for (ptn = 0, site = 0; ptn < orig_nptn; ptn++) - for (i = 0; i < ptn_freq[ptn]; i++, site++) + size_t site = 0; + for (size_t ptn = 0; ptn < orig_nptn; ptn++) { + for (int i = 0; i < ptn_freq[ptn]; i++, site++) { _site_lh[site] = _pattern_lh[ptn]; + } + } ASSERT(site == sites); nth_element(_site_lh, _site_lh + sites/2, _site_lh + sites); if ((sites & 1) == 1) { @@ -3058,15 +3168,15 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, size_t step_unobserved_ptns = model_factory->unobserved_ptns.size() / nstates; double *const_lh_next = const_lh + step_unobserved_ptns; for (int step = 1; step < nstates; step++, const_lh_next += step_unobserved_ptns) { - for (ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) + for (size_t ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) (VectorClass().load_a(&const_lh[ptn]) + VectorClass().load_a(&const_lh_next[ptn])).store_a(&const_lh[ptn]); } // cutoff the last entries if going beyond - for (ptn = orig_nptn; ptn < max_orig_nptn; ptn++) + for (size_t ptn = orig_nptn; ptn < max_orig_nptn; ptn++) const_lh[ptn] = 0.0; VectorClass sum_corr = 0.0; - for (ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) { + for (size_t ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) { VectorClass prob_variant = log(1.0 - VectorClass().load_a(&const_lh[ptn])); (VectorClass().load_a(&_pattern_lh[ptn]) - prob_variant).store_a(&_pattern_lh[ptn]); sum_corr += prob_variant*VectorClass().load_a(&ptn_freq[ptn]); @@ -3088,7 +3198,7 @@ double PhyloTree::computeLikelihoodBranchGenericSIMD(PhyloNeighbor *dad_branch, // _pattern_lh_cat[ptn] *= inv_const; prob_const = log(1.0 - prob_const); - for (ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) + for (size_t ptn = 0; ptn < orig_nptn; ptn+=VectorClass::size()) (VectorClass().load_a(&_pattern_lh[ptn])-prob_const).store_a(&_pattern_lh[ptn]); // _pattern_lh[ptn] -= prob_const; tree_lh -= aln->getNSite()*prob_const; diff --git a/tree/phylotree.cpp b/tree/phylotree.cpp index 84a1d2908..61a0bb9ab 100644 --- a/tree/phylotree.cpp +++ b/tree/phylotree.cpp @@ -33,6 +33,7 @@ #include "upperbounds.h" #include "utils/MPIHelper.h" #include "utils/hammingdistance.h" +#include "utils/setcheck.h" #include "model/modelmixture.h" #include "phylonodemixlen.h" #include "phylotreemixlen.h" @@ -133,6 +134,7 @@ void PhyloTree::init() { vector_size = 0; safe_numeric = false; summary = nullptr; + isSummaryBorrowed = false; } PhyloTree::PhyloTree(Alignment *aln) : MTree(), CheckpointFactory() { @@ -204,73 +206,33 @@ void myPartitionsDestroy(partitionList *pl) { PhyloTree::~PhyloTree() { doneComputingDistances(); - if (nni_scale_num) - aligned_free(nni_scale_num); - nni_scale_num = NULL; - if (nni_partial_lh) - aligned_free(nni_partial_lh); - nni_partial_lh = NULL; - if (central_partial_lh) - aligned_free(central_partial_lh); - central_partial_lh = NULL; - if (central_scale_num) - aligned_free(central_scale_num); - central_scale_num = NULL; - - if (central_partial_pars) - aligned_free(central_partial_pars); - central_partial_pars = NULL; - - if(cost_matrix){ - aligned_free(cost_matrix); - cost_matrix = NULL; - } - - if (model_factory) - delete model_factory; + aligned_free(nni_scale_num); + aligned_free(nni_partial_lh); + aligned_free(central_partial_lh); + aligned_free(central_scale_num); + aligned_free(central_partial_pars); + aligned_free(cost_matrix); + + delete model_factory; model_factory = NULL; - if (model) - delete model; + delete model; model = NULL; - if (site_rate) - delete site_rate; + delete site_rate; site_rate = NULL; - if (_pattern_lh_cat) - aligned_free(_pattern_lh_cat); - _pattern_lh_cat = NULL; - if (_pattern_lh) - aligned_free(_pattern_lh); - _pattern_lh = NULL; - if (_site_lh) - aligned_free(_site_lh); - _site_lh = NULL; - //if (state_freqs) - // delete [] state_freqs; - if (theta_all) - aligned_free(theta_all); - theta_all = NULL; - if (buffer_scale_all) - aligned_free(buffer_scale_all); - buffer_scale_all = NULL; - if (buffer_partial_lh) - aligned_free(buffer_partial_lh); - buffer_partial_lh = NULL; - if (ptn_freq) - aligned_free(ptn_freq); - ptn_freq = NULL; - if (ptn_freq_pars) - aligned_free(ptn_freq_pars); - ptn_freq_pars = NULL; + aligned_free(_pattern_lh_cat); + aligned_free(_pattern_lh); + aligned_free(_site_lh); + aligned_free(theta_all); + aligned_free(buffer_scale_all); + aligned_free(buffer_partial_lh); + aligned_free(ptn_freq); + aligned_free(ptn_freq_pars); ptn_freq_computed = false; - if (ptn_invar) - aligned_free(ptn_invar); - ptn_invar = NULL; - if (dist_matrix) - delete[] dist_matrix; + aligned_free(ptn_invar); + delete[] dist_matrix; dist_matrix = NULL; - if (var_matrix) - delete[] var_matrix; + delete[] var_matrix; var_matrix = NULL; if (pllPartitions) @@ -283,6 +245,9 @@ PhyloTree::~PhyloTree() { pllPartitions = NULL; pllAlignment = NULL; pllInst = NULL; + if (!isSummaryBorrowed) { + delete summary; + } summary = nullptr; } @@ -360,27 +325,57 @@ void PhyloTree::copyTree(MTree *tree, string &taxa_set) { setAlignment(aln); } -void PhyloTree::copyPhyloTree(PhyloTree *tree) { +void PhyloTree::copyPhyloTree(PhyloTree *tree, bool borrowSummary) { MTree::copyTree(tree); if (!tree->aln) return; setAlignment(tree->aln); + if (borrowSummary && summary!=tree->summary && tree->summary!=nullptr) { + if (!isSummaryBorrowed) { + delete summary; + } + summary = tree->summary; + isSummaryBorrowed = (summary!=nullptr); + } } -void PhyloTree::copyPhyloTreeMixlen(PhyloTree *tree, int mix) { +void PhyloTree::copyPhyloTreeMixlen(PhyloTree *tree, int mix, bool borrowSummary) { if (tree->isMixlen()) { ((PhyloTreeMixlen*)tree)->cur_mixture = mix; } - copyPhyloTree(tree); + copyPhyloTree(tree, borrowSummary); if (tree->isMixlen()) { ((PhyloTreeMixlen*)tree)->cur_mixture = -1; } } +#define FAST_NAME_CHECK 1 void PhyloTree::setAlignment(Alignment *alignment) { aln = alignment; - bool err = false; + //double checkStart = getRealTime(); size_t nseq = aln->getNSeq(); + bool err = false; +#if FAST_NAME_CHECK + map mapNameToNode; + getMapOfTaxonNameToNode(nullptr, nullptr, mapNameToNode); + for (size_t seq = 0; seq < nseq; seq++) { + string seq_name = aln->getSeqName(seq); + auto it = mapNameToNode.find(seq_name); + if (it==mapNameToNode.end()) { + string str = "Alignment sequence "; + str += seq_name; + str += " does not appear in the tree"; + err = true; + outError(str, false); + } else { + (*it).second->id = seq; + mapNameToNode.erase(it); + } + } +#else + //For each sequence, find the correponding node, via + //the map (if there is one), tag it, and remove it from + //the map. for (size_t seq = 0; seq < nseq; seq++) { string seq_name = aln->getSeqName(seq); Node *node = findLeafName(seq_name); @@ -395,14 +390,22 @@ void PhyloTree::setAlignment(Alignment *alignment) { node->id = seq; } } +#endif if (err) { printTree(cout, WT_NEWLINE); outError("Tree taxa and alignment sequence do not match (see above)"); } - if (rooted) { - ASSERT(root->name == ROOT_NAME); - root->id = nseq; +#if FAST_NAME_CHECK + //Since every sequence in the alignment has had its + //corresponding node tagged with an id, and that node's + //entry has been removed from the map, it follows: + //any node still referenced from the map does NOT + //correspond to a sequence in the alignment. + for (auto it = mapNameToNode.begin(); it != mapNameToNode.end(); ++it) { + outError((string)"Tree taxon " + it->first + " does not appear in the alignment", false); + err = true; } +#else StrVector taxname; getTaxaName(taxname); for (StrVector::iterator it = taxname.begin(); it != taxname.end(); it++) @@ -410,7 +413,20 @@ void PhyloTree::setAlignment(Alignment *alignment) { outError((string)"Tree taxon " + (*it) + " does not appear in the alignment", false); err = true; } - if (err) outError("Tree taxa and alignment sequence do not match (see above)"); +#endif + if (err) { + outError("Tree taxa and alignment sequence do not match (see above)"); + } + if (rooted) { + ASSERT(root->name == ROOT_NAME); + root->id = aln->getNSeq(); + } + /* + if (verbose_mode >= VB_MED) { + cout << "Alignment namecheck took " << (getRealTime()-checkStart) + << " sec (of wall-clock time)" << endl; + } + */ } void PhyloTree::setRootNode(const char *my_root, bool multi_taxa) { @@ -919,41 +935,20 @@ void PhyloTree::initializeAllPartialLh() { } void PhyloTree::deleteAllPartialLh() { - - if (central_partial_lh) { - aligned_free(central_partial_lh); - } - if (central_scale_num) { - aligned_free(central_scale_num); - } - if (central_partial_pars) - aligned_free(central_partial_pars); - - if (nni_scale_num) - aligned_free(nni_scale_num); - nni_scale_num = NULL; - if (nni_partial_lh) - aligned_free(nni_partial_lh); - nni_partial_lh = NULL; - - if (ptn_invar) - aligned_free(ptn_invar); - if (ptn_freq) - aligned_free(ptn_freq); - if (ptn_freq_pars) - aligned_free(ptn_freq_pars); - if (theta_all) - aligned_free(theta_all); - if (buffer_scale_all) - aligned_free(buffer_scale_all); - if (buffer_partial_lh) - aligned_free(buffer_partial_lh); - if (_pattern_lh_cat) - aligned_free(_pattern_lh_cat); - if (_pattern_lh) - aligned_free(_pattern_lh); - if (_site_lh) - aligned_free(_site_lh); + aligned_free(central_partial_lh); + aligned_free(central_scale_num); + aligned_free(central_partial_pars); + aligned_free(nni_scale_num); + aligned_free(nni_partial_lh); + aligned_free(ptn_invar); + aligned_free(ptn_freq); + aligned_free(ptn_freq_pars); + aligned_free(theta_all); + aligned_free(buffer_scale_all); + aligned_free(buffer_partial_lh); + aligned_free(_pattern_lh_cat); + aligned_free(_pattern_lh); + aligned_free(_site_lh); central_partial_lh = NULL; central_scale_num = NULL; central_partial_pars = NULL; @@ -1722,7 +1717,6 @@ int PhyloTree::computePatternCategories(IntVector *pattern_ncat) { delete [] id_mixture; delete [] sorted_lh_mixture; delete [] lh_mixture; -// delete [] cat_prob; return num_best_mixture; } @@ -2568,9 +2562,10 @@ void PhyloTree::optimizePatternRates(DoubleVector &pattern_rates) { ptn_id.push_back(ptn); paln->extractPatterns(aln, ptn_id); PhyloTree *tree = new PhyloTree; - tree->copyPhyloTree(this); + tree->copyPhyloTree(this, false); //Local alignment, so tree can't "borrow" the summary of this tree->setParams(params); tree->setAlignment(paln); + tree->prepareToComputeDistances(); tree->sse = sse; tree->num_threads = 1; // initialize model @@ -2580,7 +2575,7 @@ void PhyloTree::optimizePatternRates(DoubleVector &pattern_rates) { tree->optimizeTreeLengthScaling(MIN_SITE_RATE, pattern_rates[ptn], MAX_SITE_RATE, 0.0001); tree->setModelFactory(NULL); - + tree->doneComputingDistances(); delete tree; delete paln; } @@ -3107,8 +3102,12 @@ void PhyloTree::prepareToComputeDistances() { for (threads -= distanceProcessors.size(); 0 < threads; --threads) { distanceProcessors.emplace_back(new AlignmentPairwise(this)); } + if (summary!=nullptr && !isSummaryBorrowed) { + delete summary; + summary = nullptr; + } if (params->experimental) { - summary = new AlignmentSummary(aln, true); + summary = new AlignmentSummary(aln, true, true); summary->constructSequenceMatrix(false); } } @@ -3118,22 +3117,37 @@ bool PhyloTree::hasMatrixOfConvertedSequences() const { } size_t PhyloTree::getConvertedSequenceLength() const { + if (summary==nullptr) { + return 0; + } return summary->sequenceLength; } const char* PhyloTree::getConvertedSequenceByNumber(int seq1) const { + if (summary==nullptr || summary->sequenceMatrix==nullptr) { + return nullptr; + } return summary->sequenceMatrix + seq1 * summary->sequenceLength; } const int* PhyloTree::getConvertedSequenceFrequencies() const { + if (summary==nullptr) { + return nullptr; + } return summary->siteFrequencies.data(); } const int* PhyloTree::getConvertedSequenceNonConstFrequencies() const { + if (summary==nullptr) { + return nullptr; + } return summary->nonConstSiteFrequencies.data(); } int PhyloTree::getSumOfFrequenciesForSitesWithConstantState(int state) const { + if (summary==nullptr) { + return 0; + } return summary->getSumOfConstantSiteFrequenciesForState(state); } @@ -3154,7 +3168,9 @@ void PhyloTree::doneComputingDistances() { delete (*it); } distanceProcessors.clear(); - delete summary; + if (!isSummaryBorrowed) { + delete summary; + } summary = nullptr; } @@ -3369,7 +3385,7 @@ double PhyloTree::computeDist_Experimental(double *dist_mat, double *var_mat) { } } EX_TRACE("Summarizing..."); - AlignmentSummary s(aln, false); + AlignmentSummary s(aln, false, false); int maxDistance = 0; EX_TRACE("Summarizing found " << s.sequenceLength @@ -3511,7 +3527,7 @@ double PhyloTree::computeDist(Params ¶ms, Alignment *alignment, double* &dis double begin_time = getRealTime(); longest_dist = (params.experimental) ? computeDist_Experimental(dist_mat, var_mat) - : computeDist(dist_mat, var_mat); + : computeDist(dist_mat, var_mat); //DISABLED if (verbose_mode >= VB_MED) { cout << "Distance calculation time: " << getRealTime() - begin_time << " seconds" << endl; @@ -3531,7 +3547,7 @@ double PhyloTree::computeObsDist(double *dist_mat) { size_t nseqs = aln->getNSeq(); double longest_dist = 0.0; #ifdef _OPENMP - #pragma omp parallel for + #pragma omp parallel for schedule(dynamic) #endif for (size_t seq1 = 0; seq1 < nseqs; ++seq1) { size_t pos = seq1*nseqs; @@ -3625,7 +3641,7 @@ void PhyloTree::computeBioNJ(Params ¶ms) { treeBuilder->constructTree(dist_file, bionj_file); if (verbose_mode >= VB_MED) { cout << "Constructing " << treeBuilder->getName() << " tree" - << " (from distance file " << dist_file << ") took" + << " (from distance file " << dist_file << ") took " << (getRealTime()-start_time) << " sec." << endl; } } @@ -4165,8 +4181,6 @@ NNIMove PhyloTree::getBestNNIForBran(PhyloNode *node1, PhyloNode *node2, NNIMove mem_slots.eraseSpecialNei(); -// aligned_free(new_partial_lh); - // restore the length of 4 branches around node1, node2 FOR_NEIGHBOR(node1, node2, it) (*it)->setLength((*it)->node->findNeighbor(node1)); @@ -4182,9 +4196,7 @@ NNIMove PhyloTree::getBestNNIForBran(PhyloNode *node1, PhyloNode *node2, NNIMove } else { res = nniMoves[1]; } - if (newNNIMoves) { - delete [] nniMoves; - } + delete [] nniMoves; return res; } diff --git a/tree/phylotree.h b/tree/phylotree.h index 30a8b251f..b75d8769c 100644 --- a/tree/phylotree.h +++ b/tree/phylotree.h @@ -111,7 +111,17 @@ inline T *aligned_alloc(size_t size) { return (T*)mem; } -inline void aligned_free(void* mem) { +template T* ensure_aligned_allocated(T* &ptr, size_t size) { + if (ptr==nullptr) { + ptr = aligned_alloc(size); + } + return ptr; +} + +template void aligned_free(T* & mem) { + if (mem==nullptr) { + return; + } #if defined WIN32 || defined _WIN32 || defined __WIN32__ #if (defined(__MINGW32__) || defined(__clang__)) && defined(BINARY32) __mingw_aligned_free(mem); @@ -121,6 +131,7 @@ inline void aligned_free(void* mem) { #else free(mem); #endif + mem = nullptr; } @@ -416,16 +427,18 @@ class PhyloTree : public MTree, public Optimization, public CheckpointFactory { copy the phylogenetic tree structure into this tree, designed specifically for PhyloTree. So there is some distinction with copyTree. @param tree the tree to copy + @param borrowSummary true if the alignment summary of the other tree is to be "borrowed" */ - void copyPhyloTree(PhyloTree *tree); + void copyPhyloTree(PhyloTree *tree, bool borrowSummary); /** copy the phylogenetic tree structure into this tree, designed specifically for PhyloTree. So there is some distinction with copyTree. @param tree the tree to copy @param mix mixture ID of branch lengths + @param borrowSummary true if the alignment summary of the other tree is to be "borrowed" */ - virtual void copyPhyloTreeMixlen(PhyloTree *tree, int mix); + virtual void copyPhyloTreeMixlen(PhyloTree *tree, int mix, bool borrowSummary); /** @@ -779,9 +792,9 @@ class PhyloTree : public MTree, public Optimization, public CheckpointFactory { UINT *tip_partial_pars; bool ptn_freq_computed; - - /** site log-likelihood buffer for robust phylogeny idea */ - double *_site_lh; + + /** site log-likelihood buffer for robust phylogeny idea */ + double *_site_lh; /** vector size used by SIMD kernel */ size_t vector_size; @@ -2348,6 +2361,7 @@ class PhyloTree : public MTree, public Optimization, public CheckpointFactory { std::vector distanceProcessors; AlignmentSummary* summary; + bool isSummaryBorrowed; }; diff --git a/tree/phylotreesse.cpp b/tree/phylotreesse.cpp index 787c84c89..f1d1c1f0b 100644 --- a/tree/phylotreesse.cpp +++ b/tree/phylotreesse.cpp @@ -259,30 +259,27 @@ void PhyloTree::computeTipPartialLikelihood() { int nstates = aln->num_states; size_t nseq = aln->getNSeq(); ASSERT(vector_size > 0); + + #ifdef _OPENMP #pragma omp parallel for schedule(static) #endif for (int nodeid = 0; nodeid < nseq; nodeid++) { + auto stateRow = getConvertedSequenceByNumber(nodeid); int i, x, v; double *partial_lh = tip_partial_lh + tip_block_size*nodeid; size_t ptn; for (ptn = 0; ptn < nptn; ptn+=vector_size, partial_lh += nstates*vector_size) { -// int state[vector_size]; -// for (v = 0; v < vector_size; v++) { -// if (ptn+v < nptn) -// state[v] = aln->at(ptn+v)[nodeid]; -// else -// state[v] = aln->STATE_UNKNOWN; -// } - double *inv_evec = &model->getInverseEigenvectors()[ptn*nstates*nstates]; for (v = 0; v < vector_size; v++) { int state = 0; - if (ptn+v < nptn) - state = aln->at(ptn+v)[nodeid]; - // double *partial_lh = node_partial_lh + ptn*nstates; -// double *inv_evec = models->at(ptn)->getInverseEigenvectors(); - + if (ptn+v < nptn) { + if (stateRow!=nullptr) { + state = stateRow[ptn+v]; + } else { + state = aln->at(ptn+v)[nodeid]; + } + } if (state < nstates) { for (i = 0; i < nstates; i++) partial_lh[i*vector_size+v] = inv_evec[(i*nstates+state)*vector_size+v]; @@ -290,7 +287,6 @@ void PhyloTree::computeTipPartialLikelihood() { // special treatment for unknown char for (i = 0; i < nstates; i++) { double lh_unknown = 0.0; -// double *this_inv_evec = inv_evec + i*nstates; for (x = 0; x < nstates; x++) lh_unknown += inv_evec[(i*nstates+x)*vector_size+v]; partial_lh[i*vector_size+v] = lh_unknown; @@ -450,7 +446,6 @@ void PhyloTree::computeTipPartialLikelihood() { double *all_inv_evec = model->getInverseEigenvectors(); ASSERT(all_inv_evec); - for (state = 0; state < nstates; state++) { double *this_tip_partial_lh = &tip_partial_lh[state*nstates*nmixtures]; for (m = 0; m < nmixtures; m++) {