Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

now using name decenttree.cpp #9

Merged
merged 7 commits into from
May 7, 2024
Merged

now using name decenttree.cpp #9

merged 7 commits into from
May 7, 2024

Conversation

NicolaDM
Copy link
Contributor

No description provided.

tamaramagdr pushed a commit that referenced this pull request Oct 7, 2021
optimisation (~2x) & tree search (~2x) by "serialising" their
memory access patterns to improve spatial and temporal locality
of reference.

1. PhyloTree alignment summaries may now be "borrowed" from another tree
   that has the same alignment (the relevant member is isSummaryBorrowed;
   if it is true, this instance doesn't own the summary, it is only
   "Borrowing" a reference to a summary owned by another tree).
2. PhyloTree member functions copyPhyloTree and copyPhyloTreeMixlen take an
   extra parameter indicating whether the copy is to "borrow" a copy of the
   alignment summary of the original (if it has one).  This matters a lot
   for ratefree.cpp and +R free rate models, and modelmixture.cpp!
   The temporary copies of the phylo tree that are used during parameter
   Optimization can now re-use the AlignmentSummary of the original; which
   means they can "linearise" their memory access to sites, when they are
   Optimising branch lengths (see changes listed below, e.g. #4, #5, #6, #7).
3. PhyloTree::setAlignment does its "name check" a different way (rather
   than finding each sequence by name by scanning the tree, if asks
   MTree::getMapOfTaxonNameToNode for a name to leaf node map, and checks
   the array of sequence names against the map (updating the id on the
   node for each hit).  The new approach is equivalent but is much faster,
   O(n.ln(n)) rather than O(n^2).  This speeds up tree loads markedly
   (particularly for large trees), but it matters most for free rate
   parameter optimization (on middling inputs this was a significant
   factor: about ~10% of parameter optimization time).  This can be turned
   off by changing the FAST_NAME_CHECK symbol.
4. IQTree::optimizeModelParameters now calls prepareToComputeDistances()
   (So that AlignmentSummary's matrix of converted sequences) will be
   available (to be borrowed, via calls to PhyloTree::copyPhyloTree (see
   change # 2 above, and changes #5 through #7 below).
   Likewise IQTree::doNNISearch (so changes #5, #8 help tree searches too).
5. AlignmentPairwise::computeFunction and  AlignmentPairwise::computeFuncDerv(
   can now make use of AlignmentSummary's "Matrix of converted sequences"
   (if it is available) via PhyloTree's accessor methods,
   e.g. PhyloTree::getConvertedSequenceByNumber().
   For this to work as expected, it's necessary for callers to ask
   AlignmentSummary to construct that matrix *including* even sites where
   there is no variety at all (added the keepBoringSites parameter on the
   AlignmentSummary constructor for this).
6. RateMeyerDiscrete::computeFunction and RateMeyerDiscrete::computeFuncDerv
   likewise. And RateMeyerDiscrete::normalizeRates can make use of the "flat"
   frequency array exposed by PhyloTree::getConvertedSequenceFrequencies() too.
7. PhyloTree::computePartialLikelihoodGenericSIMD (in phylokernelnew.h)
   makes use of the matrix of converted sequences (if one is available), in
   about six (!) different places.  In terms of actual effect, this is the
   most important change in this commit, but it needs changes #1, #2, and #4
   committed too, if it is to have any effect.  This change speeds up both
   parameter optimisation and tree searching significantly.
8. As well as inv_eigenvectors, there is now an iv_eigenvectors_transposed
   (Using the transpose makes for some faster multiplications; see change #9
   listed below). ModelMarkov::calculateSquareMatrixTranspose is used to
   calculate the transpose of the inverse eigen vectors.  Unpleasant
   consequence: ModelMarkov::update_eigen_pointers has to take an extra
   parameter.  Keeping this additional member set correctly is the only
   Thing that forced changes to modelpomomixture.cpp (and .h), modelset.cpp,
   and modelsubst.h.
9. ModelMarkov::computeTransMatrix and ModelMarkov::computeTransDerv
   now use (a) calculateExponentOfScalarMultiply and
   (b) aTimesDiagonalBTimesTransposeOfC to calculate transition matrices
   (This is quite a bit faster than the Eigen code, since it doesn't
   bother to construct the diagonal matrix B.asDiagonal()...).
   (a) and (b) and the supporting functions, calculateHadamardProduct
   And dotProduct, are (for now) members of ModelMarkov.
10.Minor tweaks to vector processing code in phylokernelnew.h:
   (a) dotProductVec hand-unrolled treatment of the V array;
   (b) dotProductPairAdd treated the last item (in A and B) as the
       special case, when handling an odd number of items.
       Possibly the treatment of the AD and BD arrays should be
       hand-unrolled here, too, but I haven't tried that yet.
   (c) dotProductTriple (checking for odd uses & rather than %) (faster!)
11.The aligned_free free function (from phylotree.h ?!) does the "pointer
   Null?" check itself, and (because it takes a T*& rather than a T*),
   can itself set the pointer to nullptr.  This means that client code that
   used to go... if (x) { aligned_free(x); x=NULL; } ... can now be
   simplified to just... aligned_free(x);
12.Next to it (in phylotree.h), there is now an ensure_aligned_allocated
   method.  That lets you replace code like ... this:
        if (!eigenvalues)
            eigenvalues = aligned_alloc<double>(num_states);
   With:
        ensure_aligned_allocated(eigenvalues,  num_states);
   which is,
   I reckon, more readable.
13.In many places where there was code of the form... if (x) { delete x; }
   I have replaced it with delete x (likewise delete [] x).  delete always
   checks for null (it's required to, that's in the C++ standards), and
   "Rolling your own check" merely devalues the check that delete will
   later do! I've made similar "don't bother to check for null" changes
   in some other files, that I haven't included in this commit (since there
   aren't any *material* changes to anything in those files).
tamaramagdr pushed a commit that referenced this pull request Oct 7, 2021
1. computeMLDistances no longer writes a distance file (it was
   usually written *again* in computeBioNJ; see change #2).
2. runTreeConstruction can no longer assume that the distance
   file has been written by computeMLDistances, so (if
   iqtree->computeBioNJ has not been called, it must write it,
   even if params.user_file was false, via a call to
   iqtree->printDistanceFile).
3. PhyloTree now has a num_packets member (which tracks, how
   many packets to divide work into: it can be the same as
   num_threads, but is generally more; at present by a factor
   of 2).  Member functions such as getBufferPartialLhSize
   must allocate per packet rather than per thread.  See in
   particular changes #9, #10 and #11.
4. Removed a little commented-out code from PhyloTree.cpp
   (And moved for-loop iteration variables that could've
    been in-loop, but weren't in-loop, in lots of places).
   (Likewise in phylotreesse.cpp).
5. Removed redundant assignments to nullptr (particularly in
   PhyloTree::deleteAllPartialLh); these aren't needed now
   Because aligned_free sets the pointer to nullptr for you.
6. Client code that set IQTree::num_threads directly now does
   so via setNumThreads (e.g. in phylotesting.cpp)
   (Also in PhyloTree::optimizePatternRates) (because
   setNumThreads also sets num_packets).  For now,
   num_packets is set to 2*num_threads (see change #9).
7. Removed dead pointer adjustments in the "any size" case in
   PhyloTree::computePartialParsimonyFastSIMD.  These had been
   left over from before that member function was vectorised
   (The pointers are recalculated at the start of the next
   Iteration of the loop, so adjusting them is a waste of time).
   (Hopefully the compiler was optimizing the adjustments away).
8. Fully unrolled the size 4 case in productVecMat
   (In phylokernelnew.h).
9. computeBounds chooses sizes for blocks of work
   (Based on the number of packets of work as well as
   the number of threads to be allocated).  For now, it is
   assumed that the number of packets of work is divisible
   by the number of threads.
10. PhyloTree::computeTraversalInfo calculates buffer sizes
    Required in terms of num_packets rather than num_threads.
11. #pragma omp parallel for ... and corresponding for loops
    are now for packets of work not threads.
    (a) PhyloTree::computeTraversalInfo
    (b) PhyloTree::computeLikelihoodDervGenericSIMD (*)
        (Two separate #pragma omp parallel for blocks)
    (c) PhyloTree::computeLikelihoodBranchGenericSIMD (*)
    (d) PhyloTree::computeLikelihoodFromBufferGenericSIMD (*)
    (e) PhyloTree::computeLikelihoodDervMixlenGenericSIMD (*)
    (f) PhyloTree::computeNonrevLikelihoodDervGenericSIMD (*)
        (Two separate #pragma omp parallel for blocks)
    (g) PhyloTree::computeNonrevLikelihoodBranchGenericSIMD (*)
        (Two separate #pragma omp parallel for blocks)

    The ones marked with (*) now use reductions (aimed at
    double) where possible, rather than #omp critical section.
    I've got rid of the private(pin,i,c) stuff by declaring
    Those variables local to the loops that use them.
    (This means doing horizontal_add per-packet rather than
    after all the packets are processed).
    They all use dynamic (rather than static) scheduling.
tamaramagdr pushed a commit that referenced this pull request Oct 7, 2021
was necessary (see #2 through #8 and particularly #5 below), and also
drafted some additional "progress-reporting" (see #9 through #11):
1. If -mlnj-only is found on the command-line, Params::compute_ml_tree_only
   will be set to true (in parseArg(), in utils/tools.cpp).
2. initializeParams doesn't call computeInitialTree if compute_ml_tree_only
   is set to true.
3. You can't set the root of a tree (if you don't yet have one), a bit later
   in the same function (and also in IQTree::initSettings).
4. Added PhyloTree::ensureNumberOfThreadsIsSet (and updated repetitive code
   that was doing what it does, in several other places).  This forced some
   updates in other files, such as main/phylotesting.cpp.
5. Added PhyloTree::ensureModelParametersAreSet (as the same steps need to
   be carried out somewhat later if there isn't an initial tree before
   ML distances are calculated). It returns a tree string.
6. In runTreeConstruction, when compute_ml_tree_only is set, negative
   branches are resolved, and #4 and #5 are called only AFTER the tree
   has been constructed.
7. In IQTree::initCandidateTreeSet the tree mightn't be a parsimony tree
   (I think if you've combined -nt AUTO and --mlnj-only) as such, but there
   will be *a* tree.  The list of cases wasn't exhaustive any more.
8. Added a distanceFileWritten member variable and a getDistanceFileWritten
   Member function to PhyloTree.
9. (This and the following changes are progress reporting changes).
   Added member functions for progress reporting to PhyloTree:
   (a) initProgress  (pushes where you are on a stack, and starts
                      reporting progress, if there's now one level
                      of progress reporting on the stack)
   (b) trackProgress (bumps up progress if progress stack depth is: 1)
   (c) hideProgress  (called before you write log messages to cut)
   (d) showProgress  (called again after)
   (e) doneProgress  (pops, and stops reporting progress, if the last
                      level of progress reporting was just popped)
   The supporting member variables are progressStackDepth and progress.
9. IQTree::optimizeNNI uses the functions added in change #9 to report
   Progress (problem here is that MAXSTEPS is a rather "high" guess
   (For n sequences it is ~2n, when the best guess for how many iterations
   There will be, with parallel NNIs, is on the order of ~p where p is the
   worst-case "tip-to-tip" path length of the tree - probably a lot less.
10.PhyloTree::testNumThreads also uses the functions added in change#9 to
   Report how many threads it has tried (though, for now, it badly
   over-reports how long it thinks it will take) (because it thinks it will
   do max_procs iterations and each will take as long as the last, but,
   Really, it'll do max_procs/2, or so, and they go faster and faster as
   there are more threads in use in later steps - one more each step).
11.PhyloTree::optimizeAllBranches reports progress (via the functions
   added in change#9).  Normally it reports progress during parameter
   optimisation (because I haven't written "higher-level" progress
   reporting for that yet).

There are some potential issues though:
1. The special-case code for dealing with "+I+G" rates doesn't yet have
   a counterpart when compute_ml_tree_only is set (in runTreeConstruction).
2. Likewise, the code for when (params.lmap_num_quartets >= 0)
   (No counterpart when compute_ml_tree_only is set, yet) (this too
    is in runTreeConstruction).
   (I haven't figured out how to test the "counterpart" versions of those
    yet, which is why I haven't written them)
3. If you pass -nt AUTO I'm not sure how many threads the NJ (or whatever)
   step will use (I think it's all of them), and the ML distance
   calculations also "use all the threads" (because the thread count's not
   set when that code runs either).  Both parallelise... well... but I'm
   not so sure it's a good idea that it hogs all the CPU cores like that.
@bqminh
Copy link
Member

bqminh commented May 3, 2024

@NicolaDM There are conflicts in modelunrest.cpp and .h. Please have a look above. Do you remember what you changed? (so it helps to review the merge).

@NicolaDM
Copy link
Contributor Author

NicolaDM commented May 3, 2024

Sorry Minh, I don't remember much about this.

Looking at my commits it looks like I added a function writeInfo() to both .cpp and .h files, and I modified
ModelUnrest::ModelUnrest() in the .cpp file to allow the reading input UNREST rates.

@bqminh
Copy link
Member

bqminh commented May 6, 2024

@trongnhanuit Am I right that you change ModelUnrest to read the parameters? It looks like both you and Nicola changed this part of the class, causing conflicts...

@trongnhanuit
Copy link
Member

trongnhanuit commented May 6, 2024 via email

@bqminh bqminh merged commit e924d5c into iqtree:master May 7, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants