Skip to content

Commit

Permalink
Added support for the -mlnj-only parameter - for which some refactoring
Browse files Browse the repository at this point in the history
was necessary (see #2 through #8 and particularly #5 below), and also
drafted some additional "progress-reporting" (see #9 through #11):
1. If -mlnj-only is found on the command-line, Params::compute_ml_tree_only
   will be set to true (in parseArg(), in utils/tools.cpp).
2. initializeParams doesn't call computeInitialTree if compute_ml_tree_only
   is set to true.
3. You can't set the root of a tree (if you don't yet have one), a bit later
   in the same function (and also in IQTree::initSettings).
4. Added PhyloTree::ensureNumberOfThreadsIsSet (and updated repetitive code
   that was doing what it does, in several other places).  This forced some
   updates in other files, such as main/phylotesting.cpp.
5. Added PhyloTree::ensureModelParametersAreSet (as the same steps need to
   be carried out somewhat later if there isn't an initial tree before
   ML distances are calculated). It returns a tree string.
6. In runTreeConstruction, when compute_ml_tree_only is set, negative
   branches are resolved, and #4 and #5 are called only AFTER the tree
   has been constructed.
7. In IQTree::initCandidateTreeSet the tree mightn't be a parsimony tree
   (I think if you've combined -nt AUTO and --mlnj-only) as such, but there
   will be *a* tree.  The list of cases wasn't exhaustive any more.
8. Added a distanceFileWritten member variable and a getDistanceFileWritten
   Member function to PhyloTree.
9. (This and the following changes are progress reporting changes).
   Added member functions for progress reporting to PhyloTree:
   (a) initProgress  (pushes where you are on a stack, and starts
                      reporting progress, if there's now one level
                      of progress reporting on the stack)
   (b) trackProgress (bumps up progress if progress stack depth is: 1)
   (c) hideProgress  (called before you write log messages to cut)
   (d) showProgress  (called again after)
   (e) doneProgress  (pops, and stops reporting progress, if the last
                      level of progress reporting was just popped)
   The supporting member variables are progressStackDepth and progress.
9. IQTree::optimizeNNI uses the functions added in change #9 to report
   Progress (problem here is that MAXSTEPS is a rather "high" guess
   (For n sequences it is ~2n, when the best guess for how many iterations
   There will be, with parallel NNIs, is on the order of ~p where p is the
   worst-case "tip-to-tip" path length of the tree - probably a lot less.
10.PhyloTree::testNumThreads also uses the functions added in change#9 to
   Report how many threads it has tried (though, for now, it badly
   over-reports how long it thinks it will take) (because it thinks it will
   do max_procs iterations and each will take as long as the last, but,
   Really, it'll do max_procs/2, or so, and they go faster and faster as
   there are more threads in use in later steps - one more each step).
11.PhyloTree::optimizeAllBranches reports progress (via the functions
   added in change#9).  Normally it reports progress during parameter
   optimisation (because I haven't written "higher-level" progress
   reporting for that yet).

There are some potential issues though:
1. The special-case code for dealing with "+I+G" rates doesn't yet have
   a counterpart when compute_ml_tree_only is set (in runTreeConstruction).
2. Likewise, the code for when (params.lmap_num_quartets >= 0)
   (No counterpart when compute_ml_tree_only is set, yet) (this too
    is in runTreeConstruction).
   (I haven't figured out how to test the "counterpart" versions of those
    yet, which is why I haven't written them)
3. If you pass -nt AUTO I'm not sure how many threads the NJ (or whatever)
   step will use (I think it's all of them), and the ML distance
   calculations also "use all the threads" (because the thread count's not
   set when that code runs either).  Both parallelise... well... but I'm
   not so sure it's a good idea that it hogs all the CPU cores like that.
  • Loading branch information
JamesBarbetti committed Jul 30, 2020
1 parent 89419a0 commit d84cc2e
Show file tree
Hide file tree
Showing 9 changed files with 334 additions and 204 deletions.
290 changes: 137 additions & 153 deletions main/phyloanalysis.cpp

Large diffs are not rendered by default.

27 changes: 3 additions & 24 deletions main/phylotesting.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -582,15 +582,7 @@ string computeFastMLTree(Params &params, Alignment *aln,
}

iqtree->getModelFactory()->restoreCheckpoint();

#ifdef _OPENMP
if (num_threads <= 0) {
num_threads = iqtree->testNumThreads();
omp_set_num_threads(num_threads);
} else
iqtree->warnNumThreads();
#endif

iqtree->ensureNumberOfThreadsIsSet(nullptr);
iqtree->initializeAllPartialLh();
double saved_modelEps = params.modelEps;
params.modelEps = params.modelfinder_eps;
Expand Down Expand Up @@ -1581,13 +1573,7 @@ string CandidateModel::evaluate(Params &params,
iqtree->saveCheckpoint();
}

#ifdef _OPENMP
if (num_threads <= 0) {
num_threads = iqtree->testNumThreads();
omp_set_num_threads(num_threads);
} else
iqtree->warnNumThreads();
#endif
iqtree->ensureNumberOfThreadsIsSet(nullptr);

runTreeReconstruction(params, iqtree);
new_logl = iqtree->computeLikelihood();
Expand All @@ -1610,14 +1596,7 @@ string CandidateModel::evaluate(Params &params,
if (verbose_mode >= VB_MED)
cout << "Optimizing model " << getName() << endl;

#ifdef _OPENMP
if (num_threads <= 0) {
num_threads = iqtree->testNumThreads();
omp_set_num_threads(num_threads);
} else
iqtree->warnNumThreads();
#endif

iqtree->ensureNumberOfThreadsIsSet(nullptr);
iqtree->initializeAllPartialLh();

for (int step = 0; step < 2; step++) {
Expand Down
86 changes: 71 additions & 15 deletions tree/iqtree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
#include "phylosupertree.h"
#include "phylosupertreeplen.h"
#include "model/partitionmodelplen.h"
#include "model/modelfactorymixlen.h"
#include "mexttree.h"
#include "utils/timeutil.h"
#include "model/modelmarkov.h"
Expand Down Expand Up @@ -318,7 +319,9 @@ void IQTree::initSettings(Params &params) {
// max_candidate_trees = params.max_candidate_trees;
// if (max_candidate_trees == 0)
// max_candidate_trees = aln->getNSeq() * params.step_iterations;
setRootNode(params.root);
if (!params.compute_ml_tree_only) {
setRootNode(params.root);
}

size_t i;

Expand Down Expand Up @@ -655,7 +658,6 @@ void IQTree::computeInitialTree(LikelihoodKernel kernel) {
if (params->write_init_tree) {
out_file += ".init_tree";
printTree(out_file.c_str(), WT_NEWLINE);
// printTree(getTreeString().c_str(), WT_NEWLINE);
}
}

Expand Down Expand Up @@ -741,26 +743,25 @@ void IQTree::initCandidateTreeSet(int nParTrees, int nNNITrees) {
curParsTree = string(pllInst->tree_string);
PhyloTree::readTreeStringSeqName(curParsTree);
wrapperFixNegativeBranch(true);
curParsTree = getTreeString();
} else if (params->start_tree == STT_RANDOM_TREE) {
generateRandomTree(YULE_HARDING);
wrapperFixNegativeBranch(true);
if (rooted) {
rooted = false;
convertToRooted();
}
curParsTree = getTreeString();
} else if (params->start_tree == STT_PARSIMONY) {
/********* Create parsimony tree using IQ-TREE *********/
#ifdef _OPENMP
PhyloTree::readTreeString(pars_trees[treeNr-1]);
curParsTree = getTreeString();
#else
computeParsimonyTree(NULL, aln, randstream);
curParsTree = getTreeString();
#endif
} else {
//Use the tree we've already got!
}

curParsTree = getTreeString();

int pos = addTreeToCandidateSet(curParsTree, -DBL_MAX, false, MPIHelper::getInstance().getProcessID());
// if a duplicated tree is generated, then randomize the tree
if (pos == -1) {
Expand Down Expand Up @@ -2103,6 +2104,30 @@ string IQTree::optimizeModelParameters(bool printInfo, double logl_epsilon) {
return newTree;
}

string IQTree::ensureModelParametersAreSet(double initEpsilon) {
string initTree;
getModelFactory()->restoreCheckpoint();
if (getCheckpoint()->getBool("finishedModelInit")) {
// model optimization already done: ignore this step
if (!candidateTrees.empty()) {
readTreeString(getBestTrees()[0]);
}
setCurScore(computeLikelihood());
initTree = getTreeString();
cout << "CHECKPOINT: Model parameters restored, LogL: " << getCurScore() << endl;
} else {
initTree = optimizeModelParameters(true, initEpsilon);
if (isMixlen()) {
initTree = ((ModelFactoryMixlen*)getModelFactory())->sortClassesByTreeLength();
}
saveCheckpoint();
getModelFactory()->saveCheckpoint();
getCheckpoint()->putBool("finishedModelInit", true);
getCheckpoint()->dump();
}
return initTree;
}

void IQTree::printBestScores() {
vector<double> bestScores = candidateTrees.getBestScores(params->popSize);
for (vector<double>::iterator it = bestScores.begin(); it != bestScores.end(); it++)
Expand Down Expand Up @@ -2964,6 +2989,7 @@ pair<int, int> IQTree::optimizeNNI(bool speedNNI) {
tabuSplits = initTabuSplits;
}

initProgress(MAXSTEPS, "Optimizing NNI", "done", "step");
double originalScore = curScore;
for (numSteps = 1; numSteps <= MAXSTEPS; numSteps++) {

Expand All @@ -2972,7 +2998,7 @@ pair<int, int> IQTree::optimizeNNI(bool speedNNI) {
if (save_all_trees == 2) {
saveCurrentTree(curScore); // BQM: for new bootstrap
}
if (verbose_mode >= VB_DEBUG) {
if (verbose_mode >= VB_DEBUG && !progress_display::getProgressDisplay()) {
cout << "Doing NNI round " << numSteps << endl;
if (isSuperTree()) {
((PhyloSuperTree*) this)->printMapInfo();
Expand Down Expand Up @@ -3079,8 +3105,9 @@ pair<int, int> IQTree::optimizeNNI(bool speedNNI) {
}

if(curScore < oldScore - params->loglh_epsilon){
hideProgress();
cout << "$$$$$$$$: " << curScore << "\t" << oldScore << "\t" << curScore - oldScore << endl;

showProgress();
}

if (curScore - oldScore < params->loglh_epsilon)
Expand All @@ -3097,7 +3124,9 @@ pair<int, int> IQTree::optimizeNNI(bool speedNNI) {
if (Params::getInstance().writeDistImdTrees) {
intermediateTrees.update(getTreeString(), curScore);
}
trackProgress(1);
}
doneProgress();

if (totalNNIApplied == 0 && verbose_mode >= VB_MED) {
cout << "NOTE: Input tree is already NNI-optimal" << endl;
Expand Down Expand Up @@ -4187,6 +4216,23 @@ void PhyloTree::warnNumThreads() {
outWarning("Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads.");
}

int PhyloTree::ensureNumberOfThreadsIsSet(Params *params) {
#ifdef _OPENMP
if (num_threads <= 0 ) {
int bestThreads = testNumThreads();
omp_set_num_threads(bestThreads);
if (params!=nullptr) {
params->num_threads = bestThreads;
}
} else {
warnNumThreads();
}
return num_threads;
#else
return 1;
#endif
}

int PhyloTree::testNumThreads() {
#ifndef _OPENMP
return 1;
Expand All @@ -4205,7 +4251,9 @@ int PhyloTree::testNumThreads() {
trees.push_back(getTreeString());
setLikelihoodKernel(sse);

for (int proc = 1; proc <= max_procs; proc++) {
initProgress(max_procs, "Determining AUTO threadcount", "tried", "threadcount");
for (int proc = 1; proc <= max_procs; ++proc) {
trackProgress(1.0);
omp_set_num_threads(proc);
setNumThreads(proc);
initializeAllPartialLh();
Expand All @@ -4221,7 +4269,9 @@ int PhyloTree::testNumThreads() {
// too fast, increase number of iterations
if (runTime*10 < min_time && proc == 1 && tree == 0) {
int new_num_iter = 10;
hideProgress();
cout << "Increase to " << new_num_iter << " rounds for branch lengths" << endl;
showProgress();
logl = optimizeAllBranches(new_num_iter - num_iter);
num_iter = new_num_iter;
runTime = getRealTime() - beginTime;
Expand All @@ -4239,27 +4289,33 @@ int PhyloTree::testNumThreads() {
curScore = saved_curScore;
}

if (proc == 1)
if (proc == 1) {
hideProgress();
cout << trees.size() << " trees examined" << endl;
showProgress();
}

deleteAllPartialLh();

runTimes.push_back(runTime);
double speedup = runTimes[0] / runTime;

hideProgress();
cout << "Threads: " << proc << " / Time: " << runTime << " sec / Speedup: " << speedup
<< " / Efficiency: " << (int)round(speedup*100/proc) << "% / LogL: " << (int)logl << endl;
showProgress();

// break if too bad efficiency ( < 50%) or worse than than 10% of the best run time
if (speedup*2 <= proc || (runTime > runTimes[bestProc]*1.1 && proc>1))
if (speedup*2 <= proc || (runTime > runTimes[bestProc]*1.1 && proc>1)) {
break;
}

// update best threads if sufficient
if (runTime <= runTimes[bestProc]*0.95)
if (runTime <= runTimes[bestProc]*0.95) {
bestProc = proc-1;

}
}

doneProgress();
readTreeString(trees[0]);

cout << "BEST NUMBER OF THREADS: " << bestProc+1 << endl << endl;
Expand Down
9 changes: 9 additions & 0 deletions tree/iqtree.h
Original file line number Diff line number Diff line change
Expand Up @@ -731,6 +731,15 @@ class IQTree : public PhyloTree {
*/
string optimizeModelParameters(bool printInfo = false, double epsilon = -1);

/**
* @brief: either optimize model parameters on the current tree
* or restore them from a checkpoint (this function exists because the
* same things need to be done in two different places, in runTreeReconstruction)
* @param initEpsilon likelihood epsilon for optimization
*/

string ensureModelParametersAreSet(double initEpsilon);

/**
* variable storing the current best tree topology
*/
Expand Down
Loading

0 comments on commit d84cc2e

Please sign in to comment.