Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major code refactoring and various cosmetic changes #11

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

StefanFlaumberg
Copy link
Contributor

@StefanFlaumberg StefanFlaumberg commented Sep 7, 2024

This pull request depends on the noahares/GeneRaxCore#1 PR

Together they introduce the following changes:

  • Make code more logically organized, more unified and less complicated wherever possible
  • Fix the gradient search parameter setting in the thorough rate optimization mode
  • Fix the alpha parameter optimization in the rate mixture mode (speciation rate categories)
  • Fix internal species branch label update on the species tree topology changes
  • Introduce checkpoints for the filtered families, the mixture alpha and the highways (and they do work!)
  • Restrict using checkpoint if the run options are changed
  • Restrict concomitant usage of incompatible options and superfluous MPI rank usage (fixes [BUG] Stalling while using MPI #10 )
  • Update the help message
  • Add a newline char to the end of output files wherever it's missing
  • Give better control over the optimization verbosity
  • Make logs more pretty

@StefanFlaumberg
Copy link
Contributor Author

Hi Benoit,

I have restricted some options not to be used together, but I am not completely sure regarding the following 2 points:

  1. The RELDATED transfer constraint:
    Tree topology search should clearly mix up the species relative ranks, so the HYBRID search seems to be logically incompatible with the constraint. One wouldn't either start a REROOT search from a dated species tree, but the root search algorithm itself implements some kind of dating search, so one can use the REROOT search to get a properly rooted and dated species tree.
    Thus, I decided under the RELDATED contraint to restrict usage of the HYBRID search, but to allow for usage of the REROOT search if the --infer-speciation-order option is additionally invoked.

  2. The PER-SPECIES rates:
    Your wiki states that "different rates for different species are not compatible with species tree inference". However, the HYBRID tree search runs seemingly ok both with PER-SPECIES and ORIGINATION-PER-SPECIES rates (why shouldn't it?).
    Thus, I decided to restrict only concomitant usage of tree search (both HYBRID and REROOT) and model parametrization with a custom file (for an obvious reason: the file uses ancestral clade names). All the other combinations are allowed.

Could you, please, tell if this looks sensible?

Best,
Stefan

@noahares
Copy link
Collaborator

noahares commented Oct 3, 2024

Hi Stefan,

I will be continuing Benoit's work with AleRax. Thanks for opening this PR! I will look into your changes and questions.

Best,
Noah

@StefanFlaumberg
Copy link
Contributor Author

Hi Noah,

It's good to know that AleRax will be actively maintained again!

Since I didn't expect any updates coming, I've been doing major code refactoring for a couple of months (fixing a number of small bugs, code structure optimization, making output more fancy and stuff mentioned in this PR description) and was going to publish the changes within this PR in the coming weeks.

Now I'll have to update my changes by including your recent commits and hope to publish the final version of the PR by the end of the next week.

Best,
Stefan

@StefanFlaumberg
Copy link
Contributor Author

Dear @noahares and @BenoitMorel ,

The PR is ready for review.

I started the work in August. The main goal was refactoring the code to make it more logically organized, succinct and readable. Some functions changed their names, some were merged, splitted or even removed, but the algorithmic side is left 99% intact. The 1% goes to fixing small bugs like making the optimized rate gamma categories to be included to the actual evaluators, proper speciation date backuping, proper branch label update on changing topology etc. The output became more user-friendly, as the help menu did. Incompatible options now cannot be run together leading to explicit error messages. Checkpointing became more error-proof by remembering the filtered families, the rate mixture parameter and highways and by prohibiting running with changed options.

The code has been eye-checked countless times, and test runs in various modes have not uncovered any errors or unexpected LL changes compared to the current version performance. I have also run tests using corrected reconciliation models (not part of this PR, see below) to ensure everything works fine even where the current version fails.
It is huge, but should be fine. I will be rather surprised if any undetected error caused by my amends eventually shows up.

The only obvious problem we have is the style discrepancy. Most of the changes were done before Noah's publishing of the restyled code. Since my amendments are numerous and quite diffuse, I could not reproduce them on the current restyled code version from scratch. Thus, I only managed to include your HGT highway search improvements to my code, yet left all the code mostly in the original style of Benoit (barring extra whitespaces and newlines).
Personally, I find this original style more easy-to-read. But I totally acknowledge the advantages of adhering to the existing coding standards, so if you consider it critical, I could rewrite my version in the new style in some time. Yet, I would rather postpone the rewriting since I also have some changes to propose fixing quite serious algorithmic flaws in the reconciliation model files and these should be of higher priority. I'll publish the fixed models in a separate PR a bit later.

If you find something in the code that doesn't seem right, please contact me back :)

With best hopes for understanding,
Stefan

@StefanFlaumberg StefanFlaumberg changed the title Various cosmetic changes Major code refactoring and various cosmetic changes Oct 28, 2024
@noahares
Copy link
Collaborator

Hi @StefanFlaumberg,

Thank you very much for the work you put into this! I will have a look into the changes as soon as possible. I am currently working on some improvements and feature extensions myself which are semi time-critical so I hope to merge your changes before I work on mine (or deal with merging both later). The "restyling" is exclusively from the clangd-format file so unifying the styles should not be an issue. If you have suggestions to changes to the style file, let me know.

Best
Noah

@StefanFlaumberg
Copy link
Contributor Author

Hi @noahares ,
Thank you for the reply!

I didn't know about automatically formatting the code. Well, great that no one will have to reformat the changes manually. I'm fine with that)

Considering your time constraints, just to synchronize:
Sorry for the inconvenience. I think I'll be ready to add another PR fixing the reconciliation models at the beginning of next week. While the current PR and the one covering the models substantially alter the code, they also fix the most prominent algorithmic flaws of the tool. There are also some other minor problems I've found, but they cannot significantly affect performance, so fixing them can wait. I believe that implementing and testing new features might be more efficient with the old flaws covered if possible (for example, the current model in certain cases estimates LL > 0, which I'm fixing in the coming PR).

Best wishes,
Stefan

@StefanFlaumberg
Copy link
Contributor Author

StefanFlaumberg commented Nov 19, 2024

Hi @noahares ,

Hope your work on the program is going well. Since it's already been 3 weeks, I just wanted to kindly remind you of this PR and to inform that I'm also ready with another PR fixing the reconciliation models. Could you please roughly outline when you think you'll be able to check it?
As I consider using AleRax in our group's projects, I find it important that the program have as few bugs as possible. And other users would probably desire the same from the program main version.

Thank you for your time!

Best,
Stefan

@noahares
Copy link
Collaborator

Hi @StefanFlaumberg,

My current workload and travels did not yet allow for sitting down to fully inspect the changes. I hope you understand that even though most changes are simple, I want to look through them closely for a codebase I am familiarizing myself with at the moment and plan to maintain in the long run.
I hope to look at it in the upcoming weeks.

Best
Noah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Stalling while using MPI
2 participants