Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and cleanup tree finetuning code #1250

Conversation

ivan-aksamentov
Copy link
Member

@ivan-aksamentov ivan-aksamentov commented Sep 11, 2023

Followup of #1249

Inspired by refactoring in 15757ec (#1249) I decided to continue, because the code in the finetune_nearest_node() can be clearly divided into independent functions.

Change of terminology, to clarify meaning of varables:

  • current_best_node -> best_node (this is the loop's external state)
  • best_node -> candidate_node
  • best_split_result -> candidate_split

I tried to make commits to contain only 1 minimal change which compiles and runs. Plus there is a few pink elephants I encountered - reported in the PR comments below.

@vercel
Copy link

vercel bot commented Sep 11, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
nextclade ✅ Ready (Inspect) Visit Preview Sep 12, 2023 6:22am

@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Sep 11, 2023

"Is leaf and not root" should always be the case for trees with more than one node, probably redundant.

Old version:

} else if current_best_node.is_leaf()
&& !current_best_node.is_root()
&& current_best_node.payload().tmp.private_mutations.nuc_muts.is_empty()

I kept it as is in the new version, but I think we can remove the check for root, because parent_of() returns None if there is no parent node:

} else if best_node.is_leaf() && !best_node.is_root() && best_node.payload().tmp.private_mutations.nuc_muts.is_empty()
{
graph.parent_of(candidate_node)

(Seems rustfmt is not doing great here. Making this line shorted will also make code formatting better.)

@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Sep 11, 2023

Check for potential bug here. For example, it might be that the original code meant to do something else.

This is a roundrip: node -> node key -> node, equivalent to no-op:

} else {
// The best node is child
current_best_node = graph.get_node(best_node.key())?;

In the new version I replaced it with just the variable itself:

@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Sep 11, 2023

The control flow here is very convoluted:

  • there are repeated conditions (candidate_node == best_node)
  • there are 5 points of exit (return/break)

Can this be simplified?
If not, could we add some comments for each condition and each point of exit?

Old version:

if n_shared_muts > 0 {
if best_node.key() == current_best_node.key() && best_split_result.left.nuc_muts.is_empty() {
// All mutations from the parent to the node are shared with private mutations. Move up to the parent.
// FIXME: what if there's no parent?
current_best_node = graph
.parent_of_by_key(best_node.key())
.ok_or_else(|| make_internal_report!("Parent node is expected, but not found"))?;
} else if best_node.key() == current_best_node.key() {
// The best node is the current node. Break.
break;
} else {
// The best node is child
current_best_node = graph.get_node(best_node.key())?;
}
} else if current_best_node.is_leaf()
&& !current_best_node.is_root()
&& current_best_node.payload().tmp.private_mutations.nuc_muts.is_empty()
{
current_best_node = graph
.parent_of_by_key(best_node.key())
.ok_or_else(|| make_internal_report!("Parent node is expected, but not found"))?;
} else {
break;
}

In the new version, I kept it mostly as is, but it looks simpler (due to removal of assignment to mutable vars):

fn find_better_node_maybe<'g>(
graph: &'g AuspiceGraph,
best_node: &'g Node<AuspiceGraphNodePayload>,
candidate_node: &'g Node<AuspiceGraphNodePayload>,
candidate_split: &SplitMutsResult,
n_shared_muts: usize,
) -> Option<&'g Node<AuspiceGraphNodePayload>> {
// if shared mutations are found, the current_best_node is updated
if n_shared_muts > 0 {
if candidate_node == best_node && candidate_split.left.nuc_muts.is_empty() {
// Caveat: all mutations from the parent to the node are shared with private mutations. Move up to the parent.
graph.parent_of(candidate_node)
} else if candidate_node == best_node {
// The best node is the current node. Break.
None
} else {
// The best node is child
Some(candidate_node)
}
} else if best_node.is_leaf() && !best_node.is_root() && best_node.payload().tmp.private_mutations.nuc_muts.is_empty()
{
graph.parent_of(candidate_node)
} else {
None
}
}

@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Sep 11, 2023

The refactoring attempt in this PR involved extracting new scopes (functions) which breaks continuity of the original large scope, which happens to have mutable state. Because the original code relied on modifying the state a lot, I might have introduced bugs where the variables are not updated properly anymore. This needs to be double-checked.

by checking first whether the candidate_node equals the current best
node, the set of conditionals gets simplified and multiple previous
conditions are combined into one. There are now three possible results:

 - move to the parent node. Happens whenever best_node==candidate_node
 and all mutations that lead to the candidate_node are also found in
 the private mutations of best_node
 - move to a child (candidate_node!=best_node). This only happens when
 there is at least one mutation shared.
 - stay: in this case, the final placing is either a direct child of
 candidate_node, or splits the branch leading to it.
@ivan-aksamentov ivan-aksamentov marked this pull request as ready for review September 12, 2023 06:04
@ivan-aksamentov ivan-aksamentov merged commit 4e5feb7 into fix/handle-spurious-aamuts-in-tree Sep 12, 2023
@ivan-aksamentov ivan-aksamentov deleted the refactor/tree-builder-finetune branch September 12, 2023 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

2 participants