Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add discrepancy function and related things #313

Open
petrelharp opened this issue Aug 31, 2023 · 5 comments
Open

add discrepancy function and related things #313

petrelharp opened this issue Aug 31, 2023 · 5 comments

Comments

@petrelharp
Copy link

We would like to add to tsdate.evaluation:

  1. break ties in match_node_ages by closest-in-time (and rename to match_nodes, which requires changing the code wherever the function is used)
  2. add a discrepancy method that returns for the discrepancy between tsa and tsb, the tuple of
    a. the total shared span divided by total node span in tsa (which we need to compute); so this is proprotion of the span in tsa that is accurately represented in tsb; and
    b. the root-mean-squared discrepancy in time, with the average weighted by span in tsa.
@petrelharp
Copy link
Author

@hfr1z3

@nspope
Copy link
Contributor

nspope commented Aug 31, 2023

match_node_ages isn't used anywhere else in this repo (only in my own evaluation/paper repo) and it's not documented-- so should be perfectly fine to just change the name in tsdate.evaluation

@hfr1tz3
Copy link

hfr1tz3 commented Sep 14, 2023

@petrelharp , I wrote up a discrepancy function with the desired outputs.
However, I need to double check if it correctly outputs the correct RMSE that @nspope is looking for.

I did run the following test case:
For simulated tree sequence ts, let sts = ts.simplify() and ets = sts.extend_edges().
Then tree_discrepancy(ets, ts) has discrepancy value of 1 and rmse of 0.
This makes sense as all of ets should be in ts, and there should therefore be zero error so I think it should be correct, but please double check if I missed something.

@petrelharp
Copy link
Author

Okay, great. Let's see - this is the one on your fork, right? You may as well start a PR with these changes, that'll be easier to keep track of and discuss.

@petrelharp
Copy link
Author

Hm, let's see. I agree with the check that you have. Here's some other possible checks:

  1. Perturb the times of (some of) the nodes, so you know what the correct RMSE should be. (You could get a set of guaranteed legal perturbations by first iterating over the trees and finding an upper bound for each node by the lowest time of an ancestor across the trees; then only perturb nodes up.)
  2. Make a very simple example where we know the right answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants