Modify haploid Viterbi/FB to handle NONCOPY state in reference panel #31
The tests cover the following:
Presently, …
Nice, looks great @szhan.
@astheeggeggs, what do you think? It would be really helpful to have a direct matrix-based implementation of the full "all haplotypes" model we can compare with, and this seems like the most natural way. Would be great to get your thoughts here.
So the intuition here is that the number of templates in the reference panel jumps from n to 2n - 1, and you fill the ancestral reference templates with missing values (or anything, really) where there is no ancestral material in those haplotypes, give them a 'NON_COPY' label at those sites, and set the corresponding emission probability to 0. You can then set the recombination rate to rho_j / (2n - 1 - |NON_COPY_j|) for each site j, where |NON_COPY_j| is the number of templates labeled NON_COPY at site j. That logic all makes sense to me!

(Replying by email to Jerome Kelleher's comment of Fri, Mar 22, 2024, 9:10 AM.)
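The per-site normalization described in this reply can be sketched with NumPy. This is a minimal illustration, assuming the reference panel is a (sites × templates) integer array in which NON_COPY entries are encoded as `-2` (the value chosen in the PR description) and that `rho` holds one recombination probability per site; `per_template_switch_prob` is a hypothetical name, not part of the project's code.

```python
import numpy as np

NONCOPY = -2  # value for non-copying states, as chosen in this PR


def per_template_switch_prob(ref_panel, rho):
    """Return rho_j / (number of copiable templates at site j) for each site j.

    ref_panel: (num_sites, num_templates) int array, NONCOPY where a template
               has no ancestral material and must not be copied from.
    rho: (num_sites,) array of per-site recombination probabilities.
    """
    num_templates = ref_panel.shape[1]
    num_noncopy = np.sum(ref_panel == NONCOPY, axis=1)
    num_copiable = num_templates - num_noncopy  # 2n - 1 - |NON_COPY_j|
    return rho / num_copiable


# Example: n = 2 sample haplotypes plus n - 1 = 1 ancestral template,
# giving 2n - 1 = 3 templates.
ref_panel = np.array([
    [0, 1, NONCOPY],
    [1, 1, 0],
    [0, 0, NONCOPY],
])
rho = np.array([0.0, 0.1, 0.1])
switch_probs = per_template_switch_prob(ref_panel, rho)
```

With this choice, summing rho_j / (number of copiable templates) over the copiable templates gives back rho_j, so the total switching mass at a site does not depend on how many ancestral templates are masked there.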
Is it worth adding tests for cases where we expect at least two equally likely paths? It seems quite easy to end up with more than one best path as the number of reference haplotypes and the number of sites grow.
I have reworked the tests, which now cover the following queries copying from:
In the test ref. panels, all the ancestors carry at least one … Instead of asserting that the expected path and the actual path are identical, the new tests assert that the log-likelihood of the expected path and that of the actual path are approximately equal (i.e., passing …
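A minimal sketch of this kind of likelihood-based assertion, assuming a simple biallelic model (emission 1 - mu on a match, mu on a mismatch, 1 for a missing query allele, 0 when copying a NONCOPY entry) and a per-site switch probability of rho_j divided by the number of copiable templates. `emission_prob` and `path_log_likelihood` are hypothetical helpers, not the project's API. Templates 0 and 1 below are identical, so two different paths have the same likelihood, which is the situation a path-equality test would wrongly fail on.

```python
import math

import numpy as np

MISSING = -1  # missing query allele, as in this PR
NONCOPY = -2  # non-copying template state, as in this PR


def emission_prob(query_allele, ref_allele, mu):
    """Hypothetical biallelic emission model with the NONCOPY rule."""
    if ref_allele == NONCOPY:
        return 0.0
    if query_allele == MISSING:
        return 1.0
    return 1.0 - mu if query_allele == ref_allele else mu


def path_log_likelihood(path, query, ref_panel, rho, mu):
    """Joint log-probability of the query and one copying path (template per site)."""
    num_templates = ref_panel.shape[1]
    num_copiable = num_templates - np.sum(ref_panel == NONCOPY, axis=1)
    log_lik = -math.log(num_copiable[0])  # uniform start over copiable templates
    for j in range(len(path)):
        if j > 0:
            stays = path[j] == path[j - 1]
            trans = rho[j] / num_copiable[j] + (1.0 - rho[j]) * stays
            if trans == 0.0:
                return -math.inf
            log_lik += math.log(trans)
        emit = emission_prob(query[j], ref_panel[j, path[j]], mu[j])
        if emit == 0.0:
            return -math.inf  # path copies a NONCOPY entry
        log_lik += math.log(emit)
    return log_lik


# Templates 0 and 1 are identical, so copying either gives the same likelihood.
ref_panel = np.array([
    [0, 0, NONCOPY],
    [1, 1, 0],
    [0, 0, 1],
])
query = np.array([0, 1, 0])
rho = np.full(3, 0.1)  # rho[0] is unused
mu = np.full(3, 0.01)

expected_path, actual_path = [0, 0, 0], [1, 1, 1]
assert np.isclose(
    path_log_likelihood(expected_path, query, ref_panel, rho, mu),
    path_log_likelihood(actual_path, query, ref_panel, rho, mu),
)
```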
Just to add a note here that in tsinfer …
(Branch updated from db075cf to ac20d0a.)
I have two sets of tests passing now. One set (… In both sets of tests, … The set of tests to keep is in …
Probably I should delete … Otherwise, I think the tests are ready for review, @astheeggeggs, apart from some more minor tweaks I'd like to make.
Ah, by modifying … Also, I think we should probably include assertions that none of the most likely paths go through a base in the reference panel that is equal to …
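Assuming the truncated value above is the NONCOPY code (`-2`), such an assertion could look like the following hypothetical helper. It indexes the panel at the (site, template) pairs along the path and checks none of them is NONCOPY.

```python
import numpy as np

NONCOPY = -2  # value for non-copying states, as chosen in this PR


def copies_from_noncopy(path, ref_panel):
    """Hypothetical helper: True if the path copies a NONCOPY entry at any site."""
    path = np.asarray(path)
    sites = np.arange(len(path))
    # ref_panel[j, path[j]] is the allele the path copies at site j.
    return bool(np.any(ref_panel[sites, path] == NONCOPY))


ref_panel = np.array([
    [0, 1, NONCOPY],
    [1, 1, 0],
    [0, 0, 1],
])
assert not copies_from_noncopy([0, 2, 2], ref_panel)
assert copies_from_noncopy([2, 2, 2], ref_panel)
```

In a test suite this would be applied to every path returned as most likely, e.g. `assert not copies_from_noncopy(path, ref_panel)`.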
Looking at this again, I think it needs a rethink: we have emission probabilities that don't sum to 1, which doesn't make sense in the usual HMM framework.
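One way to see the problem: if every allele has emission probability 0 under a template that is NONCOPY at a site, then that hidden state's emission distribution sums to 0 rather than 1. The check below assumes a simple biallelic model for illustration (match 1 - mu, mismatch mu); `emission_prob` is hypothetical.

```python
import math

NONCOPY = -2  # value for non-copying states, as chosen in this PR


def emission_prob(query_allele, ref_allele, mu):
    """Hypothetical biallelic emission model with the NONCOPY rule."""
    if ref_allele == NONCOPY:
        return 0.0  # copying is forbidden, whatever the query allele
    return 1.0 - mu if query_allele == ref_allele else mu


mu = 0.01
# Sum P(query allele | template allele) over both possible query alleles.
totals = {
    ref_allele: sum(emission_prob(a, ref_allele, mu) for a in (0, 1))
    for ref_allele in (0, 1, NONCOPY)
}
sums_to_one = {r: math.isclose(t, 1.0) for r, t in totals.items()}
```

Ordinary template states give a proper distribution over alleles, whereas a NONCOPY state gives none, so it behaves more like a state the chain must be forbidden from entering than like a state with an emission distribution.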
I'm going to scrap most of the code here, since it makes sense to refactor before adding new features. I'll probably just keep some of the handcrafted test cases and the routine that builds a ref. panel with NONCOPY states.
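The routine for building a panel with NONCOPY states is not shown here. A hypothetical version might look like the following. It assumes, for illustration only, that each ancestral template carries material over a single contiguous half-open range of site indices `[start, end)`, and it masks every site outside that range with NONCOPY.

```python
import numpy as np

NONCOPY = -2  # value for non-copying states, as chosen in this PR


def add_noncopy_states(haplotypes, starts, ends):
    """Hypothetical helper: set sites outside [start, end) to NONCOPY per template.

    haplotypes: (num_sites, num_templates) int array of alleles.
    starts, ends: per-template site indices bounding the ancestral material
                  (end exclusive); sample haplotypes span all sites.
    """
    panel = np.array(haplotypes, copy=True)
    sites = np.arange(panel.shape[0])[:, None]  # column vector of site indices
    outside = (sites < np.asarray(starts)[None, :]) | (
        sites >= np.asarray(ends)[None, :]
    )
    panel[outside] = NONCOPY
    return panel


haplotypes = np.array([
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
])
# Templates 0 and 1 are samples; template 2 is an ancestor spanning sites 1-2.
panel = add_noncopy_states(haplotypes, starts=[0, 0, 1], ends=[4, 4, 3])
```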
Aside from rethinking this, we should probably refactor the code a bit first. See #37.
I've deleted the code in this branch, but it's probably useful to keep the discussion here.
Looks great. I've not checked the tests, but the alterations in the functions absolutely make sense.
Thanks @astheeggeggs! Please do not merge the code in this branch! Please look at the code in #41 instead.
Closing this PR because it has been replaced by PR #41.
If ancestral haplotypes are used as part of a reference panel for LS matching, then non-copying states in them need to be treated differently from other states. When a non-copying state is encountered, the emission probability should be 0, because copying from it is not allowed. We arbitrarily set the numeric value of non-copying states to `-2`, which is distinct from the missing state, which is set to `-1`. Here, we modify how emission probabilities are computed in the haploid version of the Viterbi algorithm. We also add examples of reference panels, queries, and expected paths for testing.
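A minimal sketch of a haploid Viterbi pass with this NONCOPY rule, written with NumPy in log space. It assumes a biallelic mutation model (match 1 - mu, mismatch mu), that a missing query allele emits 1, and that the switch probability at site j is rho_j divided by the number of templates that are copiable there. `emission_matrix` and `viterbi_haploid` are hypothetical names, not the project's implementation.

```python
import numpy as np

MISSING = -1  # missing query allele, as in this PR
NONCOPY = -2  # non-copying template state, as in this PR


def emission_matrix(query, ref_panel, mu):
    """(num_sites, num_templates) emission probabilities (assumed biallelic model)."""
    q = query[:, None]
    e = np.where(ref_panel == q, 1.0 - mu[:, None], mu[:, None])
    e = np.where(q == MISSING, 1.0, e)  # a missing query allele is uninformative
    return np.where(ref_panel == NONCOPY, 0.0, e)  # never copy a NONCOPY entry


def viterbi_haploid(query, ref_panel, rho, mu):
    """Return (most likely path, its joint log-likelihood), computed in log space."""
    num_sites, num_templates = ref_panel.shape
    num_copiable = num_templates - np.sum(ref_panel == NONCOPY, axis=1)
    templates = np.arange(num_templates)
    backptr = np.zeros((num_sites, num_templates), dtype=int)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for NONCOPY entries
        log_e = np.log(emission_matrix(query, ref_panel, mu))
        log_v = -np.log(num_copiable[0]) + log_e[0]
        for j in range(1, num_sites):
            log_switch = np.log(rho[j] / num_copiable[j])
            log_stay = np.log(rho[j] / num_copiable[j] + 1.0 - rho[j])
            best_prev = np.argmax(log_v)
            # Each template either stays put or is reached by switching
            # from the best template at the previous site.
            stay_score = log_v + log_stay
            switch_score = log_v[best_prev] + log_switch
            use_stay = stay_score >= switch_score
            backptr[j] = np.where(use_stay, templates, best_prev)
            log_v = np.where(use_stay, stay_score, switch_score) + log_e[j]
    path = np.zeros(num_sites, dtype=int)
    path[-1] = np.argmax(log_v)
    for j in range(num_sites - 1, 0, -1):
        path[j - 1] = backptr[j, path[j]]
    return path, log_v[path[-1]]


# Template 2 is an ancestor with no material at the first and last sites.
ref_panel = np.array([
    [0, 1, NONCOPY],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, NONCOPY],
])
query = np.array([0, 1, 1, 0])
rho = np.full(4, 0.1)
mu = np.full(4, 0.01)
path, log_lik = viterbi_haploid(query, ref_panel, rho, mu)
```

Because every switch into a given template has the same probability under this transition model, only the single best template at the previous site needs to be considered as a switching source, which keeps each site linear in the number of templates. Entries that are NONCOPY get log-emission -inf, so no finite-likelihood path can pass through them.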