-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Origins of Alpha #225
Comments
That paper also mentions the later MILK-B154B6, GISAID ID: EPI_ISL_2735517 (which I think corresponds to ERR4869224), concluding that it could be a recombinant or result of lab contamination. That also isn't in our dataset (and was presumably filtered out). This would be another interesting sequence to match in, after the fact. |
It's from 2020-10-23, and I think it just was a bit too early to gather up any other Alpha samples be within the window for inclusion. Would be interesting to see what would happen from matching it back later and adding it in all right. |
Looking at the input sequence of ERR4869224, which has a decent number of Ns in these regions. I wonder how many imputed bases in at those sites (based on the best match) would be different than the ref. If all those Ns were supposed to be the same as ref., then we could have quite different placement in the trees. |
Yes, this is odd, good spot. Let's keep an eye out for this on the next (final??) iteration. I'm suspicious that the HMM current settings will lead to slightly suboptimal HMM matches for really unlikely matches, and it may look something like this. |
It would be nice to be able to show, graphically, where the two "intermediate" samples (ERR4869224 and ERR4638413) which we excluded fit onto the tree, and how they would shuffle the order of mutations on that long branch. The routine that @jeromekelleher coded at jeromekelleher/sc2ts#360 should do most of it, but maybe not the mutation pushing etc required to change the mutation order. I assume we'd either match these sequences to the top or bottom node of that long branch, and then change the mutation ordering using reversion pushes, etc. |
If you run the match and post the results we can figure out what inserting would do. |
This would be worth a line or two in the main text, and some digging for the supplementary. In the "maskreg-psv2-v1-mm_4-f500-clustloc-mrm_2-rw_10-mgs_10-2021-01-28" tree, it appears as if we can resolve the first of the Alpha mutations to matching against some samples sequenced in Cambridge (suspiciously geographically close to Kent, the origin of the alpha outbreak) in particular these samples, which are all B.1.1
This is so early in the pandemic that we can easily re-run the matching up to that point. @jeromekelleher points out that node 4725 here is likely to be caused by a reversion push, so it is a result of the tree-building part of the sc2ts algorithm. It has 3 exact matches:
@szhan dug out a paper which also finds a Cambridge sample near the root of the alpha-defining lineages: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9752794/. We exclude their sample (CAMC-946506 / ERR4638413) because it has too much missing data:
sc2ts.inference Filter ERR4638413: missing=2393 > 500
, but it could be interesting to match it in afterwards.Note that the taxonium tree below matches these 6 samples (circled) quite a long way from the root of alpha (alpha in cyan here). Shing suspects this is because of treating deletions as ancestral states.
Code to replicate
The text was updated successfully, but these errors were encountered: