Seeds for BA.2 #265
Comments
Using COG-UK sequences joined with the pango lineage data doesn't work because the first sample is ERR7965207 from 2022-01-03, which happens after a retro group of BA.2s gets added.
While it looks OK generally, it's annoying that we've got the BA.2.10 creeping in here. It may not be worth worrying about though, and we should just let the algorithm do its thing with minimal intervention.
Do we have the experiment accessions for the resequenced early Omicron sequences discussed in the Viridian paper, @szhan? Just looking for "South Africa" as the region is too broad, I think. Surely there's some systematic way we can track down the sequences that were actually generated?
Yes, Supplementary Table S7 in this Excel file: https://figshare.com/articles/dataset/Supplementary_Tables_S2-9/27195315/2?file=49784541
Ah, that's much better. OK, I'll just pick the earliest BA.2 from that.
So, let's find a suitable seed among the new African samples collected before 2021-12-30? The dates for these samples should be trustworthy.
I can automate this and update the notebook accordingly?
Yes please. Can you review #275 first so I can merge?
Okay, now for the Omicron sublineages, we are looking among the new African samples (not only those from South Africa).
Searching manually among the BA.2s that show up in the max-daily-samples=1000 version I'm working with, the first from the African truth set is SRR17461930. It would be good to cross-check that this shows up in the notebook.
SRR17461930 isn't in the top 10; it's 20th.
I'm restarting from the start of Omicron now, using SRR17089886 (2021-11-17) as the BA.2 seed. That came out as the first BA.2 in my quick hacks - do you agree?
Yes, SRR17089886 is coming up as the earliest sample, after filtering out those with more than 3 hets.
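For reference, the selection rule agreed on above (keep BA.2 samples with at most 3 hets, collected before the cutoff, earliest first) can be sketched roughly like this. The `Sample` record, its field names, and the toy accessions are all hypothetical and are not the notebook's actual code or data:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout; the real metadata fields may differ.
    accession: str
    pango: str
    collection_date: date
    num_hets: int  # count of heterozygous (ambiguous) base calls


def candidate_seeds(samples, lineage, max_hets=3, before=None):
    """Return samples of `lineage` with at most `max_hets` hets, earliest first."""
    kept = [
        s
        for s in samples
        if s.pango == lineage
        and s.num_hets <= max_hets
        and (before is None or s.collection_date < before)
    ]
    # Sort by date, breaking ties by accession so the order is deterministic.
    return sorted(kept, key=lambda s: (s.collection_date, s.accession))


# Made-up toy records, for illustration only (not real metadata).
toy = [
    Sample("TOY0001", "BA.2", date(2021, 11, 10), 7),  # early, but too many hets
    Sample("TOY0002", "BA.2", date(2021, 11, 17), 0),
    Sample("TOY0003", "BA.1", date(2021, 11, 12), 0),  # wrong lineage
    Sample("TOY0004", "BA.2", date(2021, 12, 5), 2),
]
seeds = candidate_seeds(toy, "BA.2", max_hets=3, before=date(2021, 12, 30))
print([s.accession for s in seeds])
```

The first element of the returned list is the candidate seed; printing the position of a known accession in that list is an easy way to cross-check a manual search.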
A tricky issue here is that we don't want to add BA.2s in too early if we do think it might be a recombinant, because we have to wait for BA.1 (which we think may be one of the parents) to get properly established in the ARG. If we put BA.2 sequences in too soon, then it'll be more parsimonious to just copy from one deep parent (as there are tons of mutations anyway). This is all very difficult - I don't think we can say very much about the possible origins of these large saltational changes with these tools. The combination of the huge numbers of mutations and poor representation of the parents in the reference panel makes this a bad match for our model. We can pick out recombinations very well when the parent lineages are well represented in the ARG and when we're balancing the probability of a handful of mutations vs a recombination. Massive departures from the model like Delta and Omicron are just not a good fit.
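To make the parsimony argument concrete, here is a toy sketch. It is not the HMM itself: it is a simplified stand-in that counts mismatching sites and charges a fixed penalty `k` for one breakpoint. The sequences, the parent names, and the value of `k` are all invented for illustration:

```python
def best_copy_cost(query, panel, k):
    """Cheapest explanation of `query`: one parent, or two parents with one breakpoint.

    Cost = number of mismatching sites, plus `k` if a breakpoint is used.
    Returns (cost, parents), where parents is a 1- or 2-tuple of panel names.
    """
    n = len(query)

    def mismatches(name, lo, hi):
        return sum(query[i] != panel[name][i] for i in range(lo, hi))

    # Best single-parent copy.
    best_cost, best_parents = min((mismatches(p, 0, n), (p,)) for p in panel)

    # Try every ordered pair of distinct parents and every breakpoint.
    for left in panel:
        for right in panel:
            if left == right:
                continue
            for bp in range(1, n):
                cost = mismatches(left, 0, bp) + mismatches(right, bp, n) + k
                if cost < best_cost:
                    best_cost, best_parents = cost, (left, right)
    return best_cost, best_parents


# Invented 12-site toy sequences ('1' = derived allele).
query = "111111111111"  # highly diverged query
panel = {
    "deep": "000000000000",   # deep ancestor
    "ba1": "111111000000",    # shares the left half with the query
    "other": "000000111100",  # shares most of the right half
}
print(best_copy_cost(query, panel, k=3))

# Same query, but with the "ba1" parent missing from the panel.
reduced = {name: seq for name, seq in panel.items() if name != "ba1"}
print(best_copy_cost(query, reduced, k=3))
```

With both parents in the panel, the cheapest explanation in this toy is a recombinant of `ba1` and `other`. Once `ba1` is removed, copying from a single parent and paying for many mutations becomes cheaper, which is the failure mode described above.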
Just for later reference, using this seed (SRR17461792) results (after a few weeks) in a "push" node that descends from the original recombinant with 5 reversions. These are all reversions of mutations that happen on the root of the BA.1 lineage. I don't really know what this means, but it doesn't look to me like something that would result from choosing the wrong seed and is probably more to do with having insufficient sampling (much like Delta in #226). It will be something we need to look at quite closely, as the HMM is quite clear about BA.2 being a recombinant, given the reference panel we have.
A couple of simple ways to get some candidate seeds for BA.2.
pango-designation
and the samples sequenced by COG-UK. Then, get the samples with the earliest collection dates (before 2022-01-08).
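As a rough sketch of that join, assuming a designation table with `taxon,lineage` columns and a metadata table with `taxon,collection_date` columns. These column names and the toy rows are assumptions for illustration, not the actual file layouts:

```python
import csv
import io
from datetime import date

# Toy stand-ins for the two inputs; contents are invented for illustration.
DESIGNATIONS = """taxon,lineage
toy/A,BA.2
toy/B,BA.1
toy/C,BA.2
toy/D,BA.2
"""
METADATA = """taxon,collection_date
toy/A,2022-01-05
toy/B,2021-12-01
toy/C,2021-12-20
toy/D,2022-01-10
"""


def earliest_designated(designations, metadata, lineage, cutoff, n=10):
    """Join the two tables on taxon; return the n earliest (date, taxon) pairs."""
    lineage_of = {
        row["taxon"]: row["lineage"]
        for row in csv.DictReader(io.StringIO(designations))
    }
    hits = []
    for row in csv.DictReader(io.StringIO(metadata)):
        collected = date.fromisoformat(row["collection_date"])
        # Keep only samples designated as `lineage` and collected before the cutoff.
        if lineage_of.get(row["taxon"]) == lineage and collected < cutoff:
            hits.append((collected, row["taxon"]))
    return sorted(hits)[:n]


early = earliest_designated(DESIGNATIONS, METADATA, "BA.2", date(2022, 1, 8))
print([taxon for _, taxon in early])
```

Samples that appear in only one of the two tables drop out of the join, so the result is limited to sequences that are both designated and have a usable collection date.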