Seeds for BA.1 #264
Comments
Using the notebook in #275 we picked the first ten sequences:
It seems to be working reasonably well, and we only actually need the first one (as of the 26th - I'll check on the 27th sequences later). We should follow up with a detailed analysis on the main ARG to make sure that we've got the correct mutations, which we can report on in the paper.
Should we use the African truth set for this also (like #265), or do we expect COG-UK to be as good a place as any to find early BA.1s? The trouble with just merging with the massive set from pango-lineages is that it doesn't really rule out time travellers.
I really hope that we can find better seeds in the new African samples. I thought the point of sequencing the samples was to better understand the early evolution of Omicron. If we do look among the new samples, then we can't intersect with the pango-designation sequences.
I'm restarting from 2021-11-09 to use these truth sequences as the Omicron seeds. I'm using SRR17041373 (2021-11-09) as the first (SRR18533633 comes up as earlier, but that doesn't seem to be present in the current Viridian alignments).
SRR18533633 got filtered out because its number of hets is 9. If I recall correctly, the threshold used in the Viridian paper is 3.
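For reference, here is a minimal sketch of the kind of het-count filter described above. The function name `filter_by_hets`, the dict-based input, and the inclusive comparison (`<=`) are assumptions for illustration, not the actual Viridian or pipeline code; only the count of 9 for SRR18533633 and the recalled threshold of 3 come from this thread.

```python
def filter_by_hets(het_counts, max_hets=3):
    """Split samples into kept/dropped by number of heterozygous calls.

    het_counts: dict mapping sample ID -> number of hets.
    max_hets:   assumed inclusive threshold (3 as recalled in this thread).
    """
    kept = {s: n for s, n in het_counts.items() if n <= max_hets}
    dropped = {s: n for s, n in het_counts.items() if n > max_hets}
    return kept, dropped


# SRR18533633 has 9 hets (from this thread); the other entry is hypothetical.
kept, dropped = filter_by_hets({"SRR18533633": 9, "hypothetical_sample": 1})
print("dropped:", sorted(dropped))
```

Whether the real cut-off is inclusive or exclusive makes no difference for SRR18533633, since 9 is well above 3 either way.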
This is tricky... The first few samples from the African dataset give pretty different results (last number is num mutations)
whereas the first few from the COG-UK set are:
Seeding with SRR17041373 above doesn't get things started: although the COG-UK samples copy from it, they do so at a high HMM cost. As we're pretty sure of the COG-UK ones here and the African samples are only a few days earlier, I'm inclined to just use it. The point of the exercise here isn't to pinpoint the Omicron outbreaks, but to make sure we capture them reasonably cleanly so that we can accurately identify recombinants.
OK, I'm going to try seeding with SRR17041376 and ERR7443564 as they are both pointing at the same node, and it gives us a chance to find extra BA.1 samples if they exist. As the timescale between BA.1 and BA.2 is very tight, it may be important to get more BA.1 samples in so that we can judge the likelihood of recombination.
Linking to cov-lineages/pango-designation#361 here for useful context in splitting BA.1 and BA.2 (@szhan - this is the sort of thing I meant by #278 - let's link in with pango designation issues in these threads so we can find the background info easily)
From what I can see this has worked really well and BA.1 looks clean. After SRR17041376 goes in like this:
10 days later we get
So there's no actual need to seed in with ERR7443564 here. Looking at some BA.1s later on, the path looks very clean, with no reversions. I think this is about as good as can be done, and we can do some further analysis later to justify the choice of SRR17041376.
A couple of simple ways to get some candidate seeds for BA.1.
pango-designation
and the samples sequenced by COG-UK. Then, get the samples with the earliest collection dates (before 2021-12-01).
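A rough sketch of that selection, assuming the idea is to intersect the samples designated BA.1 in pango-designation with the COG-UK samples and then sort by collection date. The function `candidate_seeds`, its inputs (plain dicts and lists), and the sample IDs are all hypothetical; loading the real designation and metadata files is left out.

```python
from datetime import date


def candidate_seeds(pango_designations, cog_uk_samples, collection_dates,
                    lineage="BA.1", cutoff=date(2021, 12, 1), n=10):
    """Return up to n earliest samples that are designated `lineage`,
    were sequenced by COG-UK, and were collected before `cutoff`.

    pango_designations: dict sample ID -> designated lineage
    cog_uk_samples:     iterable of sample IDs sequenced by COG-UK
    collection_dates:   dict sample ID -> datetime.date
    """
    designated = {s for s, lin in pango_designations.items() if lin == lineage}
    candidates = designated & set(cog_uk_samples)
    # Keep only samples with a known collection date before the cutoff.
    dated = [(collection_dates[s], s) for s in candidates
             if s in collection_dates and collection_dates[s] < cutoff]
    # Sort by date (ties broken by sample ID) and take the earliest n.
    return [s for _, s in sorted(dated)[:n]]


# Toy example with made-up sample IDs and dates.
designations = {"s1": "BA.1", "s2": "BA.2", "s3": "BA.1", "s4": "BA.1"}
cog_uk = ["s1", "s3", "s4", "s5"]
dates = {"s1": date(2021, 11, 27), "s3": date(2021, 11, 26),
         "s4": date(2021, 12, 5)}
print(candidate_seeds(designations, cog_uk, dates))
```

Note that an early collection date on its own doesn't rule out time travellers, so the candidates still need checking by hand.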