Is Delta (B.1.617.2) a recombinant? #258

Open
hyanwong opened this issue Dec 10, 2024 · 19 comments

Comments

@hyanwong
Collaborator

hyanwong commented Dec 10, 2024

In most of our sc2ts ARGs, for example the one labelled "DELETEME_testzarr_v4-2021-11-26.ts.il.tsz", Delta (B.1.617.2 / AY.*) is a product of recombination between B.1.617.1 and another, deep-branching lineage, e.g. B.1.384 (a US-only variant). Here's a visual:

Screenshot 2024-12-10 at 08 37 31 Screenshot 2024-12-10 at 08 42 41

If we are claiming that Delta is a recombinant, we should do some due diligence, and also make sure we are seeding the Deltas with a sensible combination of B.1.617, B.1.617.1, B.1.617.2, and (perhaps, if we can add one) B.1.617.3: see #226.

@szhan
Collaborator

szhan commented Dec 12, 2024

I wonder if we would need to present additional supporting evidence, as Jackson et al. did for the Alpha recombinants. They checked that the proposed parents of a candidate recombinant were co-circulating in the UK around the time the recombinant was sampled.

@szhan
Collaborator

szhan commented Dec 12, 2024

That may not even be possible for Delta, given poor early sampling.

@jeromekelleher
Owner

Yeah, I don't think that'll be possible here; there just aren't enough samples.

@szhan
Collaborator

szhan commented Dec 12, 2024

Hmm, that may be why no one has really claimed that Delta is a recombinant, knowing that sampling was simply not enough (it makes me uneasy as well). It is likely the same reason that there have only been hypotheses and speculation about the origin of Omicron.

@szhan
Collaborator

szhan commented Dec 12, 2024

The Alpha recombinants may be a special case, because COG-UK sampled intensively enough in the UK to support the analysis of Jackson et al.

@jeromekelleher
Owner

I think it's fine to say that the most parsimonious explanation of our data, given the model and samples we have, is that 617.2 was a recombinant. We can lay out the alternative hypothesis (the match we get with no recombination) and just say that we chose the recombination explanation because we would have had to put a special case in for it to not be a recombinant in the ARG.

The key thing here is that we choose our seed samples well, so let's focus on that.

@hyanwong
Collaborator Author

hyanwong commented Dec 17, 2024

Here is the latest ARG from Jerome (copied from slack). Note that I've run simplify(ts, filter_nodes=False) before plotting, to remove the (unary) recombination nodes, because we don't know whether to put the mutations above or below the recombination node.

Screenshot 2024-12-17 at 12 34 43

As I pointed out on Slack, I think I can figure out a way of making an equally parsimonious arrangement by moving the breakpoint, which is at 25469 in the plot above. If we move the breakpoint to between 22022 and 22917, then the 4 lowest mutations on the LH branch above the recombination move to the RH branch, and the two red mutations (T22917G and C23604G) get pushed up and are no longer recurrent. Instead, I think we have to revert G23012C and A24775T on the RH branch above the recombinant. I checked, and these two mutations are actually required in the sample descending from the two other lineages under 118401, but perhaps a reversion push would see to that. Given the importance of Delta, I think a little bit of digging is probably warranted.
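
To make the breakpoint argument concrete, here is a minimal Python sketch of which sites change parent when the breakpoint moves. It assumes the left parent is copied for positions strictly below the breakpoint and the right parent from the breakpoint onwards; that half-open convention is an assumption made for this sketch, not something stated in the comment. The function name `sites_switching_parent` is hypothetical. The site list combines the four positions named in this comment with 5184, 17721 and 26767, which appear in the mutation lists later in the thread.

```python
def sites_switching_parent(sites, old_breakpoint, new_breakpoint):
    """Return the sites whose copied parent changes when the breakpoint moves.

    Assumption: the left parent covers [0, breakpoint) and the right parent
    covers [breakpoint, end), so exactly the sites in [low, high) switch.
    """
    low, high = sorted((old_breakpoint, new_breakpoint))
    return sorted(s for s in sites if low <= s < high)


# Genome positions mentioned in the thread.
sites = [5184, 17721, 22917, 23012, 23604, 24775, 26767]

# The current breakpoint is 25469; try two candidates from the proposed
# 22022-22917 window.
for candidate in (22023, 22917):
    moved = sites_switching_parent(sites, 25469, candidate)
    print(candidate, moved)
```

Under these assumptions, both candidate breakpoints move the same four sites (22917, 23012, 23604 and 24775) from the left parent to the right parent, and these are the sites discussed in the comment. Sites 5184, 17721 and 26767 keep their parent.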

@jeromekelleher
Owner

Thanks @hyanwong, sounds good.

I'm running the additional HMM matches currently and will update with some results later on (and a notebook PR).

@jeromekelleher
Owner

jeromekelleher commented Dec 17, 2024

[Note: edited this post to delete the content as there were some mistakes. See below for updated details]

@szhan
Collaborator

szhan commented Dec 17, 2024

'C5184T' is found in both solutions.

@szhan
Collaborator

szhan commented Dec 17, 2024

I'm tempted to go with the non-recombinant solution to keep the early part of the history more tree-like. Not that I think it is more biologically plausible, but because it is just hard to tell considering that early sampling for Delta is not that great (so err on the safe side?).

@jeromekelleher
Owner

Whoops, scratch the bit above about the forward and reverse paths being consistent. It's a more complex story, I'll update later.

@hyanwong
Collaborator Author

> I'm tempted to go with the non-recombinant solution to keep the early part of the history more tree-like. Not that I think it is more biologically plausible, but because it is just hard to tell considering that early sampling for Delta is not that great (so err on the safe side?).

I disagree, actually. I think the left-hand side of Delta is very unlike the highly mutated node 118401 on the RHS, and I don't think better Delta sampling would help that.

@jeromekelleher
Owner

Let's wait until we've looked at the reverse match a bit more closely. I made a mistake above: it is different from the forward match.

@hyanwong
Collaborator Author

hyanwong commented Dec 17, 2024

I've followed through the logic and I'm pretty sure I can hand-craft the Delta origin above to make it substantially more parsimonious with the same topology. There is still only one recombinant, but I only require 1 recurrent mutation (and no reversions). It would be good to talk this through with someone to check I have my logic right. If so, it's an interesting test case for how you might be able to improve the algorithm, as I think it's an artefact of the way that we can remove multiple mutations after getting the HMM cost.

@jeromekelleher
Owner

I've gone through the details a fair bit in the #273 notebook and it's really not obvious to me what the right answer is. One thing to point out here is that we are stating either that 617.2 is a recombinant which pulls in some 617 mutations, or that it's entirely independent of 617 and 617.1. The no-recombination solution entirely bypasses 617. I thought this sounded a bit unlikely, but it seems that we only need a handful of recurrent mutations in order to do this, and they happen to be at sites quite prone to recurring anyway.

Here are some details:

forward solution

left_parent=5299 (B.1.384?), right_parent=118401 (B.1.617?), breakpoint=25469 (interval = 25277-25469)

24 mutations

reverse solution

left_parent=11294 (B.1.1), right_parent=119685 (B.1.617?), breakpoint=22023 (note: not coinciding with the interval above; I haven't computed the interval for the reverse match)

24 mutations.

no recombination solution

parent = 2910 (B.1)

29 mutations
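
One way to read these counts is as a simple parsimony trade-off. The sketch below assumes an additive cost in which each recombination event is charged as `k` mutations. This is a simplification for illustration only: it is not the likelihood that sc2ts computes, and the counts quoted above may not correspond directly to raw HMM mismatches. The function names are hypothetical.

```python
def path_cost(num_mutations, num_recombinations, k):
    """Additive cost: each recombination is charged as k mutations (assumed)."""
    return num_mutations + k * num_recombinations


def prefers_recombinant(recomb_mutations, tree_mutations, k, num_recombinations=1):
    """True if the recombinant explanation is strictly cheaper than the tree-like one."""
    recomb_cost = path_cost(recomb_mutations, num_recombinations, k)
    tree_cost = path_cost(tree_mutations, 0, k)
    return recomb_cost < tree_cost


# Counts quoted above: 24 mutations with one breakpoint, 29 with none.
for k in range(3, 8):
    print(k, path_cost(24, 1, k), path_cost(29, 0, k), prefers_recombinant(24, 29, k))
```

Under this assumed cost, the recombinant explanation wins only while a breakpoint is priced below the five mutations it saves; at k = 5 the two explanations tie.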

Mutation overlaps

All of these sets of mutations differ quite a bit. There are 19 mutations shared by all three solutions.

The forward and reverse recombinant solutions differ by 4 mutations. In forward but not reverse:

{'C23604G', 'G17721T', 'T22917G', 'T26767C'}

In reverse but not forward:

{'C23012G', 'C5184T', 'G26767C', 'T24775A'}

There are then 5 mutations that are in the no-recombination solution and not in either of the recombinant solutions:

{'C25469T', 'G28881T', 'G29402T', 'G29742T', 'T27638C'}

Of these, three are in the characteristic mutations for 617 defined in cov-lineages/pango-designation#38 and two of those come from sites that have well above the average number of mutations (25469 and 28881).
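
The overlap between these sets can also be examined per site rather than per mutation label. The following sketch parses the labels quoted above into (inherited state, position, derived state) and reports positions that occur in more than one set. The helper names are hypothetical; the three sets are copied from this comment.

```python
import re
from collections import defaultdict

MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    """Split a label such as 'C23604G' into (inherited, position, derived)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    inherited, position, derived = match.groups()
    return inherited, int(position), derived


def sites_in_multiple_sets(named_sets):
    """Map each position found in more than one set to {set name: label}."""
    by_site = defaultdict(dict)
    for name, labels in named_sets.items():
        for label in labels:
            _, position, _ = parse_mutation(label)
            by_site[position][name] = label
    return {pos: hits for pos, hits in sorted(by_site.items()) if len(hits) > 1}


solutions = {
    "forward_only": {"C23604G", "G17721T", "T22917G", "T26767C"},
    "reverse_only": {"C23012G", "C5184T", "G26767C", "T24775A"},
    "no_recombination_only": {"C25469T", "G28881T", "G29402T", "G29742T", "T27638C"},
}

shared_sites = sites_in_multiple_sets(solutions)
print(shared_sites)
```

On the quoted sets, the only position that appears in more than one set is 26767, where the forward and reverse solutions are recorded with the same derived state (C) but different inherited states (T and G respectively).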

So: hmm. 🤔

@jeromekelleher
Owner

As another data point, if we match the 617.2 sample against an ARG that doesn't contain any 617 or 617.1 sequences, we also get the no-recombination solution above with num_mismatches=4.

Given how much uncertainty there is about Delta's origins, I think the simplest thing is to just accept the current recombinant solution, and write up a section in the paper discussing the fact that this is one of a bunch of different potential solutions which we can't really distinguish without better data.

@hyanwong
Collaborator Author

> Given how much uncertainty there is about Delta's origins, I think the simplest thing is to just accept the current recombinant solution, and write up a section in the paper discussing the fact that this is one of a bunch of different potential solutions which we can't really distinguish without better data.

I roughly agree with this. However, in my comment above (if I'm right) I think I could construct a recombinant solution that reduces the total number of mutations required by 2. The HMM isn't going to spot this, however. It's only found by post-hoc processing.

@jeromekelleher
Owner

Sure - this is potentially worth mentioning in the paper. I think we just have to go with "this was a reasonable best effort to put some verified sequences into the ARG at the right times, and this is what we got under the parameters we're using. A detailed analysis of the origins of Delta using the tools we have provided is an important avenue for future work."


3 participants