
Local tree around the emergence of Delta #313

Closed
szhan opened this issue Sep 27, 2024 · 6 comments


@szhan
Contributor

szhan commented Sep 27, 2024

While looking at the earliest HMM group of samples attached in long_arg_v7_clustloc-mrm_2-rw_10-mgs_10-2021-06-30.ts.tsz (md5sum: 6cde6e2c00624a505aa00063973368f2), I noticed that these samples have a suspiciously high number of ambiguous characters (notably K). Their per-sample character counts are:

{'-': 13, 'A': 8892, 'C': 5458, 'G': 5845, 'K': 1, 'N': 121, 'T': 9573}
{'-': 13, 'A': 8892, 'C': 5458, 'G': 5845, 'K': 1, 'N': 121, 'T': 9573}
{'-': 13, 'A': 8891, 'C': 5458, 'G': 5845, 'K': 1, 'N': 121, 'R': 1, 'T': 9573}
{'-': 1, 'A': 8894, 'C': 5468, 'G': 5842, 'K': 1, 'N': 121, 'T': 9576}
{'-': 13, 'A': 8837, 'C': 5442, 'G': 5815, 'K': 1, 'N': 268, 'T': 9527}
{'A': 8885, 'C': 5462, 'G': 5844, 'N': 130, 'T': 9582}
{'-': 1, 'A': 8793, 'C': 5412, 'G': 5796, 'N': 413, 'T': 9488}
{'-': 4, 'A': 8889, 'C': 5465, 'G': 5848, 'K': 1, 'N': 121, 'T': 9575}
{'A': 8231, 'C': 5081, 'G': 5439, 'N': 2342, 'T': 8810}
{'-': 4, 'A': 8791, 'C': 5412, 'G': 5795, 'N': 413, 'T': 9487, 'Y': 1}
{'A': 8893, 'C': 5471, 'G': 5849, 'N': 104, 'T': 9586}
{'-': 1, 'A': 8793, 'C': 5412, 'G': 5795, 'K': 1, 'N': 413, 'T': 9488}
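The tallies above look like per-character counts over each sample's consensus sequence. A minimal sketch of how such a tally could be produced, and how the ambiguity codes could be located, assuming the consensus is available as a plain string; the helper names and the AMBIGUITY_CODES set are hypothetical and not part of sc2ts:

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, excluding N (treated here as missing data)
# and '-' (gap). Assumption: these are the characters worth flagging.
AMBIGUITY_CODES = set("RYKMSWBDHV")


def base_counts(seq):
    """Return a per-character tally of a consensus sequence, sorted by character."""
    return dict(sorted(Counter(seq.upper()).items()))


def ambiguous_positions(seq):
    """Return the 0-based positions holding an IUPAC ambiguity code other than N."""
    return [i for i, c in enumerate(seq.upper()) if c in AMBIGUITY_CODES]


if __name__ == "__main__":
    toy = "ACGTNNK-ACGTR"  # hypothetical toy sequence, not real data
    print(base_counts(toy))
    # {'-': 1, 'A': 2, 'C': 2, 'G': 2, 'K': 1, 'N': 2, 'R': 1, 'T': 2}
    print(ambiguous_positions(toy))  # [6, 12]
```

Sorting by character reproduces the key order used in the counts above ('-' sorts before the letters in ASCII).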

Also, these samples have a mix of Viridian Pango labels:
B.1.617.2, n = 3
B.1.617, n = 2
B.1.617.1, n = 7

These samples may be making it harder to build a good local tree around the start of the Delta wave. By being strict about the number of ambiguous characters (Viridian_cons_het == 0, ignoring '.'), we may be able to do better here.
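A minimal sketch of the strict filter, assuming metadata rows are available as dicts keyed by column name, with Viridian_cons_het stored as a string and '.' marking a missing value. The record layout and helper names are hypothetical; rows with '.' are excluded rather than treated as 0:

```python
def parse_het(value):
    """Return the het count as an int, or None when the value is missing ('.')."""
    return None if value == "." else int(value)


def strict_no_het_filter(records):
    """Keep only records whose Viridian_cons_het is exactly 0 ('.' is dropped)."""
    return [r for r in records if parse_het(r["Viridian_cons_het"]) == 0]


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"strain": "s1", "Viridian_cons_het": "0"},
        {"strain": "s2", "Viridian_cons_het": "1"},
        {"strain": "s3", "Viridian_cons_het": "."},
    ]
    print([r["strain"] for r in strict_no_het_filter(records)])  # ['s1']
```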

@szhan
Contributor Author

szhan commented Sep 27, 2024

Here is the sampling frequency when considering only samples with Viridian_cons_het == 0 (excluding those with Viridian_cons_het == '.'). There are 2,040,650 such samples.

[Figure: viridian_md_no_hets]
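A minimal sketch of computing a sampling frequency over time, assuming collection dates are ISO 'YYYY-MM-DD' strings. Binning by ISO week is an assumption; the binning used for the figure is not stated, and the helper name is hypothetical:

```python
from collections import Counter
from datetime import date


def weekly_counts(dates):
    """Tally sample collection dates by (ISO year, ISO week), sorted chronologically."""
    counts = Counter()
    for d in dates:
        iso = date.fromisoformat(d).isocalendar()
        counts[(iso[0], iso[1])] += 1  # (ISO year, ISO week number)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(weekly_counts(["2021-03-01", "2021-03-02", "2021-03-08"]))
    # {(2021, 9): 2, (2021, 10): 1}
```

ISO weeks start on Monday, so 2021-03-01 and 2021-03-02 share a bin while 2021-03-08 starts the next one.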

@szhan
Contributor Author

szhan commented Sep 27, 2024

Since sampling is quite thin before March 1st, we can probably relax the het filter for that part of the ARG and impose the het = 0 filter only on samples from March 1st onwards. The early Delta and closely related samples crop up in March/April.
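A minimal sketch of the date-dependent filter proposed above. Assumptions: the cutoff is 1 March 2021 (the year is inferred from the 2021 ARG under discussion), "relax" is modelled as an optional looser threshold (early_max_hets, where None means no het filter at all), and missing hets ('.') arrive already parsed as None. The function and parameter names are hypothetical:

```python
from datetime import date

# Assumption: cutoff of 1 March 2021 ("March 1st" in the comment above).
HET_FILTER_START = date(2021, 3, 1)


def passes_het_filter(collection_date, het, start=HET_FILTER_START,
                      max_hets=0, early_max_hets=None):
    """Decide whether a sample is kept, using a date-dependent het threshold.

    collection_date: ISO 'YYYY-MM-DD' string.
    het: parsed Viridian_cons_het count, or None when the value is missing ('.').
    Before `start`, apply `early_max_hets` (None means no het filter at all);
    from `start` onwards, require het <= max_hets (0 gives the strict filter).
    """
    if date.fromisoformat(collection_date) < start:
        if early_max_hets is None:
            return True  # thin early sampling: keep every sample
        return het is not None and het <= early_max_hets
    return het is not None and het <= max_hets
```

For example, passes_het_filter("2021-02-15", 3) keeps the sample, while passes_het_filter("2021-04-01", 1) drops it. Keeping the threshold as a parameter makes it easy to loosen max_hets later without changing the date logic.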

@szhan
Contributor Author

szhan commented Sep 27, 2024

Viridian_cons_het == 0 is too strict: many samples have at least one het.
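One way to judge how strict a het threshold is would be to compute, for several candidate thresholds, the fraction of samples that would be retained. A minimal sketch, assuming raw Viridian_cons_het strings with '.' for missing values (which are skipped); the helper name is hypothetical:

```python
def retained_fraction(het_values, thresholds):
    """For each threshold t, return the fraction of non-missing samples with het <= t."""
    hets = [int(v) for v in het_values if v != "."]
    if not hets:
        return {t: 0.0 for t in thresholds}
    return {t: sum(h <= t for h in hets) / len(hets) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(retained_fraction(["0", "1", "1", "2", ".", "5"], [0, 1, 2]))
    # {0: 0.2, 1: 0.6, 2: 0.8}
```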

@jeromekelleher
Owner

What do you suggest, then? I might run this over the weekend.

@szhan
Contributor Author

szhan commented Oct 5, 2024

Just noting that we don't have any good samples until March 2021 for Delta B.1.617.2.
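A check like this could be done by finding the earliest collection date per Pango label among the samples that pass the het filter. A minimal sketch, assuming filtered records are dicts with hypothetical "pango" and "date" fields (ISO 'YYYY-MM-DD' strings, which compare correctly as plain strings):

```python
def earliest_by_label(records):
    """Map each Pango label to the earliest collection date among the records."""
    earliest = {}
    for r in records:
        label, d = r["pango"], r["date"]
        if label not in earliest or d < earliest[label]:
            earliest[label] = d
    return earliest


if __name__ == "__main__":
    # Hypothetical records for illustration only, not real sample dates.
    records = [
        {"pango": "B.1.617.2", "date": "2021-04-02"},
        {"pango": "B.1.617.2", "date": "2021-03-20"},
        {"pango": "B.1.617.1", "date": "2021-02-11"},
    ]
    print(earliest_by_label(records))
    # {'B.1.617.2': '2021-03-20', 'B.1.617.1': '2021-02-11'}
```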

@jeromekelleher
Owner

Linking this to jeromekelleher/sc2ts-paper#226 for cross-reference and closing.
