Origins of delta #226

Open
hyanwong opened this issue Oct 14, 2024 · 51 comments

@hyanwong
Collaborator

hyanwong commented Oct 14, 2024

In the ARG as of 13th Oct 2024, at the start of the spike there is a single clade that comprises the delta samples ("B.1.617.2" or "AY.*"). It contains all 43028 delta samples plus two B.1-assigned samples, SRR23110826 (node 17569) and SRR11810706 (node 12805), both of which seem to be good candidates for samples close to the origin of delta (orange, below).

However, the origins of delta are a little messy, with reversions (in magenta) crammed into a short branch below node 140425 (expanded below), and several separate lineages. We think this is probably because of adding different retrospective groups from different countries. We should check whether this is improved in the next ARG iteration.

Note that node 140425 and its sister 147202 are recombination nodes (drawn as larger open circles).
Screenshot 2024-10-14 at 16 25 38

Screenshot 2024-10-14 at 16 22 22

For huge clades like this, it can be helpful to subsample a few delta nodes, as shown below:

import numpy as np
import sc2ts

# Assumes `ti` (the tree-info object) and `ts` (its tree sequence) are already loaded.
# Collect the sample nodes of every Delta lineage (B.1.617.2 plus its AY.* aliases).
delta = np.concatenate([v for k, v in ti.pango_lineage_samples.items() if k.startswith("AY.") or k == "B.1.617.2"])
rng = np.random.default_rng(seed=1)
ti.draw_subtree(
    tracked_samples=rng.choice(delta, 20),  # a random selection of 20 delta nodes
    #position=23064,
    time_scale="time",
    size=(1500, 1500),
    canvas_size=(1600, 1500),
    collapse_tracked=None,
    extra_tracked_samples=[12805, 17569],  # the two B.1-assigned candidate samples
    style=(
        ".plotbox {transform: translateX(40px)}"
        ".leaf > .lab {text-anchor: start; transform: rotate(90deg) translate(6px)}"
        # Draw every recombination node as a larger open circle
        + ",".join([f".node.n{u} > .sym" for u in np.where(ts.nodes_flags & sc2ts.NODE_IS_RECOMBINANT)[0]]) + "{r: 5px; stroke: black; fill: white}"
    ),
)
@hyanwong
Collaborator Author

Note that SRR11810706 is from Gujarat, and we know that Delta originated in India, so this might be believable.

@hyanwong
Collaborator Author

hyanwong commented Oct 21, 2024

The newer all-sample ARG, as of this morning ("maskdel-v1-mm_4-f_500-mrm_2-mms_5-mrec_2-rw_7-mgs_10-2021-08-28"), has no major recombination problems, but a different problem with the start of Delta. In particular, it infers 2 separate Delta origins: one comprising about 2/3rds of the AY lineages plus B.1.617, B.1.617.1, and B.1.617.2, the other comprising roughly the remaining 1/3rd of the AYs. There are loads of parallel mutations on the stem leading to each clade. This is clearly wrong, and we should try to figure out why, and check that it doesn't happen in future ARGs.

@jeromekelleher dug into the logs and saw two retro groups being added on the same day, which could be the source of this:

2024-10-19 07:01:34 WARNING sc2ts.inference Add retro group {'B.1.617.2': 21, 'AY.38': 2, 'AY.9': 2}: samples=25 depth=3 total_muts=53 root_muts=11 muts_per_sample=2.12 recurrent_muts=1 
2024-10-19 07:01:34 WARNING sc2ts.inference Add retro group {'B.1.617.2': 9, 'AY.122': 1, 'AY.1': 1}: samples=11 depth=2 total_muts=40 root_muts=14 muts_per_sample=3.6363636363636362 recurrent_muts=0 

Alternatively, it could be some of the tweaked HMM parameters.

Here's a plot subset down to about 30 AY.4 samples (cyan), one of which is an outlier and groups under a recombination node on the far right. The others are all in the 1/3rd clade, which is independently picking up the same mutations that lead to the bulk of the delta-origin "B.1.617.2" samples (in orange, below):

image

Here's the main plot of all AY lineages, with Delta (bottom right) showing 3 independent origins (urgh):

image

@jeromekelleher
Owner

An important thing to note here is that these two retro groups are clearly a mix of time-travelling lineages. Most retro groups consist of just one pango lineage (indicating that we're picking up the origin of that lineage). A mixture of lineages indicates potential problems. A mixture of highly distinct lineages across many months (as here) indicates time travel and big trouble.
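
As a rough illustration of that check, here is a minimal sketch that pulls the lineage counts out of "Add retro group" log lines and flags groups containing more than one pango lineage. The regular expression and the helper names (`retro_group_lineages`, `is_mixed`) are assumptions made for this example and are not part of sc2ts; the log lines are the two quoted earlier in the thread.

```python
import ast
import re

# The two log lines quoted earlier in this thread.
LOG_LINES = [
    "2024-10-19 07:01:34 WARNING sc2ts.inference Add retro group "
    "{'B.1.617.2': 21, 'AY.38': 2, 'AY.9': 2}: samples=25 depth=3 "
    "total_muts=53 root_muts=11 muts_per_sample=2.12 recurrent_muts=1",
    "2024-10-19 07:01:34 WARNING sc2ts.inference Add retro group "
    "{'B.1.617.2': 9, 'AY.122': 1, 'AY.1': 1}: samples=11 depth=2 "
    "total_muts=40 root_muts=14 muts_per_sample=3.6363636363636362 recurrent_muts=0",
]

# Assumed pattern: the lineage counts are a dict literal after "Add retro group".
RETRO_GROUP_RE = re.compile(r"Add retro group (\{.*?\}):")


def retro_group_lineages(line):
    """Return the {lineage: count} dict from a log line, or None if absent."""
    match = RETRO_GROUP_RE.search(line)
    if match is None:
        return None
    return ast.literal_eval(match.group(1))


def is_mixed(lineage_counts):
    """A retro group is 'mixed' if it contains more than one pango lineage."""
    return len(lineage_counts) > 1


for line in LOG_LINES:
    counts = retro_group_lineages(line)
    if counts is not None and is_mixed(counts):
        print("mixed retro group:", sorted(counts))
```

Both groups in the log are flagged by this rule. A stricter version could also compare the designation dates of the lineages involved, which is not attempted here.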

@hyanwong
Collaborator Author

Good point about a mix of lineages. In real time we might not have the lineage information for new samples (lineages may not have been devised yet), but in that case we shouldn't have so many time travelling problems either.

@hyanwong
Collaborator Author

hyanwong commented Nov 15, 2024

The seeding method, using strain ERR5876690, seems to give OK results: at least, it creates only one tree (no recombination), has no reversions and few recurrent mutations, and doesn't lead to multiple origins for delta.

Screenshot 2024-11-15 at 13 19 00

In this test ARG (delta_wave_seeded_v3_hmm_cost_7-2021-06-07.ts.tsz) there is a single sample, ERR5965862 (node 156987), which comes off first, but is about 60 days later than the delta node:

Screenshot 2024-11-15 at 13 48 01

ERR5965862 is separated from the root of all other deltas, 142724, by a single mutation, A11201G, which is not a reversion or anything, so maybe that's OK? Perhaps a sample worth looking at (e.g. can we map it to GISAID and get a sample submission date?).

Screenshot 2024-11-15 at 13 55 24

The nodes under 142724 show recurrent mutations (G21987A (1/5) and C21846T (1/6)), which seem a bit suss to me. It could be worth looking into what's going on there, and whether different seed samples would change anything.

@jeromekelleher
Owner

Thanks Yan, super helpful. The seed sample here was ERR5876690.

ERR5965862 happens to be one of the Delta samples that arrived soon after the seed sample was added, and it matched to it with 2 mutations [T10651C, G11201A]. G11201A is an immediate reversion, and a reversion push node was therefore created which became the ultimate "Delta node".
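
To make the "immediate reversion" idea concrete, here is a minimal sketch under a simplifying assumption: a sample mutation counts as an immediate reversion if it exactly undoes a mutation on the branch directly above the node it attaches to. The helper names are hypothetical, and the parent-branch mutation A11201G is assumed for illustration; this is not the sc2ts implementation.

```python
def parse_mutation(label):
    """Split a label such as 'G11201A' into (inherited, position, derived)."""
    return label[0], int(label[1:-1]), label[-1]


def immediate_reversions(parent_branch_muts, sample_muts):
    """Return the sample mutations that exactly undo a parent-branch mutation."""
    parent = {}
    for label in parent_branch_muts:
        inherited, position, derived = parse_mutation(label)
        parent[position] = (inherited, derived)
    reverted = []
    for label in sample_muts:
        inherited, position, derived = parse_mutation(label)
        # A reversion swaps the states: parent X->Y followed by sample Y->X.
        if parent.get(position) == (derived, inherited):
            reverted.append(label)
    return reverted


# Assumed for illustration: the branch above the attachment node carries A11201G.
print(immediate_reversions(["A11201G"], ["T10651C", "G11201A"]))
```

Under that assumption only G11201A is reported, matching the description above of a reversion push node being created for it.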

The single side shoot branch here is just a function of chance I think, and odd stuff like that's going to happen. The recurrent mutations aren't brilliant, but I think we can live with that.

It really is quite hard to find a good starting point with all the noise, so unless there's something badly wrong with this proposal I think we should stick with it.

@hyanwong
Collaborator Author

Here's a useful paper talking about Delta sequences in the UK: https://www.nature.com/articles/s41586-022-05200-3

@hyanwong
Collaborator Author

hyanwong commented Nov 28, 2024

We are trying to find a decent Delta seed that is not a time traveller. I see an article that mentions the earliest Delta in GISAID is on 5th Oct 2020 (see https://www.sciencemediacentre.org/expert-reaction-to-cases-of-variant-b-1-617-the-indian-variant-being-investigated-in-the-uk/). @szhan: would it be possible to locate the GISAID submission that is discussed in that article? It seems to be from Maharashtra state.

This paper uses EPI_ISL_1360382, but that's from 2021, I think.

It also says "Most isolates sequenced by India originated from Maharashtra and West Bengal, but B.1.617 has been identified in several other states.", so we could potentially find other believable seeds by looking for submissions from those states in mid-October?

@hyanwong
Collaborator Author

hyanwong commented Nov 28, 2024

This could be useful, from https://pubmed.ncbi.nlm.nih.gov/33961693/. The preprint (https://www.biorxiv.org/content/10.1101/2021.04.23.441101v1.full.pdf) sounds like it could be helpful.
Screenshot 2024-11-28 at 11 58 41

@hyanwong
Collaborator Author

hyanwong commented Dec 4, 2024

Another possibility is to look for plots in papers published near the time. E.g. https://weekly.chinacdc.cn/fileCCDCW/journal/article/ccdcw/2021/30/PDF/CCDCW210107.pdf shows a few interesting strains:

hCoV-19/India/MH-NCCS-87448/2021|EPI ISL 1415203.2|2021-02-16 (deep branching, Indian)
hCoV-19/India/ILSGS00308/2020|EPI ISL 1372093|2020-12-01 (Early, Indian)

@hyanwong
Collaborator Author

hyanwong commented Dec 6, 2024

Looking at GISAID and restricting to non-low-quality sequences from Maharashtra, we only have 6 sequences to look at, of which only the first has an actual day (rather than just a month): EPI_ISL_2131509, EPI_ISL_3473612, EPI_ISL_3473618, EPI_ISL_3473613, EPI_ISL_3473611, and EPI_ISL_3473614.

These could be worth checking by hand, perhaps (if we can match them against Viridian sequences).
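
A small sketch of the day-versus-month distinction: it separates records whose collection date is a full YYYY-MM-DD from those that only give a year and month. The record IDs and dates below are hypothetical placeholders, not the GISAID entries listed above.

```python
import re

# A full ISO date: four-digit year, two-digit month, two-digit day.
DAY_PRECISION_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def has_day_precision(date_string):
    """True if the date is given to the day, False for month-only dates."""
    return DAY_PRECISION_RE.fullmatch(date_string) is not None


# Hypothetical records for illustration only.
records = {
    "example_A": "2020-10-05",
    "example_B": "2020-10",
    "example_C": "2020-11",
}

day_level = [name for name, d in records.items() if has_day_precision(d)]
month_only = [name for name, d in records.items() if not has_day_precision(d)]
print("day-level dates:", day_level)
print("month-only dates:", month_only)
```

Month-only records could still be useful as candidates, but their position within the month is unknown, so they are weaker anchors for dating.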

Screenshot 2024-12-06 at 23 18 43

@szhan
Collaborator

szhan commented Dec 9, 2024

BLASTed the ILSGS00308 GISAID sequence against the Viridian batch 1 sequences (min. sequence identity = 99.95%), and got ERR5461550 as an imperfect hit (99.97%) with a sampling date of 2021-02-22. Note I'm excluding samples with a reported collection date of New Year's Eve 2020.
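
For a sense of scale, the percent identities can be converted to an approximate number of mismatching bases. This sketch assumes the alignment spans the full 29,903-base reference genome, which may not hold for the actual BLAST hit; the function name is made up for this example.

```python
# Assumed alignment length: the SARS-CoV-2 reference genome (Wuhan-Hu-1).
GENOME_LENGTH = 29_903


def approx_mismatches(identity_percent, aligned_length=GENOME_LENGTH):
    """Approximate number of mismatching bases implied by a percent identity."""
    return round(aligned_length * (1 - identity_percent / 100))


for identity in (99.95, 99.97, 100.0):
    print(f"{identity}% identity is roughly {approx_mismatches(identity)} mismatches")
```

Under this assumption the 99.97% hit differs from the query at roughly nine positions, and the 99.95% threshold allows about fifteen.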

@szhan
Collaborator

szhan commented Dec 9, 2024

For MH-NCCS-87448, I'm getting a hit with %identity = 100. It's sample SRR14388093 with collection date of 2021-02-16, which is the same as MH-NCCS-87448. There seem to be other sequences also with %identity = 100 but sampled earlier in 2020.

@jeromekelleher
Owner

Do these dates line up with what's in the paper that's referring to them? We can quite happily change the date of these specific sequences if we have good reason.

@szhan
Collaborator

szhan commented Dec 9, 2024

For MH-NCCS-87448, yes. SRR14388093 (in the Viridian dataset) is reportedly sampled on the same date as MH-NCCS-87448. There are other samples in the Viridian dataset with 100% identity to MH-NCCS-87448 that we could use as seeds.

@hyanwong MH-NCCS-87448 is B.1.617, not B.1.617.2, according to the China CDC Weekly paper you found. Are you suggesting that we use a sequence identical or similar to MH-NCCS-87448 to seed (as a precursor) Delta and its related lineages?

@jeromekelleher
Owner

I don't think it can be SRR14388093 @szhan , that's coming up as a B.1.2 from Texas for me

@jeromekelleher
Owner

The other one looks better:


ds.metadata["ERR5461550"]
{'Artic_primer_version': '3',
 'Collection_date': '2021-02-22',
 'Country': 'United Kingdom',
 'Date_tree': '2021-02-22',
 'Date_tree_order': 'Date',
 'Experiment': 'ERX5244529',
 'First_created': '2021-03-12',
 'Genbank_N': 0,
 'Genbank_accession': 'OU050089.1',
 'Genbank_other_runs': '.',
 'Genbank_pangolin': 'B.1.617.1',
 'Genbank_scorpio': 'B.1.617.1-like',
 'Genbank_tree_name': 'ERR5461550.genbank.OU050089.1',
 'In_Viridian_tree': True,
 'In_intersection': True,
 'In_may_2024_preprint': True,
 'Platform': 'ILLUMINA',
 'Region': 'none',
 'Run_count': 1,
 'Sample': 'SAMEA8240357',
 'Study': 'PRJEB37886',
 'Viridian_N': 0,
 'Viridian_amplicon_scheme': 'COVID-ARTIC-V3',
 'Viridian_cons_het': 1,
 'Viridian_cons_len': 29835,
 'Viridian_pangolin': 'B.1.617.1',
 'Viridian_pangolin_1.29': 'B.1.617.1',
 'Viridian_result': 'PASS',
 'Viridian_scorpio': 'B.1.617.1-like',
 'Viridian_scorpio_1.29': 'B.1.617.1-like',
 'date': '2021-02-22',
 'strain': 'ERR5461550'}

with the match:

{
  "strain": "ERR5461550",
  "num_mismatches": 4,
  "direction": "forward",
  "match": {
    "path": [
      {
        "left": 0,
        "right": 28882,
        "parent": 6303
      },
      {
        "left": 28882,
        "right": 29904,
        "parent": 43646
      }
    ],
    "mutations": [
      {
        "site_id": 207,
        "derived_state": "T",
        "inherited_state": "G",
        "site_position": 210,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 3443,
        "derived_state": "T",
        "inherited_state": "C",
        "site_position": 3457,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 4950,
        "derived_state": "T",
        "inherited_state": "C",
        "site_position": 4965,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 8117,
        "derived_state": "T",
        "inherited_state": "G",
        "site_position": 8137,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 11173,
        "derived_state": "G",
        "inherited_state": "A",
        "site_position": 11201,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 17485,
        "derived_state": "T",
        "inherited_state": "G",
        "site_position": 17523,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 20351,
        "derived_state": "G",
        "inherited_state": "A",
        "site_position": 20396,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 20356,
        "derived_state": "G",
        "inherited_state": "T",
        "site_position": 20401,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 21842,
        "derived_state": "C",
        "inherited_state": "T",
        "site_position": 21895,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 21968,
        "derived_state": "A",
        "inherited_state": "G",
        "site_position": 22022,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 22859,
        "derived_state": "G",
        "inherited_state": "T",
        "site_position": 22917,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 22954,
        "derived_state": "C",
        "inherited_state": "G",
        "site_position": 23012,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 23545,
        "derived_state": "G",
        "inherited_state": "C",
        "site_position": 23604,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 24714,
        "derived_state": "T",
        "inherited_state": "A",
        "site_position": 24775,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 25407,
        "derived_state": "T",
        "inherited_state": "C",
        "site_position": 25469,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 25532,
        "derived_state": "T",
        "inherited_state": "G",
        "site_position": 25595,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 26699,
        "derived_state": "G",
        "inherited_state": "T",
        "site_position": 26767,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 27563,
        "derived_state": "C",
        "inherited_state": "T",
        "site_position": 27638,
        "is_reversion": null,
        "is_immediate_reversion": null
      },
      {
        "site_id": 28794,
        "derived_state": "T",
        "inherited_state": "A",
        "site_position": 28881,
        "is_reversion": null,
        "is_immediate_reversion": null
      }
    ],
    "likelihood": 2.171300934439176e-44,
    "cost": 23
  }
}

For reference, here's the match of the early 617 that I put in:

'hmm_match': {'mutations': [{'derived_state': 'T',
      'inherited_state': 'G',
      'site_position': 210},
     {'derived_state': 'A', 'inherited_state': 'G', 'site_position': 425},
     {'derived_state': 'A', 'inherited_state': 'C', 'site_position': 2143},
     {'derived_state': 'T', 'inherited_state': 'C', 'site_position': 3267},
     {'derived_state': 'T', 'inherited_state': 'C', 'site_position': 3457},
     {'derived_state': 'T', 'inherited_state': 'C', 'site_position': 4965},
     {'derived_state': 'G', 'inherited_state': 'A', 'site_position': 11201},
     {'derived_state': 'T', 'inherited_state': 'C', 'site_position': 12053},
     {'derived_state': 'A', 'inherited_state': 'G', 'site_position': 15463},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 17523},
     {'derived_state': 'G', 'inherited_state': 'A', 'site_position': 20396},
     {'derived_state': 'G', 'inherited_state': 'T', 'site_position': 20401},
     {'derived_state': 'C', 'inherited_state': 'T', 'site_position': 21895},
     {'derived_state': 'A', 'inherited_state': 'G', 'site_position': 22022},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 22706},
     {'derived_state': 'G', 'inherited_state': 'T', 'site_position': 22917},
     {'derived_state': 'C', 'inherited_state': 'G', 'site_position': 23012},
     {'derived_state': 'G', 'inherited_state': 'C', 'site_position': 23604},
     {'derived_state': 'T', 'inherited_state': 'A', 'site_position': 24775},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 25019},
     {'derived_state': 'T', 'inherited_state': 'C', 'site_position': 25469},
     {'derived_state': 'C', 'inherited_state': 'T', 'site_position': 27299},
     {'derived_state': 'C', 'inherited_state': 'T', 'site_position': 27638},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 27750},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 28881},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 29402},
     {'derived_state': 'T', 'inherited_state': 'G', 'site_position': 29742}],
    'path': [{'left': 0, 'parent': 58, 'right': 29904}]},
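
To compare the two matches, a quick sketch that intersects their mutation positions may help. The position lists are copied from the two dumps above; the variable names are just for this example.

```python
# Site positions of the mutations in the ERR5461550 match (first dump above).
err5461550_positions = [
    210, 3457, 4965, 8137, 11201, 17523, 20396, 20401, 21895, 22022,
    22917, 23012, 23604, 24775, 25469, 25595, 26767, 27638, 28881,
]

# Site positions of the mutations in the early 617 match (second dump above).
early_617_positions = [
    210, 425, 2143, 3267, 3457, 4965, 11201, 12053, 15463, 17523, 20396,
    20401, 21895, 22022, 22706, 22917, 23012, 23604, 24775, 25019, 25469,
    27299, 27638, 27750, 28881, 29402, 29742,
]

shared = sorted(set(err5461550_positions) & set(early_617_positions))
only_err5461550 = sorted(set(err5461550_positions) - set(early_617_positions))
only_early_617 = sorted(set(early_617_positions) - set(err5461550_positions))

print(f"shared positions ({len(shared)}):", shared)
print(f"only in ERR5461550 ({len(only_err5461550)}):", only_err5461550)
print(f"only in early 617 ({len(only_early_617)}):", only_early_617)
```

This compares positions only. The two samples attach to different parents, so the inherited state at a shared position can differ (for example at 28881), and a fuller comparison would look at derived states as well.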

@jeromekelleher
Owner

Not sure what to make of this. I think we need to get some curated sequences in here in the right order for 617, 617.1, 617.2 and 617.3. It's going to be impossible to untangle otherwise.

@hyanwong
Collaborator Author

hyanwong commented Dec 10, 2024

> I don't think it can be SRR14388093 @szhan , that's coming up as a B.1.2 from Texas for me

Hmm, that's weird. Which countries are the other MH-NCCS-87448 100% matches from, @szhan ? It's classified as B.1.617.3 in that China CDC paper, but with a collection date of 2020-10-02, not 2021-02-16 as you found. Maybe they have a typo in the paper?

@szhan
Collaborator

szhan commented Dec 12, 2024

I got the date of 2021-02-16 from the description line above, "hCoV-19/India/MH-NCCS-87448/2021|EPI ISL 1415203.2|2021-02-16". Just checked GISAID. It does say the collection date of MH-NCCS-87448 is 2021-02-16. Hmm.

@szhan
Collaborator

szhan commented Dec 12, 2024

@hyanwong Are you referring to hCoV-19/India/ILSGS00308/2020|EPI ISL 1372093|2020-12-01, which was reportedly sampled on 2020-12-01? I'm not seeing another sample in the phylogeny in the CCDC Weekly report which was sampled on 2020-10-02.

@szhan
Collaborator

szhan commented Dec 12, 2024

Tagging this ECDC report that summarises some useful epidemiological background about B.1.617.1, B.1.617.2, and B.1.617.3.

https://www.ecdc.europa.eu/sites/default/files/documents/Emergence-of-SARS-CoV-2-B.1.617-variants-in-India-and-situation-in-the-EUEEA_0.pdf

@szhan
Collaborator

szhan commented Dec 12, 2024

Let's use the sequences designated to B.1.617.1 and Delta B.1.617.2 that are listed in lineages.csv in the pango-designation GitHub repo (https://github.com/cov-lineages/pango-designation?tab=readme-ov-file).

These designated sequences were used to propose a new lineage in the first place, so they are the lineage-defining sequences. Assignment tools just use the information in these designated sequences to assign lineages to new sequences.

For B.1.617.1 and Delta B.1.617.2, there are sequences from COG-UK, so they should be in the SRA/ENA and therefore also in the Viridian dataset. We can just use them as seeds. Sadly, there are no COG-UK sequences for B.1.617.3, but we can live without it as that lineage is not as important.

@jeromekelleher
Owner

Sounds great. Can you list them out here when you figure out the mapping to Viridian IDs please?

@szhan
Collaborator

szhan commented Dec 13, 2024

There are 2,833 sequences listed in lineages.csv (7ef24a9dff97abe8ef25dc9415494b87; downloaded 12 Dec. 2024 from https://github.com/cov-lineages/pango-designation) that are designated to B.1.617.2 and were sampled in England. These look like they were sequenced by COG-UK (I've only manually searched for a few samples), and so should be in the SRA/ENA/Viridian dataset.
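
A minimal sketch of how such a count could be reproduced. It assumes lineages.csv has a header row with `taxon` and `lineage` columns and that English samples have taxon names starting with `England/`; both should be checked against the downloaded file. The inline CSV is a tiny stand-in, and its `India/EXAMPLE-0001/2021` row is made up for illustration.

```python
import csv
import io

# Tiny stand-in for lineages.csv (assumed columns: taxon, lineage).
SAMPLE_CSV = """taxon,lineage
England/CAMC-14D5D42/2021,B.1.617.2
England/CAMC-139650A/2021,B.1.617.1
India/EXAMPLE-0001/2021,B.1.617.2
"""


def designated_taxa(handle, lineage, taxon_prefix="England/"):
    """Return taxa designated to `lineage` whose name starts with `taxon_prefix`."""
    reader = csv.DictReader(handle)
    return [
        row["taxon"]
        for row in reader
        if row["lineage"] == lineage and row["taxon"].startswith(taxon_prefix)
    ]


delta_england = designated_taxa(io.StringIO(SAMPLE_CSV), "B.1.617.2")
print(len(delta_england), "England sequences designated to B.1.617.2")
```

For the real file, the `io.StringIO` object would be replaced by `open("lineages.csv", newline="")`.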

Note that these designated sequences are of good quality, and were sampled when Delta was just getting recognised. Also, they are not identical.

It's not clear how to select one sequence as a seed. Building a consensus out of them wouldn't be good, I think, because it would likely lead to reversions as Delta samples get attached to it. We could also randomly choose a sequence, but then there would be no good justification for choosing that sequence.

What else could we do?

@jeromekelleher
Owner

Can you link to the pango designation issues for delta lineages please? Do they mention any specific sequences there?

As the sequences are from UK, we can probably trust the dates and therefore pick a few early ones.

@szhan
Collaborator

szhan commented Dec 14, 2024

These are the two issues about designated sequences for B.1.617.2 that I've come across:
cov-lineages/pango-designation#113
cov-lineages/pango-designation#135

In issue 113, a post includes a tree showing some samples collected in England. Maybe we could pick some of those as seeds?

@szhan
Collaborator

szhan commented Dec 14, 2024

This is the issue about B.1.617.1.
cov-lineages/pango-designation#55

There are 93 sequences designated to B.1.617.1 in lineages.csv (7ef24a9dff97abe8ef25dc9415494b87; downloaded 12 Dec. 2024 from https://github.com/cov-lineages/pango-designation) that were sampled in England.

@hyanwong
Collaborator Author

> What else could we do?

Can we see what happens if we consider all these sequences as a single "retro group", do the NJ thing, then add that in?

I'm also keen to use not just UK sequences but also other (esp Indian/Maharashtra) ones, but only if they match a sequence that was described at the time in GISAID. Perhaps we can simply make a list of potential candidates which we can match to a Viridian sample?

@hyanwong
Collaborator Author

> Sadly, there are no COG-UK sequences for B.1.617.3, but we can live without it as that lineage is not as important.

It might be important for resolving the ancestral states of B.1.617 in general though. If there are one or two B.1.617.3 samples in Viridian which have reasonable dates / QC and we can add to a whitelist, I think we should.

@jeromekelleher changed the title from "Origins of delta, for comparison" to "Origins of delta" on Dec 14, 2024
@jeromekelleher
Owner

jeromekelleher commented Dec 14, 2024

Can we please just get some set of candidate sequences together so we can start trying stuff out? Looking at the 617.1 issue there's a handful of samples that should be mappable back. Can we start getting these and others lined up in a verified way? It's much more important that we have something trustworthy in the February-March date range to get things started than it is to get the exact sequences that started things off.

We can specify hundreds or even thousands of sequences on the include list (whitelist is a deprecated term @hyanwong ) if we want. We should stop thinking about these as specific seed sequences, and instead treat them as a set of samples whose dates we are confident of. The algorithm can (and should) deal with the rest.
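
As a sketch of what assembling such an include list might look like, the function below selects strains by lineage and collection-date window from a metadata mapping. The `Collection_date` and `Viridian_pangolin` keys are the ones shown in the ERR5461550 record earlier in the thread; the function name, the date window, and the two `EXAMPLE_*` records are hypothetical.

```python
from datetime import date


def build_include_list(metadata, lineages, start, end):
    """Return strain IDs in `lineages` collected within [start, end], earliest first."""
    selected = []
    for strain, record in metadata.items():
        try:
            collected = date.fromisoformat(record["Collection_date"])
        except ValueError:
            continue  # skip month-only or malformed dates
        if record["Viridian_pangolin"] in lineages and start <= collected <= end:
            selected.append((collected, strain))
    return [strain for _, strain in sorted(selected)]


metadata = {
    # Values taken from the metadata record shown earlier in the thread.
    "ERR5461550": {"Collection_date": "2021-02-22", "Viridian_pangolin": "B.1.617.1"},
    # Hypothetical records for illustration.
    "EXAMPLE_1": {"Collection_date": "2021-03", "Viridian_pangolin": "B.1.617.2"},
    "EXAMPLE_2": {"Collection_date": "2021-05-01", "Viridian_pangolin": "B.1.617.2"},
}

include = build_include_list(
    metadata,
    lineages={"B.1.617.1", "B.1.617.2"},
    start=date(2021, 2, 1),
    end=date(2021, 3, 31),
)
print(include)
```

The design choice here is to drop anything without a full day-level date, on the basis that the include list is meant to contain only samples whose dates can be trusted.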

The clock is ticking here on getting a full run done before or during the holiday, and we must get Delta origins reasonably sorted.

@szhan
Collaborator

szhan commented Dec 14, 2024

Okay, I just retrieved the ENA sample accessions of the 93 sequences designated to B.1.617.1 manually from the ENA website.

strain_name sample_name pango_label ena_sample_accession
England/CAMC-139650A/2021 CAMC-139650A B.1.617.1 SAMEA8250358
England/CAMC-13D3228/2021 CAMC-13D3228 B.1.617.1 SAMEA8415036
England/CAMC-14E7527/2021 CAMC-14E7527 B.1.617.1 SAMEA8590706
England/CAMC-139668F/2021 CAMC-139668F B.1.617.1 SAMEA8264445
England/CAMC-14E7C22/2021 CAMC-14E7C22 B.1.617.1 SAMEA8589800
England/CAMC-141BD6E/2021 CAMC-141BD6E B.1.617.1 SAMEA8503091
England/CAMC-13D37CF/2021 CAMC-13D37CF B.1.617.1 SAMEA8415191
England/ALDP-148DD1C/2021 ALDP-148DD1C B.1.617.1 SAMEA8529918
England/CAMC-14A52C4/2021 CAMC-14A52C4 B.1.617.1 SAMEA8540743
England/RAND-14C7156/2021 RAND-14C7156 B.1.617.1 SAMEA8554577
England/CAMC-14E32AC/2021 CAMC-14E32AC B.1.617.1 SAMEA8588638
England/CAMC-14E79FE/2021 CAMC-14E79FE B.1.617.1 SAMEA8589769
England/QEUH-1407BE3/2021 QEUH-1407BE3 B.1.617.1 SAMEA8425592
England/CAMC-13EE785/2021 CAMC-13EE785 B.1.617.1 SAMEA8455344
England/QEUH-141222C/2021 QEUH-141222C B.1.617.1 SAMEA8459171
England/CAMC-14E0166/2021 CAMC-14E0166 B.1.617.1 SAMEA8588988
England/RAND-1471786/2021 RAND-1471786 B.1.617.1 SAMEA8523731
England/ALDP-1499A52/2021 ALDP-1499A52 B.1.617.1 SAMEA8532599
England/CAMC-13D39BA/2021 CAMC-13D39BA B.1.617.1 SAMEA8415256
England/ALDP-14E1790/2021 ALDP-14E1790 B.1.617.1 SAMEA8596014
England/CAMC-14D2334/2021 CAMC-14D2334 B.1.617.1 SAMEA8587814
England/CAMC-14E7CF5/2021 CAMC-14E7CF5 B.1.617.1 SAMEA8589829
England/CAMC-143E7FE/2021 CAMC-143E7FE B.1.617.1 SAMEA8533101
England/CAMC-14A64CD/2021 CAMC-14A64CD B.1.617.1 SAMEA8540620
England/CAMC-13D4807/2021 CAMC-13D4807 B.1.617.1 SAMEA8415388
England/CAMC-1395FC1/2021 CAMC-1395FC1 B.1.617.1 SAMEA8250250
England/CAMC-1322D74/2021 CAMC-1322D74 B.1.617.1 SAMEA8240357
England/CAMC-13DC8F0/2021 CAMC-13DC8F0 B.1.617.1 SAMEA8416312
England/CAMC-13AF60B/2021 CAMC-13AF60B B.1.617.1 SAMEA8400234
England/ALDP-142FD94/2021 ALDP-142FD94 B.1.617.1 SAMEA8501313
England/CAMC-145B245/2021 CAMC-145B245 B.1.617.1 SAMEA8526474
England/CAMC-1484B0C/2021 CAMC-1484B0C B.1.617.1 SAMEA8526542
England/RAND-14F1A67/2021 RAND-14F1A67 B.1.617.1 SAMEA8590790
England/CAMC-14A4FEC/2021 CAMC-14A4FEC B.1.617.1 SAMEA8540737
England/CAMC-14C9998/2021 CAMC-14C9998 B.1.617.1 SAMEA8555130
England/CAMC-14D20F1/2021 CAMC-14D20F1 B.1.617.1 SAMEA8587787
England/CAMC-14C99D4/2021 CAMC-14C99D4 B.1.617.1 SAMEA8554910
England/CAMC-14CAB46/2021 CAMC-14CAB46 B.1.617.1 SAMEA8555234
England/CAMC-14A6515/2021 CAMC-14A6515 B.1.617.1 SAMEA8540724
England/ALDP-149B2B2/2021 ALDP-149B2B2 B.1.617.1 SAMEA8532374
England/ALDP-149B30A/2021 ALDP-149B30A B.1.617.1 SAMEA8532349
England/QEUH-14CF868/2021 QEUH-14CF868 B.1.617.1 SAMEA8556303
England/QEUH-14CF2A6/2021 QEUH-14CF2A6 B.1.617.1 SAMEA8556335
England/RAND-1439E76/2021 RAND-1439E76 B.1.617.1 SAMEA8504066
England/CAMC-13B81A4/2021 CAMC-13B81A4 B.1.617.1 SAMEA8415007
England/RAND-1471B11/2021 RAND-1471B11 B.1.617.1 SAMEA8523827
England/CAMC-14C9CF9/2021 CAMC-14C9CF9 B.1.617.1 SAMEA8554999
England/CAMC-14E335E/2021 CAMC-14E335E B.1.617.1 SAMEA8588636
England/RAND-14E1DAD/2021 RAND-14E1DAD B.1.617.1 SAMEA8596118
England/RAND-14EB338/2021 RAND-14EB338 B.1.617.1 SAMEA8604805
England/CAMC-14DE9DC/2021 CAMC-14DE9DC B.1.617.1 SAMEA8595762
England/CAMC-14E726C/2021 CAMC-14E726C B.1.617.1 SAMEA8590647
England/CAMC-13AFFF1/2021 CAMC-13AFFF1 B.1.617.1 SAMEA8409386
England/CAMC-13EF881/2021 CAMC-13EF881 B.1.617.1 SAMEA8455558
England/CAMC-1484AF0/2021 CAMC-1484AF0 B.1.617.1 SAMEA8526581
England/CAMC-14DEE37/2021 CAMC-14DEE37 B.1.617.1 SAMEA8595904
England/CAMC-144B5C7/2021 CAMC-144B5C7 B.1.617.1 SAMEA8533135
England/CAMC-143E5E5/2021 CAMC-143E5E5 B.1.617.1 SAMEA8508718
England/RAND-149146B/2021 RAND-149146B B.1.617.1 SAMEA8530532
England/CAMC-144C2FC/2021 CAMC-144C2FC B.1.617.1 SAMEA8526307
England/CAMC-14D263B/2021 CAMC-14D263B B.1.617.1 SAMEA8587818
England/CAMC-14D5D51/2021 CAMC-14D5D51 B.1.617.1 SAMEA8576354
England/CAMC-13B7C2C/2021 CAMC-13B7C2C B.1.617.1 SAMEA8414093
England/MILK-1403B8D/2021 MILK-1403B8D B.1.617.1 SAMEA8456319
England/RAND-1491559/2021 RAND-1491559 B.1.617.1 SAMEA8530536
England/CAMC-142E041/2021 CAMC-142E041 B.1.617.1 SAMEA8508649
England/RAND-1472FC9/2021 RAND-1472FC9 B.1.617.1 SAMEA8524137
England/CAMC-14A668B/2021 CAMC-14A668B B.1.617.1 SAMEA8540600
England/CAMC-14A6506/2021 CAMC-14A6506 B.1.617.1 SAMEA8540700
England/CAMC-14D2316/2021 CAMC-14D2316 B.1.617.1 SAMEA8587765
England/CAMC-14D253E/2021 CAMC-14D253E B.1.617.1 SAMEA8587721
England/CAMC-145BB4A/2021 CAMC-145BB4A B.1.617.1 SAMEA8526368
England/CAMC-14A4DE2/2021 CAMC-14A4DE2 B.1.617.1 SAMEA8540569
England/CAMC-13E7889/2021 CAMC-13E7889 B.1.617.1 SAMEA8422785
England/CAMC-14D2282/2021 CAMC-14D2282 B.1.617.1 SAMEA8587812
England/CAMC-1322E9F/2021 CAMC-1322E9F B.1.617.1 SAMEA8240369
England/CAMC-14E32CA/2021 CAMC-14E32CA B.1.617.1 SAMEA8588728
England/MILK-14D984D/2021 MILK-14D984D B.1.617.1 SAMEA8576261
England/CAMC-14D1B4C/2021 CAMC-14D1B4C B.1.617.1 SAMEA8587843
England/CAMC-14ACC74/2021 CAMC-14ACC74 B.1.617.1 SAMEA8540859
England/CAMC-14ABA8A/2021 CAMC-14ABA8A B.1.617.1 SAMEA8540848
England/CAMC-14E792B/2021 CAMC-14E792B B.1.617.1 SAMEA8589740
England/CAMC-13FBE17/2021 CAMC-13FBE17 B.1.617.1 SAMEA8460142
England/MILK-14C78AC/2021 MILK-14C78AC B.1.617.1 SAMEA8554485
England/MILK-14C71A1/2021 MILK-14C71A1 B.1.617.1 SAMEA8554645
England/CAMC-14ABA6C/2021 CAMC-14ABA6C B.1.617.1 SAMEA8540826
England/CAMC-13A477C/2021 CAMC-13A477C B.1.617.1 SAMEA8412389
England/MILK-13D9C33/2021 MILK-13D9C33 B.1.617.1 SAMEA8413408
England/CAMC-14D1FF5/2021 CAMC-14D1FF5 B.1.617.1 SAMEA8587938
England/MILK-14496D5/2021 MILK-14496D5 B.1.617.1 SAMEA8505152
England/CAMC-145B694/2021 CAMC-145B694 B.1.617.1 SAMEA8526405
England/CAMC-13EF7B1/2021 CAMC-13EF7B1 B.1.617.1 SAMEA8455514
England/CAMC-14B59F9/2021 CAMC-14B59F9 B.1.617.1 SAMEA8546853
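
One way to map these ENA sample accessions back to Viridian run IDs is via the `Sample` field of the run metadata, as in the ERR5461550 record shown earlier (`'Sample': 'SAMEA8240357'`). The sketch below assumes that field is available for every run; the helper names are made up, and the table is a two-row excerpt of the list above.

```python
# Two rows excerpted from the table above (whitespace-separated columns).
TABLE = """\
strain_name sample_name pango_label ena_sample_accession
England/CAMC-139650A/2021 CAMC-139650A B.1.617.1 SAMEA8250358
England/CAMC-1322D74/2021 CAMC-1322D74 B.1.617.1 SAMEA8240357
"""

# Run metadata keyed by run ID; only the "Sample" field is needed here.
metadata = {"ERR5461550": {"Sample": "SAMEA8240357"}}


def parse_table(text):
    """Parse the whitespace-separated table into a list of row dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    # maxsplit=3 keeps a multi-word last column (e.g. "not found") intact.
    return [dict(zip(header, line.split(maxsplit=3))) for line in lines[1:]]


def runs_by_sample(metadata):
    """Index run IDs by their ENA sample accession."""
    index = {}
    for run_id, record in metadata.items():
        index.setdefault(record["Sample"], []).append(run_id)
    return index


index = runs_by_sample(metadata)
mapping = {
    row["sample_name"]: index.get(row["ena_sample_accession"], [])
    for row in parse_table(TABLE)
}
print(mapping)
```

With only one run in the toy metadata, CAMC-1322D74 maps to ERR5461550 and CAMC-139650A maps to nothing; run against the full metadata, the empty entries would show which designated sequences are missing from the dataset.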

@szhan
Collaborator

szhan commented Dec 14, 2024

Here are 200+ designated sequences for B.1.617.2 that we can work on for now. I'll get the rest later.

strain_name sample_name pango_label ena_sample_accession
England/CAMC-14D5D42/2021 CAMC-14D5D42 B.1.617.2 SAMEA8564746
England/CAMC-141DB08/2021 CAMC-141DB08 B.1.617.2 SAMEA8504853
England/CAMC-142DAEA/2021 CAMC-142DAEA B.1.617.2 SAMEA8508539
England/CAMC-14B5CB4/2021 CAMC-14B5CB4 B.1.617.2 SAMEA8546753
England/RAND-14DD366/2021 RAND-14DD366 B.1.617.2 SAMEA8596797
England/CAMC-14D261D/2021 CAMC-14D261D B.1.617.2 SAMEA8587770
England/CAMC-14C99C5/2021 CAMC-14C99C5 B.1.617.2 SAMEA8554880
England/CAMC-14E22BC/2021 CAMC-14E22BC B.1.617.2 SAMEA8596243
England/ALDP-14D46FD/2021 ALDP-14D46FD B.1.617.2 SAMEA8576333
England/CAMC-14E338B/2021 CAMC-14E338B B.1.617.2 SAMEA8588729
England/CAMC-14E7A0A/2021 CAMC-14E7A0A B.1.617.2 SAMEA8589796
England/CAMC-14ACDDB/2021 CAMC-14ACDDB B.1.617.2 SAMEA8540820
England/MILK-14DDFBD/2021 MILK-14DDFBD B.1.617.2 SAMEA8596642
England/CAMC-14B5B4E/2021 CAMC-14B5B4E B.1.617.2 SAMEA8549375
England/CAMC-14ABF7C/2021 CAMC-14ABF7C B.1.617.2 SAMEA8541935
England/CAMC-14C9AE0/2021 CAMC-14C9AE0 B.1.617.2 SAMEA8555075
England/MILK-14DDCD4/2021 MILK-14DDCD4 B.1.617.2 SAMEA8596661
England/ALDP-14EDD1A/2021 ALDP-14EDD1A B.1.617.2 SAMEA8590469
England/CAMC-14ACDCC/2021 CAMC-14ACDCC B.1.617.2 SAMEA8540817
England/QEUH-14D153F/2021 QEUH-14D153F B.1.617.2 SAMEA8588064
England/CAMC-14D5C72/2021 CAMC-14D5C72 B.1.617.2 SAMEA8564849
England/ALDP-14C5BE1/2021 ALDP-14C5BE1 B.1.617.2 SAMEA8553876
England/CAMC-145B49A/2021 CAMC-145B49A B.1.617.2 SAMEA8526330
England/MILK-14E0272/2021 MILK-14E0272 B.1.617.2 SAMEA8589176
England/RAND-14D44C6/2021 RAND-14D44C6 B.1.617.2 SAMEA8564319
England/RAND-14CE3A4/2021 RAND-14CE3A4 B.1.617.2 SAMEA8555854
England/MILK-14BF397/2021 MILK-14BF397 B.1.617.2 SAMEA8596427
England/CAMC-14D1FAA/2021 CAMC-14D1FAA B.1.617.2 SAMEA8587853
England/RAND-14F19F1/2021 RAND-14F19F1 B.1.617.2 SAMEA8590815
England/CAMC-14DEBF4/2021 CAMC-14DEBF4 B.1.617.2 SAMEA8595813
England/RAND-14E21BF/2021 RAND-14E21BF B.1.617.2 SAMEA8595980
England/CAMC-14ABF6D/2021 CAMC-14ABF6D B.1.617.2 SAMEA8541934
England/CAMC-14C9C08/2021 CAMC-14C9C08 B.1.617.2 SAMEA8562026
England/CAMC-14DECF1/2021 CAMC-14DECF1 B.1.617.2 SAMEA8595708
England/CAMC-14DECA6/2021 CAMC-14DECA6 B.1.617.2 SAMEA8595831
England/CAMC-14C9C71/2021 CAMC-14C9C71 B.1.617.2 SAMEA8555107
England/CAMC-14E7563/2021 CAMC-14E7563 B.1.617.2 SAMEA8590816
England/CAMC-14D5DAC/2021 CAMC-14D5DAC B.1.617.2 SAMEA8564703
England/CAMC-14D5E21/2021 CAMC-14D5E21 B.1.617.2 SAMEA8564681
England/CAMC-14A5112/2021 CAMC-14A5112 B.1.617.2 SAMEA8541907
England/CAMC-14E2F97/2021 CAMC-14E2F97 B.1.617.2 SAMEA8588884
England/CAMC-14E74A2/2021 CAMC-14E74A2 B.1.617.2 SAMEA8590705
England/CAMC-14D5BFD/2021 CAMC-14D5BFD B.1.617.2 SAMEA8564847
England/CAMC-14C9ED5/2021 CAMC-14C9ED5 B.1.617.2 SAMEA8555169
England/CAMC-14E7B61/2021 CAMC-14E7B61 B.1.617.2 SAMEA8589773
England/CAMC-14C2C5A/2021 CAMC-14C2C5A B.1.617.2 SAMEA8576010
England/QEUH-14DF6F2/2021 QEUH-14DF6F2 B.1.617.2 SAMEA8595886
England/CAMC-14DEBA9/2021 CAMC-14DEBA9 B.1.617.2 SAMEA8595706
England/RAND-14E1D70/2021 RAND-14E1D70 B.1.617.2 SAMEA8596030
England/MILK-152B73F/2021 MILK-152B73F B.1.617.2 not found
England/MILK-1550ED2/2021 MILK-1550ED2 B.1.617.2 SAMEA8721144
England/MILK-1550FB1/2021 MILK-1550FB1 B.1.617.2 SAMEA8721255
England/MILK-15554AE/2021 MILK-15554AE B.1.617.2 SAMEA8731648
England/MILK-15555AB/2021 MILK-15555AB B.1.617.2 SAMEA8731489
England/MILK-1555769/2021 MILK-1555769 B.1.617.2 not found
England/MILK-1555796/2021 MILK-1555796 B.1.617.2 SAMEA8731639
England/MILK-15557D2/2021 MILK-15557D2 B.1.617.2 not found
England/MILK-1555C5B/2021 MILK-1555C5B B.1.617.2 not found
England/MILK-1555C6A/2021 MILK-1555C6A B.1.617.2 SAMEA8731522
England/MILK-15650B6/2021 MILK-15650B6 B.1.617.2 SAMEA8764858
England/QEUH-15535BC/2021 QEUH-15535BC B.1.617.2 SAMEA8731549
England/QEUH-14E86C9/2021 QEUH-14E86C9 B.1.617.2 SAMEA8590063
England/MILK-14F617C/2021 MILK-14F617C B.1.617.2 SAMEA8591325
England/QEUH-14E884B/2021 QEUH-14E884B B.1.617.2 SAMEA8590154
England/CAMC-14DEE28/2021 CAMC-14DEE28 B.1.617.2 SAMEA8595881
England/CAMC-14E66AD/2021 CAMC-14E66AD B.1.617.2 SAMEA8592028
England/ALDP-14FCFD6/2021 ALDP-14FCFD6 B.1.617.2 SAMEA8591548
England/ALDP-14FC659/2021 ALDP-14FC659 B.1.617.2 SAMEA8591609
England/CAMC-14F84CC/2021 CAMC-14F84CC B.1.617.2 SAMEA8591841
England/CAMC-14F86F3/2021 CAMC-14F86F3 B.1.617.2 SAMEA8591996
England/ALDP-14FC413/2021 ALDP-14FC413 B.1.617.2 SAMEA8591551
England/ALDP-14FCF9A/2021 ALDP-14FCF9A B.1.617.2 SAMEA8591538
England/RAND-14FD46D/2021 RAND-14FD46D B.1.617.2 SAMEA8591622
England/CAMC-14E6722/2021 CAMC-14E6722 B.1.617.2 SAMEA8591892
England/ALDP-14F8A06/2021 ALDP-14F8A06 B.1.617.2 SAMEA8591931
England/CAMC-14E69A4/2021 CAMC-14E69A4 B.1.617.2 SAMEA8592039
England/CAMC-14F3D02/2021 CAMC-14F3D02 B.1.617.2 SAMEA8591199
England/CAMC-14E66BC/2021 CAMC-14E66BC B.1.617.2 SAMEA8605607
England/CAMC-1502BBA/2021 CAMC-1502BBA B.1.617.2 SAMEA8614620
England/CAMC-1502956/2021 CAMC-1502956 B.1.617.2 SAMEA8614652
England/CAMC-15034AF/2021 CAMC-15034AF B.1.617.2 SAMEA8614764
England/MILK-15060E4/2021 MILK-15060E4 B.1.617.2 SAMEA8613384
England/MILK-1506406/2021 MILK-1506406 B.1.617.2 SAMEA8613316
England/MILK-15056B6/2021 MILK-15056B6 B.1.617.2 SAMEA8613171
England/CAMC-1502B8D/2021 CAMC-1502B8D B.1.617.2 SAMEA8614534
England/CAMC-1508DDC/2021 CAMC-1508DDC B.1.617.2 SAMEA8612642
England/MILK-1509FF3/2021 MILK-1509FF3 B.1.617.2 SAMEA8613044
England/CAMC-1508B87/2021 CAMC-1508B87 B.1.617.2 SAMEA8612783
England/CAMC-1508EBB/2021 CAMC-1508EBB B.1.617.2 SAMEA8612718
England/ALDP-15089D8/2021 ALDP-15089D8 B.1.617.2 SAMEA8612720
England/CAMC-1508D72/2021 CAMC-1508D72 B.1.617.2 SAMEA8612738
England/CAMC-1507C85/2021 CAMC-1507C85 B.1.617.2 SAMEA8612728
England/CAMC-150904B/2021 CAMC-150904B B.1.617.2 SAMEA8612814
England/ALDP-15082CE/2021 ALDP-15082CE B.1.617.2 SAMEA8612797
England/ALDP-150864A/2021 ALDP-150864A B.1.617.2 SAMEA8612736
England/ALDP-1507FD7/2021 ALDP-1507FD7 B.1.617.2 SAMEA8612790
England/ALDP-14FB55D/2021 ALDP-14FB55D B.1.617.2 SAMEA8613918
England/RAND-14F1DE6/2021 RAND-14F1DE6 B.1.617.2 SAMEA8606078
England/MILK-14F524D/2021 MILK-14F524D B.1.617.2 SAMEA8605828
England/ALDP-14F228C/2021 ALDP-14F228C B.1.617.2 SAMEA8606022
England/ALDP-14F4008/2021 ALDP-14F4008 B.1.617.2 SAMEA8605815
England/ALDP-1513697/2021 ALDP-1513697 B.1.617.2 SAMEA8621192
England/ALDP-150EDE5/2021 ALDP-150EDE5 B.1.617.2 SAMEA8621602
England/ALDP-151845C/2021 ALDP-151845C B.1.617.2 SAMEA8621494
England/MILK-1514766/2021 MILK-1514766 B.1.617.2 SAMEA8621145
England/ALDP-1511F16/2021 ALDP-1511F16 B.1.617.2 SAMEA8621584
England/ALDP-1513961/2021 ALDP-1513961 B.1.617.2 SAMEA8621170
England/ALDP-150F79B/2021 ALDP-150F79B B.1.617.2 SAMEA8621156
England/RAND-1514D91/2021 RAND-1514D91 B.1.617.2 SAMEA8621431
England/CAMC-1510B6E/2021 CAMC-1510B6E B.1.617.2 SAMEA8621005
England/MILK-15128CF/2021 MILK-15128CF B.1.617.2 SAMEA8620974
England/CAMC-1510A43/2021 CAMC-1510A43 B.1.617.2 SAMEA8620807
England/MILK-1512443/2021 MILK-1512443 B.1.617.2 SAMEA8620939
England/ALDP-151851D/2021 ALDP-151851D B.1.617.2 SAMEA8621496
England/MILK-1512319/2021 MILK-1512319 B.1.617.2 SAMEA8620742
England/CAMC-1510A07/2021 CAMC-1510A07 B.1.617.2 SAMEA8621030
England/CAMC-1510AF8/2021 CAMC-1510AF8 B.1.617.2 SAMEA8620809
England/MILK-1513B89/2021 MILK-1513B89 B.1.617.2 SAMEA8620752
England/CAMC-1510A8F/2021 CAMC-1510A8F B.1.617.2 SAMEA8620920
England/MILK-15140E3/2021 MILK-15140E3 B.1.617.2 SAMEA8620880
England/CAMC-15109FB/2021 CAMC-15109FB B.1.617.2 SAMEA8621002
England/CAMC-1510B40/2021 CAMC-1510B40 B.1.617.2 SAMEA8620949
England/MILK-151407A/2021 MILK-151407A B.1.617.2 SAMEA8621017
England/MILK-151413B/2021 MILK-151413B B.1.617.2 SAMEA8621019
England/CAMC-15107C4/2021 CAMC-15107C4 B.1.617.2 SAMEA8620997
England/QEUH-150C51C/2021 QEUH-150C51C B.1.617.2 SAMEA8620398
England/CAMC-150D241/2021 CAMC-150D241 B.1.617.2 SAMEA8620696
England/CAMC-1502B32/2021 CAMC-1502B32 B.1.617.2 SAMEA8618689
England/ALDP-150DBDD/2021 ALDP-150DBDD B.1.617.2 SAMEA8620535
England/QEUH-150C52B/2021 QEUH-150C52B B.1.617.2 SAMEA8620400
England/PHWC-PYD9IS/2021 PHWC-PYD9IS B.1.617.2 SAMEA8649645
England/PHWC-PYD9B9/2021 PHWC-PYD9B9 B.1.617.2 SAMEA8649411
England/RAND-1520785/2021 RAND-1520785 B.1.617.2 SAMEA8656696
England/CAMC-151FDF0/2021 CAMC-151FDF0 B.1.617.2 SAMEA8656683
England/RAND-152054F/2021 RAND-152054F B.1.617.2 SAMEA8656635
England/RAND-1520530/2021 RAND-1520530 B.1.617.2 SAMEA8656633
England/MILK-1517CBF/2021 MILK-1517CBF B.1.617.2 SAMEA8655358
England/RAND-1520688/2021 RAND-1520688 B.1.617.2 SAMEA8656875
England/ALDP-152007B/2021 ALDP-152007B B.1.617.2 SAMEA8656836
England/ALDP-15202B1/2021 ALDP-15202B1 B.1.617.2 SAMEA8656842
England/ALDP-152022A/2021 ALDP-152022A B.1.617.2 SAMEA8656840
England/ALDP-152003F/2021 ALDP-152003F B.1.617.2 SAMEA8656728
England/MILK-1517FA7/2021 MILK-1517FA7 B.1.617.2 SAMEA8655367
England/ALDP-151FEA2/2021 ALDP-151FEA2 B.1.617.2 SAMEA8656751
England/QEUH-150E0A0/2021 QEUH-150E0A0 B.1.617.2 SAMEA8655141
England/QEUH-150DDD7/2021 QEUH-150DDD7 B.1.617.2 SAMEA8655218
England/QEUH-150E2E6/2021 QEUH-150E2E6 B.1.617.2 SAMEA8655177
England/QEUH-150DFE0/2021 QEUH-150DFE0 B.1.617.2 SAMEA8655140
England/ALDP-151462D/2021 ALDP-151462D B.1.617.2 SAMEA8655161
England/MILK-151C5CE/2021 MILK-151C5CE B.1.617.2 SAMEA8656076
England/CAMC-151B714/2021 CAMC-151B714 B.1.617.2 SAMEA8655951
England/MILK-151C4D0/2021 MILK-151C4D0 B.1.617.2 SAMEA8655983
England/MILK-151BA39/2021 MILK-151BA39 B.1.617.2 SAMEA8656084
England/MILK-151B486/2021 MILK-151B486 B.1.617.2 SAMEA8655984
England/CAMC-151B750/2021 CAMC-151B750 B.1.617.2 SAMEA8655930
England/MILK-151CC14/2021 MILK-151CC14 B.1.617.2 SAMEA8656208
England/CAMC-151AAA3/2021 CAMC-151AAA3 B.1.617.2 SAMEA8656614
England/CAMC-151AA2B/2021 CAMC-151AA2B B.1.617.2 SAMEA8656618
England/CAMC-151AAD0/2021 CAMC-151AAD0 B.1.617.2 SAMEA8656608
England/MILK-151C92C/2021 MILK-151C92C B.1.617.2 SAMEA8656204
England/MILK-151C449/2021 MILK-151C449 B.1.617.2 SAMEA8656050
England/MILK-1516ED8/2021 MILK-1516ED8 B.1.617.2 SAMEA8655813
England/ALDP-1519C9F/2021 ALDP-1519C9F B.1.617.2 SAMEA8655760
England/RAND-1515DDC/2021 RAND-1515DDC B.1.617.2 SAMEA8655593
England/MILK-1523CFC/2021 MILK-1523CFC B.1.617.2 SAMEA8687025
England/MILK-1523A4D/2021 MILK-1523A4D B.1.617.2 SAMEA8686819
England/RAND-151ED2E/2021 RAND-151ED2E B.1.617.2 SAMEA8686489
England/ALDP-1522DCD/2021 ALDP-1522DCD B.1.617.2 SAMEA8686907
England/MILK-152339D/2021 MILK-152339D B.1.617.2 SAMEA8686734
England/MILK-1523193/2021 MILK-1523193 B.1.617.2 SAMEA8687024
England/MILK-1523843/2021 MILK-1523843 B.1.617.2 SAMEA8686873
England/ALDP-1522CFD/2021 ALDP-1522CFD B.1.617.2 SAMEA8686776
England/MILK-1523A5C/2021 MILK-1523A5C B.1.617.2 SAMEA8686847
England/RAND-151F6B9/2021 RAND-151F6B9 B.1.617.2 SAMEA8686657
England/ALDP-1522EAC/2021 ALDP-1522EAC B.1.617.2 SAMEA8686727
England/MILK-1523CCF/2021 MILK-1523CCF B.1.617.2 SAMEA8686942
England/RAND-151EC4F/2021 RAND-151EC4F B.1.617.2 SAMEA8686438
England/MILK-152337F/2021 MILK-152337F B.1.617.2 SAMEA8687030
England/MILK-15248BB/2021 MILK-15248BB B.1.617.2 SAMEA8687342
England/MILK-1524781/2021 MILK-1524781 B.1.617.2 SAMEA8687152
England/MILK-15243D8/2021 MILK-15243D8 B.1.617.2 SAMEA8687118
England/MILK-1524BEF/2021 MILK-1524BEF B.1.617.2 SAMEA8687108
England/MILK-152444E/2021 MILK-152444E B.1.617.2 SAMEA8687307
England/MILK-15244B7/2021 MILK-15244B7 B.1.617.2 SAMEA8687174
England/MILK-15267AD/2021 MILK-15267AD B.1.617.2 SAMEA8687074
England/ALDP-15253C8/2021 ALDP-15253C8 B.1.617.2 SAMEA8687066
England/MILK-15268F5/2021 MILK-15268F5 B.1.617.2 SAMEA8687100
England/MILK-15248CA/2021 MILK-15248CA B.1.617.2 SAMEA8687338
England/MILK-1524806/2021 MILK-1524806 B.1.617.2 SAMEA8687051
England/ALDP-1526A4A/2021 ALDP-1526A4A B.1.617.2 SAMEA8687125
England/MILK-1524763/2021 MILK-1524763 B.1.617.2 SAMEA8687076
England/MILK-152461B/2021 MILK-152461B B.1.617.2 SAMEA8687122
England/MILK-1525595/2021 MILK-1525595 B.1.617.2 SAMEA8687517
England/ALDP-15273A8/2021 ALDP-15273A8 B.1.617.2 SAMEA8687505
England/MILK-1528626/2021 MILK-1528626 B.1.617.2 SAMEA8687939
England/MILK-1528538/2021 MILK-1528538 B.1.617.2 SAMEA8687833
England/MILK-152877E/2021 MILK-152877E B.1.617.2 SAMEA8687943
England/MILK-152C82C/2021 MILK-152C82C B.1.617.2 SAMEA8688060
England/ALDP-152BB42/2021 ALDP-152BB42 B.1.617.2 SAMEA8688100
England/MILK-152C4FB/2021 MILK-152C4FB B.1.617.2 SAMEA8688052
England/ALDP-152BD5B/2021 ALDP-152BD5B B.1.617.2 SAMEA8688048
England/ALDP-152BE2B/2021 ALDP-152BE2B B.1.617.2 SAMEA8688076
England/ALDP-153C689/2021 ALDP-153C689 B.1.617.2 SAMEA8690220
England/MILK-153BC5B/2021 MILK-153BC5B B.1.617.2 SAMEA8689991
England/MILK-153B031/2021 MILK-153B031 B.1.617.2 SAMEA8690138
England/MILK-153D04E/2021 MILK-153D04E B.1.617.2 SAMEA8690051
England/MILK-153AE56/2021 MILK-153AE56 B.1.617.2 SAMEA8689994
England/MILK-153AF44/2021 MILK-153AF44 B.1.617.2 SAMEA8690065
England/MILK-153BC88/2021 MILK-153BC88 B.1.617.2 SAMEA8690059
England/MILK-153C21C/2021 MILK-153C21C B.1.617.2 SAMEA8690234
England/MILK-153CEDC/2021 MILK-153CEDC B.1.617.2 SAMEA8690027
England/MILK-153CFD9/2021 MILK-153CFD9 B.1.617.2 SAMEA8690144
England/ALDP-153C5C8/2021 ALDP-153C5C8 B.1.617.2 SAMEA8690219
England/MILK-153D0C6/2021 MILK-153D0C6 B.1.617.2 SAMEA8690242
England/QEUH-1531350/2021 QEUH-1531350 B.1.617.2 SAMEA8689668
England/MILK-153D0D5/2021 MILK-153D0D5 B.1.617.2 SAMEA8689981
England/MILK-153B05F/2021 MILK-153B05F B.1.617.2 SAMEA8690184
England/MILK-153BE91/2021 MILK-153BE91 B.1.617.2 SAMEA8689989
England/ALDP-153C504/2021 ALDP-153C504 B.1.617.2 SAMEA8690217
England/MILK-153CEFA/2021 MILK-153CEFA B.1.617.2 SAMEA8690071
England/MILK-153AF26/2021 MILK-153AF26 B.1.617.2 SAMEA8690023
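
Until a CSV is available, a listing like the one above can be parsed by splitting each row into at most four fields, so that the two-word placeholder `not found` stays in one column. This is only a sketch: the helper name `parse_designated_table` and the choice to turn `not found` into a missing value are assumptions for illustration, not part of any existing script.

```python
import pandas as pd

COLUMNS = ["strain_name", "sample_name", "pango_label", "ena_sample_accession"]


def parse_designated_table(text):
    """Parse whitespace-separated rows into a DataFrame (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        # maxsplit=3 keeps "not found" together as the fourth field
        fields = line.strip().split(maxsplit=3)
        # Skip blank lines and the header row
        if not fields or fields == COLUMNS:
            continue
        if len(fields) != 4:
            raise ValueError(f"Unexpected row: {line!r}")
        rows.append(fields)
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Assumption: treat the "not found" placeholder as a missing accession
    df["ena_sample_accession"] = df["ena_sample_accession"].replace("not found", pd.NA)
    return df


example = """\
strain_name sample_name pango_label ena_sample_accession
England/CAMC-14D5D42/2021 CAMC-14D5D42 B.1.617.2 SAMEA8564746
England/MILK-152B73F/2021 MILK-152B73F B.1.617.2 not found
"""
table = parse_designated_table(example)
print(table)
```

Calling `table.to_csv(...)` on the result would then give a file that can be joined directly, with unlinked samples left blank.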

@hyanwong
Collaborator Author

Great, are these all in late 2020 then?

@szhan
Collaborator

szhan commented Dec 14, 2024

No, all early 2021. I'm not seeing any designated sequences in England sampled in 2020.

@szhan
Collaborator

szhan commented Dec 14, 2024

Also, I'm not finding designated sequences for B.1.617.2 from India sampled in 2020. I've found a few for B.1.617.1 from India sampled in 2020, but I've had no luck linking them to SRA/ENA/Viridian.

@hyanwong
Collaborator Author

> Also, I'm not finding designated sequences for B.1.617.2 from India sampled in 2020. I've found a few for B.1.617.1 from India sampled in 2020, but I've had no luck linking them to SRA/ENA/Viridian.

What about these:

Screenshot 2024-12-06 at 23 18 43

@szhan
Collaborator

szhan commented Dec 14, 2024

Hmm, not having luck for those above either.

@jeromekelleher
Owner

Great, thanks. Anything for 617?

@jeromekelleher
Owner

Can you also copy a CSV of the data to Slack, please? It would save me the trouble of parsing the markdown so I can join with Viridian. Can you also include the sampling date provided, so we can compare it with the Viridian date?

@szhan
Collaborator

szhan commented Dec 14, 2024

> Can you also copy a CSV of the data to Slack, please? It would save me the trouble of parsing the markdown so I can join with Viridian. Can you also include the sampling date provided, so we can compare it with the Viridian date?

Yes, I'll provide a CSV file. I've just managed to link 900+ entries.

So far, all the samples I've looked at are from COG-UK. Their dates were preferentially chosen over the dates from other sources, so they should be identical to those in the Viridian metadata file.
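
A rough sketch of that date comparison, for once the CSV exists: join the linked samples to the Viridian metadata and flag rows where the two dates differ. The Viridian column names `strain` and `date` follow the metadata file; the CSV column names `run_accession` and `sampling_date`, the helper `compare_dates`, and all the toy values are assumptions for illustration.

```python
import pandas as pd


def compare_dates(linked, viridian):
    """Join linked samples to Viridian rows and flag date mismatches (hypothetical helper)."""
    merged = linked.merge(
        viridian[["strain", "date"]],
        left_on="run_accession",
        right_on="strain",
        how="left",
        indicator=True,
    )
    # "both" means the run accession was found in the Viridian metadata
    merged["in_viridian"] = merged["_merge"] == "both"
    # Plain string comparison; a missing Viridian date compares as False
    merged["dates_match"] = merged["sampling_date"] == merged["date"]
    return merged.drop(columns=["_merge", "strain"])


# Toy data with made-up accessions and dates
linked = pd.DataFrame({
    "sample_name": ["SAMPLE-A", "SAMPLE-B", "SAMPLE-C"],
    "run_accession": ["ERR0000001", "ERR0000002", "ERR0000003"],
    "sampling_date": ["2021-03-18", "2021-03-25", "2021-03-31"],
})
viridian = pd.DataFrame({
    "strain": ["ERR0000001", "ERR0000002"],
    "date": ["2021-03-18", "2020-12-31"],
})
result = compare_dates(linked, viridian)
print(result[["run_accession", "sampling_date", "date", "in_viridian", "dates_match"]])
```

A left join keeps samples that are missing from Viridian, so those show up as unmatched rather than silently dropping out.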

@szhan
Collaborator

szhan commented Dec 14, 2024

> Great, thanks. Anything for 617?

Oddly, I'm only seeing one designated sequence for B.1.617.

It is India/MH-NCCS-P1162000180788/2021, and I'm not finding an associated ENA/SRA accession for it.

@szhan
Collaborator

szhan commented Dec 15, 2024

Okay, here is a much better approach, I think.

We can download a file containing the run accession and sample alias (which is contained in the strain name) for all the genomes sequenced by COG-UK, because they are organised under the project PRJEB37886 in the ENA.

This file can be downloaded by going to https://www.ebi.ac.uk/ena/browser/view/PRJEB37886 and selecting the columns `run_accession` and `sample_alias`. There should be 2,700,702 entries.

We can then parse the sample names to work out which designated sequences have an ENA run accession, and use the sampling dates to narrow those down to a list of candidate sequences to attach.

EDIT: We are using run accessions rather than sample accessions.

@szhan
Collaborator

szhan commented Dec 15, 2024

Using the above approach, I'm getting 64 sequences for B.1.617.1 and 10 sequences for B.1.617.2 in the Viridian dataset with a sampling date before 2021-04-01 in the Viridian metadata file (excluding those dated New Year's Eve, 2020-12-31).

The B.1.617.1 sequences are reportedly sampled 2021-02-22 to 2021-03-31 (inclusive).
The B.1.617.2 sequences are reportedly sampled 2021-03-18 to 2021-03-31 (inclusive).

Note to self: There are 71 sequences for B.1.617.2 with a sampling date of 2020-12-31 (New Year's Eve 2020) in the Viridian metadata file. I manually checked a few against the ENA, and found that the sampling date reported there is 2020.
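
One way to check this more systematically than by hand might be to classify how precise each reported collection date is. The sketch below assumes ISO-style date strings (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`); the helper `date_precision`, the column names and the toy records are all hypothetical, and the idea that a year-only date ends up as 31 December in the metadata file is a guess based on the few records checked.

```python
import re

import pandas as pd

# Most specific pattern first
_PATTERNS = [
    ("day", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("month", re.compile(r"\d{4}-\d{2}")),
    ("year", re.compile(r"\d{4}")),
]


def date_precision(value):
    """Return 'day', 'month', 'year' or 'unknown' for an ISO-style date string."""
    text = str(value).strip()
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "unknown"


# Toy records with made-up accessions and dates
records = pd.DataFrame({
    "run_accession": ["ERR0000001", "ERR0000002", "ERR0000003"],
    "ena_date": ["2020", "2021-03-18", "2020-12"],
    "metadata_date": ["2020-12-31", "2021-03-18", "2020-12-31"],
})
records["precision"] = records["ena_date"].map(date_precision)
# Flag 31 December dates that have no day-level support in the ENA record
records["suspect"] = (
    records["metadata_date"].str.endswith("-12-31")
    & (records["precision"] != "day")
)
print(records)
```

Rows flagged as `suspect` would be the ones to leave out when picking candidates by sampling date.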

@szhan
Collaborator

szhan commented Dec 15, 2024

Just noting again that I'm not finding designated sequences for B.1.617 and B.1.617.3 from COG-UK with sampling dates before April 01, 2021.

@jeromekelleher
Owner

Great! Can you post the code for doing this in a PR so this is reproducible, please?

@jeromekelleher
Owner

@szhan
Collaborator

szhan commented Dec 15, 2024

Turns out that we can just download all the ENA run accessions and sample aliases in one go easily by visiting the project page. No need for API access after all.

@szhan
Collaborator

szhan commented Dec 16, 2024

This is what I'm doing to subset the sequences.

import pandas as pd

# Pango designated sequences (columns: taxon, lineage).
pango = pd.read_csv("lineages.csv.gz", sep=",")
# The sample name is the second "/"-separated field of the taxon name,
# e.g. "England/CAMC-14D5D42/2021" -> "CAMC-14D5D42".
# .str[1] yields NaN instead of raising IndexError if a name has no "/".
pango["sample_name"] = pango["taxon"].str.split("/").str[1]

# ENA file report for the COG-UK project PRJEB37886
# (columns include run_accession and sample_alias).
ena = pd.read_csv("filereport_read_run_PRJEB37886_tsv.txt.gz", sep="\t")
ena["sample_name"] = ena["sample_alias"].str.split("/").str[1]

# Viridian metadata; the "strain" column holds the run accession.
viridian = pd.read_csv("../sc2ts/data/run_metadata.v04.renamed.dedup.trimmed.tsv", sep="\t")
viridian["parsed_datetime"] = pd.to_datetime(viridian["date"])

focal_pango = "B.1.617.1"
#focal_pango = "B.1.617.2"
out_file = f"candidate_seeds_{focal_pango}_pre-2021-04-01.txt"

# Designated samples -> COG-UK run accessions -> rows in the Viridian metadata.
# dropna() stops unparsed names (NaN) from matching each other in isin().
designated_samples = pango.loc[pango["lineage"] == focal_pango, "sample_name"].dropna()
coguk_runs = ena.loc[ena["sample_name"].isin(designated_samples), "run_accession"]
viridian_samples = viridian[viridian["strain"].isin(coguk_runs)]

# Keep samples dated before 2021-04-01, excluding those dated 2020-12-31.
is_early = viridian_samples["parsed_datetime"] < pd.to_datetime("2021-04-01")
not_nye = viridian_samples["parsed_datetime"] != pd.to_datetime("2020-12-31")
viridian_samples.loc[is_early & not_nye, ["strain", "date"]].to_csv(out_file, index=False)

@jeromekelleher
Owner

Here's some useful info on 617 starting out in cov-lineages/pango-designation#38 and cov-lineages/pango-designation#49

@jeromekelleher
Owner

xrefing #258 as there's lots of discussion there
