Visualisation of simplified backbone phylogenies #222

szhan · 2024-09-25T09:55:45Z

A simple approach to visually comparing the backbones of the Viridian UShER tree and our pandemic-scale ARG is probably to leverage the Pango lineage roots, excluding the Pango recombinants. The Pango lineage roots are already labelled in the UShER tree, but it is trickier to get the corresponding nodes in our ARGs. Once we have those nodes identified, we could simplify down to only those nodes (n = 2,131). For a cleaner view, we could also exclude the less evolutionarily or epidemiologically relevant Pango lineages.

szhan · 2024-09-25T10:03:53Z

This can similarly be done for the global, all-time Nextstrain tree. Instead of using the Pango lineage labels, we would use the Nextstrain clade definitions. I think the easiest way may be to use the nucleotide definitions for each Nextstrain clade in order to identify the corresponding node in our ARGs, at least for the first pass. We should see the same clade hierarchy.

Some useful files for this analysis are here:
https://github.com/nextstrain/ncov/blob/master/defaults/clade_hierarchy.tsv
https://github.com/nextstrain/ncov/blob/master/defaults/clades_who.tsv
https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv
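
As a rough first-pass sketch of the nucleotide-definition idea, the snippet below collects the nucleotide mutations that define each clade. It assumes clades.tsv is tab-separated with the columns clade, gene, site and alt, which should be checked against the actual file. The function name `parse_clade_definitions` is hypothetical, and the inline rows are toy values for illustration, not real clade definitions.

```python
import collections
import csv
import io


def parse_clade_definitions(tsv_text):
    """Return {clade: {site: alt}} using only rows whose gene column is 'nuc'."""
    definitions = collections.defaultdict(dict)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        clade = (row.get("clade") or "").strip()
        # Skip blank rows and comment rows
        if not clade or clade.startswith("#"):
            continue
        # Keep nucleotide definitions only; amino-acid rows are ignored here
        if row["gene"] == "nuc":
            definitions[clade][int(row["site"])] = row["alt"]
    return dict(definitions)


# Toy input in the assumed layout (not real clade definitions)
toy_tsv = (
    "clade\tgene\tsite\talt\n"
    "toyA\tnuc\t100\tT\n"
    "toyA\tnuc\t250\tG\n"
    "toyB\tnuc\t100\tT\n"
    "toyB\tS\t12\tK\n"
)
clade_defs = parse_clade_definitions(toy_tsv)
```

The resulting per-clade mutation sets could then be matched against the mutations on the path above each candidate node in the ARG.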

szhan · 2024-09-26T10:51:06Z

I also just came across this list of mutations in the founder sequences of the Nextstrain clades, assembled by Richard Neher. Linking it here in case it comes in handy.

https://raw.githubusercontent.com/neherlab/SC2_variant_rates/cd6e016a511098123b6ce9ed874f58a7b789b34c/data/clade_gts.json

hyanwong · 2024-10-11T13:11:47Z

I did that in a simplistic way, by taking the MRCA of all the samples labelled as PangoNNN as the origin of that lineage, and simplifying the ARG down to those nodes. This is highly sensitive to errors in lineage designation, but easy to do. It does lead to a tree in which many Pango lineages share the same origin node, and in which many large polytomies are retained.

It's interesting that simplifying to these lineages leaves only 5 trees, so presumably 4 recombination nodes. This is few enough that we could probably look at them by hand to check how believable they are (presumably, not very), and what might be triggering a recombination at those points.

import collections

# Assumes `ts` is the ARG as a tskit tree sequence and `ti` is an info object
# exposing `pango_lineage_samples` (Pango lineage -> list of sample node IDs).
tree = ts.at(21563)  # start of spike
pango_mrcas = {}
node_labels = collections.defaultdict(list)
for p, samples in ti.pango_lineage_samples.items():
    # Skip Pango recombinants (names starting with "X") and unassigned samples
    if not p.startswith("X") and p != "unknown":
        if len(samples) == 1:
            pango_mrcas[p] = samples[0]
        else:
            pango_mrcas[p] = tree.mrca(*samples)
        node_labels[pango_mrcas[p]].append(p)
# Lineages that share an origin node get a single "/"-joined label
node_labels = {k: "/".join(v) for k, v in node_labels.items()}
# filter_nodes=False keeps the original node IDs, so node_labels still applies
sts = ts.simplify(list(set(pango_mrcas.values())), filter_nodes=False, keep_unary=True)
print("ARG simplified to pango non-X lineages has", sts.num_trees, "trees")
sts.at(21563).draw_svg(
    size=(2500, 1000),
    node_labels=node_labels,
    style=".leaf > .lab {text-anchor: start; transform: rotate(90deg) translate(6px)} .node text {font-size: 9px}",
    omit_sites=True,
)
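
To put a number on how many lineages end up sharing an origin node, something like the following could be run on a lineage-to-node mapping such as `pango_mrcas`. This is only a sketch: the helper name `lineages_sharing_origin` is hypothetical, and the mapping below is toy data rather than output from the ARG.

```python
import collections


def lineages_sharing_origin(lineage_to_node):
    """Return {node: sorted lineages} for nodes that are the origin of 2+ lineages."""
    by_node = collections.defaultdict(list)
    for lineage, node in lineage_to_node.items():
        by_node[node].append(lineage)
    return {
        node: sorted(lineages)
        for node, lineages in by_node.items()
        if len(lineages) > 1
    }


# Toy mapping of lineage name -> MRCA node ID (illustrative values only)
toy_mrcas = {"toyA": 7, "toyA.1": 7, "toyB": 9, "toyA.2": 7}
shared = lineages_sharing_origin(toy_mrcas)
```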

hyanwong · 2024-10-15T09:02:34Z

Jerome and I thought of another way, of intermediate hackiness: we could use Ana's lineage imputation method, and simply find the earliest node for each imputed Pango lineage. We could test this by looking at the proportion of times that the standard lineage-defining mutations occur above this node (NB: if it is a unary node, we should include nodes below it too).
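
A minimal sketch of that idea, on toy data, might look like the code below. It assumes we already have an imputed lineage and a time for each node, plus the set of mutations on the path above each chosen node. All names here (`earliest_node_per_lineage`, `defining_mutation_support`, and the input dictionaries) are hypothetical and are not part of Ana's method or any existing API. Times are assumed to run backwards from the present, as in tskit, so the earliest node is the one with the largest time. The unary-node caveat is not handled.

```python
def earliest_node_per_lineage(node_lineage, node_time):
    """Return {lineage: node}, picking the oldest node imputed to each lineage."""
    earliest = {}
    for node, lineage in node_lineage.items():
        best = earliest.get(lineage)
        # Larger time means further back in the past, i.e. earlier
        if best is None or node_time[node] > node_time[best]:
            earliest[lineage] = node
    return earliest


def defining_mutation_support(earliest, defining, mutations_above):
    """Per lineage, the fraction of its defining mutations found above its node."""
    support = {}
    for lineage, node in earliest.items():
        expected = defining.get(lineage, set())
        if expected:
            found = expected & mutations_above[node]
            support[lineage] = len(found) / len(expected)
    return support


# Toy inputs (illustrative values only)
node_lineage = {10: "toyA", 11: "toyA", 12: "toyB"}
node_time = {10: 5.0, 11: 8.0, 12: 3.0}
defining = {"toyA": {"C100T", "A250G"}, "toyB": {"C100T"}}
mutations_above = {11: {"C100T"}, 12: {"C100T", "G300A"}}

earliest = earliest_node_per_lineage(node_lineage, node_time)
support = defining_mutation_support(earliest, defining, mutations_above)
```

Lineages with low support would be the ones worth inspecting by hand.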
