Add a graph plot into "What is a tree sequence" #264

Open
hyanwong opened this issue Nov 16, 2023 · 2 comments

@hyanwong

To encourage people to think of tree sequences as graph objects, I think it would be helpful to add the graph representation to the "What is a Tree Sequence" tutorial, round about here. This is how you might do it:

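import tskit_arg_visualizer as viz

# `ts` and `labels` (a dict of node ID -> label) are assumed to be defined earlier in the notebook
arg = viz.D3ARG.from_ts(ts=ts)
# Only label the sample nodes; internal nodes get an empty label
arg.set_node_labels({k: (v if k in ts.samples() else "") for k, v in labels.items()})
arg.draw(
    variable_edge_width=True,
    y_axis_scale="time",
    # Order the samples by their labels
    sample_order=sorted((k for k in labels if k in ts.samples()), key=lambda k: labels[k]),
)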

Currently this gives a plot like this:

[Image: screenshot of the resulting plot, 2023-11-16]

I think a few things would be helpful to make this look simpler. In particular, if we could change the node sizes & shapes such that the internal nodes are (very) small circles and the sample nodes are square, that would match the tree-by-tree plot above it (kitchensjn/tskit_arg_visualizer#30). Allowing the y-axis ticks to be set to user-chosen values would also be helpful, I think.

Perhaps @kitchensjn has some ideas about how to make the plot friendly to a newcomer in this context?

Note that ts has been produced by code in the notebook, like that below:

import msprime
import demes

def whatis_example():
    demes_yml = """\
        description:
          Asymmetric migration between two extant demes.
        time_units: generations
        defaults:
          epoch:
            start_size: 5000
        demes:
          - name: Ancestral_population
            epochs:
              - end_time: 1000
          - name: A
            ancestors: [Ancestral_population]
          - name: B
            ancestors: [Ancestral_population]
            epochs:
              - start_size: 2000
                end_time: 500
              - start_size: 400
                end_size: 10000
        migrations:
          - source: A
            dest: B
            rate: 1e-4
        """
    graph = demes.loads(demes_yml)
    demography = msprime.Demography.from_demes(graph)
    # Choose seed so num_trees=3, tips are in same order,
    # first 2 trees are topologically different, and all trees have the same root
    seed = 12581
    ts = msprime.sim_ancestry(
        samples={"A": 2, "B": 3},
        demography=demography,
        recombination_rate=1e-8,
        sequence_length=1000,
        random_seed=seed)
    # Mutate
    # Choose seed to give 12 muts, last one above node 14
    seed = 1476
    return msprime.sim_mutations(ts, rate=1e-7, random_seed=seed)
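
As a rough illustration of the "graph object" view mentioned at the top of this comment, here is a minimal sketch that collapses edge rows into a child-to-parents mapping. The helper name `parents_by_child` and the toy edge rows are made up for illustration; they are not part of tskit or the visualizer. A node with more than one parent is what turns the picture from a single tree into a graph.

```python
from collections import defaultdict


def parents_by_child(edges):
    """Collapse (left, right, parent, child) edge rows into child -> set of parents."""
    graph = defaultdict(set)
    for left, right, parent, child in edges:
        graph[child].add(parent)
    return dict(graph)


# Toy edge rows, invented for this sketch (not taken from the `ts` above).
# With tskit, equivalent rows could be built as:
#   edges = [(e.left, e.right, e.parent, e.child) for e in ts.edges()]
toy_edges = [
    (0, 1000, 4, 0),
    (0, 1000, 4, 1),
    (0, 400, 5, 2),
    (400, 1000, 6, 2),  # node 2 switches parent at position 400
    (0, 400, 5, 4),
    (400, 1000, 6, 4),  # so does node 4
]

graph = parents_by_child(toy_edges)
# Nodes with several parents across the sequence: the "graph-like" part
multi_parent = sorted(c for c, ps in graph.items() if len(ps) > 1)
print(multi_parent)
```

Nodes 0 and 1 keep the same parent along the whole sequence, so they look identical in every local tree; only the nodes listed in `multi_parent` have their parent changed at a breakpoint.
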
@kitchensjn

I've added the changes you mentioned to the tskit_arg_visualizer 0.0.2 milestone; they should be pretty straightforward to implement!

I personally like the node labels when mapping between the trees and the ARG. Without those labels, it might be a bit difficult for newcomers to grasp how (and why) the trees are woven together. Something like this paragraph:

A major benefit of “tree sequence thinking” is the close relationship between the tree sequence and the underlying biological processes that produced the genetic sequences in the first place, such as those pictured in the demography above. For example, each branch point (or “internal node”) in one of our trees can be imagined as a genome which existed at a specific time in the past, and which is a “most recent common ancestor” (MRCA) of the descendant genomes at that position on the chromosome. We can mark these extra “ancestral genomes” on our tree diagrams, distinguishing them from the sampled genomes (a to j) by using circular symbols.

from lower on the page seems critical to understanding why the trees are correlated, including the fact that specific nodes/edges are found across multiple trees. The tree highlighting and variable edge width within the ARG help to show this correlation, but they don't give the biological reasoning behind it. Maybe we move that paragraph up above this figure?
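
To make the "edges found across multiple trees" point concrete, here is a small sketch that reports, for each parent-child pair, the genomic span it covers and the number of local trees it appears in. The helper `edge_sharing` and the toy data are hypothetical. I have not checked how `variable_edge_width` is actually computed in the visualizer, so treat this as an illustration of the idea rather than a description of its implementation.

```python
from collections import defaultdict


def edge_sharing(edges, breakpoints):
    """Map (parent, child) -> (total span, number of trees containing that edge)."""
    spans = defaultdict(float)
    tree_counts = defaultdict(int)
    # Each consecutive pair of breakpoints delimits one local tree
    intervals = list(zip(breakpoints[:-1], breakpoints[1:]))
    for left, right, parent, child in edges:
        spans[(parent, child)] += right - left
        # An edge is present in every tree whose interval it fully covers
        tree_counts[(parent, child)] += sum(
            1 for lo, hi in intervals if left <= lo and hi <= right
        )
    return {pair: (spans[pair], tree_counts[pair]) for pair in spans}


# Toy data, invented for this sketch. With tskit, the inputs could be built as:
#   edges = [(e.left, e.right, e.parent, e.child) for e in ts.edges()]
#   breakpoints = list(ts.breakpoints())
toy_edges = [
    (0, 1000, 4, 0),
    (0, 1000, 4, 1),
    (0, 400, 5, 2),
    (400, 1000, 6, 2),
    (0, 400, 5, 4),
    (400, 1000, 6, 4),
]
toy_breakpoints = [0, 400, 1000]

sharing = edge_sharing(toy_edges, toy_breakpoints)
for pair, (span, n_trees) in sorted(sharing.items()):
    print(pair, span, n_trees)
```

Edges with a large span are shared by several adjacent trees, which is the correlation between trees that the paragraph quoted above explains in biological terms.
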

@kitchensjn

With the latest commit to the visualizer, users can now control the size and symbol of the nodes. Here's your example from above with smaller nodes and square sample nodes.

import msprime
import demes
import tskit_arg_visualizer as viz


def whatis_example():
    demes_yml = """\
        description:
          Asymmetric migration between two extant demes.
        time_units: generations
        defaults:
          epoch:
            start_size: 5000
        demes:
          - name: Ancestral_population
            epochs:
              - end_time: 1000
          - name: A
            ancestors: [Ancestral_population]
          - name: B
            ancestors: [Ancestral_population]
            epochs:
              - start_size: 2000
                end_time: 500
              - start_size: 400
                end_size: 10000
        migrations:
          - source: A
            dest: B
            rate: 1e-4
        """
    graph = demes.loads(demes_yml)
    demography = msprime.Demography.from_demes(graph)
    # Choose seed so num_trees=3, tips are in same order,
    # first 2 trees are topologically different, and all trees have the same root
    seed = 12581
    ts = msprime.sim_ancestry(
        samples={"A": 2, "B": 3},
        demography=demography,
        recombination_rate=1e-8,
        sequence_length=1000,
        random_seed=seed)
    # Mutate
    # Choose seed to give 12 muts, last one above node 14
    seed = 1476
    return msprime.sim_mutations(ts, rate=1e-7, random_seed=seed)


ts = whatis_example()
arg = viz.D3ARG.from_ts(ts=ts)

# Label only the sample nodes (flag 1, i.e. tskit.NODE_IS_SAMPLE); blank out the rest
labels = {}
for node in arg.nodes:
    if node["flag"] == 1:
        labels[node["id"]] = node["label"]
    else:
        labels[node["id"]] = ""
arg.set_node_labels(labels=labels)

arg.draw(
    variable_edge_width=True,
    y_axis_scale="time",
    node_size=50,
    sample_node_symbol="d3.symbolSquare",
    sample_order=[0,2,3,4,8,9,5,6,7,1]
)

[Image: Example ARG with new symbols and node sizes]
