Backwards #275

Open: wants to merge 2 commits into base: main

petrelharp
Contributor

In working out an algorithm that wants to move back through time, it seemed helpful to do a simple explainer on iterating back in time - i.e., taking the haplotype view instead of the tree-by-tree view.

This is a draft of that. Suggestions for nicer Python or fun examples welcome! So far it's not demonstrating anything that you couldn't do tree-by-tree.

at 300 generations ago, there were three extant genomes
from which the samples inherited, and the inherited segments are
as listed here.
Note that this does not mean that "node 2 was laive 300 generations ago"
Member

typo "alive"

(clearly, as node 2 represetns an extant, sampled genome),
but rather that there are no other ancestral genomes recorded explicitly
in the tree sequence that lie on the path along
which node 2 has inherited it's genome.
Member

Typo "it's"

@hyanwong
Member

hyanwong commented Jul 1, 2024

Nice. I like this. Note that there are a few examples of iterating up and down the graph at https://tskit.dev/tutorials/args.html#graph-traversal, but I don't actually do anything with the traversals, so your examples are better.

Also note that some of this stuff might also link in to tskit-dev/tskit#2869, and there are some suggestions of things you might want to calculate there. One thing that is much easier to do compared to the tree-by-tree approach is to find all the descendant samples of a particular ancestral node (or alternatively, all the internal nodes that are ancestors of a particular sample).
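
As a rough illustration of the descendant-samples query mentioned above, here is a minimal plain-Python sketch. It does not use tskit: the edge table is a hypothetical hand-written list of `(left, right, parent, child)` tuples, and `descendant_samples` is an illustrative helper name, not a library function. It walks down from the ancestral node and follows an edge only while some genomic interval survives along the path.

```python
from collections import defaultdict

# Hypothetical toy edge table: (left, right, parent, child).
# Nodes 0, 1, 2 are samples; 3, 4, 5 are ancestors (assumed for illustration).
edges = [
    (0, 10, 3, 0),
    (0, 5, 3, 1),
    (5, 10, 4, 1),
    (0, 10, 5, 2),
    (0, 10, 5, 3),
    (0, 10, 5, 4),
]
samples = {0, 1, 2}


def descendant_samples(edges, samples, node):
    """Return the samples that inherit at least one interval from `node`."""
    # Index edges by parent so we can walk downwards (towards the present).
    children = defaultdict(list)
    for left, right, parent, child in edges:
        children[parent].append((left, right, child))

    found = set()
    # Each stack entry is the interval still inherited from `node`, plus
    # the node currently reached along that path.
    stack = [(float("-inf"), float("inf"), node)]
    while stack:
        lo, hi, u = stack.pop()
        if u in samples and u != node:
            found.add(u)
        for left, right, child in children[u]:
            new_lo, new_hi = max(lo, left), min(hi, right)
            if new_lo < new_hi:  # some genome is still shared on this path
                stack.append((new_lo, new_hi, child))
    return found


print(descendant_samples(edges, samples, 3))
print(descendant_samples(edges, samples, 4))
```

The reverse query (all ancestors of a sample) could be sketched the same way by indexing the edges by child instead of by parent.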

Member

@jeromekelleher jeromekelleher left a comment

I think it's great, a really helpful addition. I've a few minor take-or-leave implementation suggestions.


```{code-cell} ipython3
for e in ts.edges():
    t = ts.node(e.parent).time
```
Member

Maybe use ts.nodes_time[e.parent] so that this is more easily translatable to numba?

from which the samples inherited, and the inherited segments are
as listed here.
Note that this does not mean that "node 2 was laive 300 generations ago"
(clearly, as node 2 represetns an extant, sampled genome),
Member

Suggested change
(clearly, as node 2 represetns an extant, sampled genome),
(clearly, as node 2 represents an extant, sampled genome),

Here is a data structure for a list of segments with labels:

```{code-cell} ipython3
class LabelSegmentList:
```
Member

It might be worth considering inheriting from collections.abc.MutableSequence here, as this would give you all the dunder methods that you're implementing. I think these might be a bit scary to non-python people, and are a bit of a distraction from the main point.

Member

I guess you'd need to inherit from list then to get the actual storage. Maybe that's OK?
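
To make the trade-off in these two comments concrete, here is a small sketch of the `collections.abc.MutableSequence` route. The PR's actual `LabelSegmentList` body is not shown in this thread, so the `LabelSegment` tuple and its `left`, `right`, `label` fields are assumptions made for illustration. `MutableSequence` supplies `append`, `extend`, `pop`, iteration and `in` as mixins, but only once five methods are written by hand and the storage is kept in an internal list.

```python
from collections import namedtuple
from collections.abc import MutableSequence

# Hypothetical segment record; the field names are assumed, not from the PR.
LabelSegment = namedtuple("LabelSegment", ["left", "right", "label"])


class LabelSegmentList(MutableSequence):
    """A list of labelled segments backed by a plain Python list."""

    def __init__(self, segments=()):
        self._segments = list(segments)

    # The five abstract methods that MutableSequence requires.
    def __getitem__(self, index):
        return self._segments[index]

    def __setitem__(self, index, value):
        self._segments[index] = value

    def __delitem__(self, index):
        del self._segments[index]

    def __len__(self):
        return len(self._segments)

    def insert(self, index, value):
        self._segments.insert(index, value)

    def __repr__(self):
        return f"LabelSegmentList({self._segments!r})"


segs = LabelSegmentList()
segs.append(LabelSegment(0, 5, 1))      # append() comes from the mixin
segs.extend([LabelSegment(5, 10, 1)])   # so does extend()
print(segs)
```

Inheriting from `list` directly would drop even these five methods, at the cost of exposing every `list` method whether or not it makes sense for segments.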

Now, edges in the EdgeTable are sorted by parent time,
so if we iterate through the edges in order, we move back in time.
So, we can use this to see the state of the process at, say,
500 generations in the past:
Member

Typo, 500 should be 300
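
The back-in-time sweep described in the excerpt above (edges visited in order of parent time, stopping at a chosen time) can be sketched without tskit as follows. Everything here is a stand-in: `node_time`, `edges` and `state_at` are hypothetical names, the toy tables are made up, and the real tutorial code may be organised differently. Each node carries a list of `(left, right, sample)` segments, and every edge moves the overlapping part of the child's segments up to the parent.

```python
# Hypothetical toy tables standing in for a tree sequence's node and edge tables.
node_time = {0: 0, 1: 0, 2: 0, 3: 100, 4: 200, 5: 500}
samples = [0, 1, 2]
seq_len = 10
edges = [  # (left, right, parent, child)
    (0, 10, 3, 0),
    (0, 5, 3, 1),
    (5, 10, 4, 1),
    (0, 10, 5, 2),
    (0, 10, 5, 3),
    (0, 10, 5, 4),
]


def state_at(edges, node_time, samples, seq_len, t_stop):
    """Map each genome carrying sample material at `t_stop` to its segments."""
    # Initially every sample carries its own whole genome.
    carried = {s: [(0, seq_len, s)] for s in samples}
    # Sorting by parent time mimics the edge ordering the excerpt relies on.
    for left, right, parent, child in sorted(edges, key=lambda e: node_time[e[2]]):
        if node_time[parent] > t_stop:
            break  # all remaining edges lead to parents older than t_stop
        keep = []
        for seg_left, seg_right, label in carried.pop(child, []):
            lo, hi = max(seg_left, left), min(seg_right, right)
            if lo < hi:
                # The overlapping part is inherited from the parent.
                carried.setdefault(parent, []).append((lo, hi, label))
                if seg_left < lo:
                    keep.append((seg_left, lo, label))
                if hi < seg_right:
                    keep.append((hi, seg_right, label))
            else:
                keep.append((seg_left, seg_right, label))
        if keep:
            carried[child] = keep  # parts not covered by this edge stay put
    return carried


state = state_at(edges, node_time, samples, seq_len, t_stop=300)
for node in sorted(state):
    print(node, state[node])
```

In this toy table, stopping at 300 leaves three nodes carrying sample material: 2, 3 and 4. Node 2 is itself a sample; it appears only because none of its recorded ancestors is younger than the stop time.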
