Missing code for RecombinationNodeMrcas figures #120

Closed
jeromekelleher opened this issue May 18, 2023 · 13 comments · May be fixed by jeromekelleher/sc2ts#163

Comments

@jeromekelleher
Owner

I can't find the code where the fields (e.g. fwd_bck_parents_max_mut_dist) in the data frame used in the RecombinationNodeMrcas figure are defined, @hyanwong.

Do you think you could pull this out into a notebook that would do the calculations and export to a CSV? We don't really want to be doing computation in the plots file (which shouldn't need to read in the actual trees files at all).

@hyanwong
Collaborator

I started dumping the CSV creation code into make_csv_files.py - probably better than a notebook?

@hyanwong
Collaborator

hyanwong commented May 18, 2023

I think the plots file might need to read in the trees file, if only to be able to plot the cophylogenies, right? So either we have a file which produces a smaller set of tree sequences for plotting the trees (so that the computation of simplifying the trees down is done there), or we do that simplification in the plots file.

The same goes for the subgraphs, which make up most of the code in plots.py, I think. How much of the computation of the subgraph plotting should happen in plots.py? If we do the computation somewhere else, we'd need to work out an export format for the subgraphs, which could be a real pain: not an easy thing to use a CSV file for.

@jeromekelleher
Owner Author

We don't need to read the trees for this particular plot, and there's a lot of analysis in this that I (for one) don't fully follow. A notebook would be helpful.

@hyanwong
Collaborator

hyanwong commented May 18, 2023

Happy to make a notebook for the MRCAs plot(s). In fact, I think I have one anyway, which I used to get it all working. I thought you meant that you didn't want any of the code in plots.py to read the trees, which would be tricky, IMO.

@jeromekelleher
Owner Author

We want a notebook that has analysis to produce (a) the CSV used and (b) all the numbers quoted in the text.

@hyanwong
Collaborator

hyanwong commented May 18, 2023

I'll sort that. FWIW producing the CSV (in make_csv_files.py) is simply:

df = treeinfo.export_recombinant_breakpoints()
df.to_csv(f"data/breakpoints_{prefix}.csv")

and the numbers are output as extra info when running the plot-producing script with -v. It should be easy to add them to the notebook output too (personally I prefer them as text output when creating the plots, as I can easily get lost when looking through a notebook).
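For illustration, the split being discussed (compute and export once, then have the plotting side read only the CSV and print the quoted numbers under -v) could look roughly like the sketch below. It uses only the standard library rather than pandas or sc2ts, and the rows, the `node` column, and the helper names `export_rows`, `load_rows` and `summarise` are all hypothetical stand-ins, not code from this repository.

```python
import argparse
import csv
import io


def export_rows(rows, out):
    """Write a list of dict rows to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def load_rows(src):
    """Read CSV rows back; the plotting side needs only this, not the trees."""
    return list(csv.DictReader(src))


def summarise(rows, column):
    """Return (count, max) for a numeric column, e.g. for numbers quoted in the text."""
    values = [float(row[column]) for row in rows]
    return len(values), max(values)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args(argv)

    # Toy rows standing in for the exported breakpoint table (values are made up).
    rows = [
        {"node": 10, "fwd_bck_parents_max_mut_dist": 3},
        {"node": 11, "fwd_bck_parents_max_mut_dist": 7},
    ]

    # An in-memory buffer stands in for the CSV file on disk.
    buffer = io.StringIO()
    export_rows(rows, buffer)
    buffer.seek(0)
    loaded = load_rows(buffer)

    if args.verbose:
        count, largest = summarise(loaded, "fwd_bck_parents_max_mut_dist")
        print(f"{count} rows, max fwd_bck_parents_max_mut_dist = {largest}")
    return loaded
```

Calling `main(["-v"])` prints the summary line; calling `main([])` only returns the rows, which is the part a plotting function would consume.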

@jeromekelleher
Owner Author

I don't think that produces fwd_bck_parents_max_mut_dist? I couldn't find code for it anyway.

@hyanwong
Collaborator

hyanwong commented May 18, 2023

It was a PR to the sc2ts utils file that was merged a while ago, I think? I'll check.

@jeromekelleher
Owner Author

> and the numbers are output as extra info when running the plot-producing script with -v

Not all of them - there's some more numbers in the text with no source

@hyanwong
Collaborator

> It was a PR to the sc2ts utils file that was merged a while ago, I think? I'll check.

Not merged yet, that's why. Sorry! jeromekelleher/sc2ts#141

> there's some more numbers in the text with no source

Ah, good spotting then. I'll check.

@hyanwong
Collaborator

hyanwong commented May 19, 2023

> We don't need to read the trees for this particular plot

Just looking at this again. I think it's helpful to have the trees in the plotting code, because we need to find the number of descendants of different types (e.g. BA.1) to label the top 4 MRCA nodes. This isn't something you can easily store in the CSV (since you don't want it for all nodes).

But I agree that we should be using a CSV for the point locations etc.
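As a rough sketch of why this wants the trees at plot time: counting descendants by type only needs to run for the handful of nodes being labelled, so it is cheap to do on demand and awkward to precompute for every node. The example below uses a plain dictionary graph as a toy stand-in (it is not the tskit or sc2ts API), and `count_descendant_labels`, the node IDs, and the labels are all made up for illustration.

```python
from collections import Counter


def count_descendant_labels(children, labels, root):
    """Count the labels of all labelled nodes reachable below `root`.

    `children` maps a node to its child nodes and `labels` maps some nodes
    (e.g. samples) to a type such as "BA.1". A `seen` set guards against
    counting a node twice when it is reachable through more than one parent,
    as can happen below a recombination node.
    """
    counts = Counter()
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in labels:
            counts[labels[node]] += 1
        stack.extend(children.get(node, []))
    return counts


# Toy graph: node 5 has two parents (1 and 2), mimicking a recombinant.
children = {0: [1, 2], 1: [3, 4, 5], 2: [5, 6]}
labels = {3: "BA.1", 4: "BA.1", 5: "BA.2", 6: "BA.2"}

# Only the few nodes chosen for labelling are queried, not every node.
for mrca in (1, 2):
    print(mrca, dict(count_descendant_labels(children, labels, mrca)))
```

The per-node counts could then be turned into text labels in the plotting code, while the point locations still come from the CSV.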

@jeromekelleher
Owner Author

I think this is sorted now @hyanwong? Can we use data/wide_arg_recombinants.csv and delete the breakpoints... files, since they contain the same data? It looks like the only extra field needs the nodes_time field, which we could put into the plotting code since it needs to load the trees above?

We may as well delete the make_csvs file then, because all the rest are produced by exporting from notebooks (and I want to spend the time systematising this now).

@hyanwong
Collaborator

Yep, fine.
