how do we plot cooltools outputs? One idea: cool reports #285

golobor · 2019-06-28T20:55:20Z

golobor
Jun 28, 2019
Maintainer

I propose creating a group of cooltools CLI commands called "report" (the name is up for debate). This group would host various "secondary" CLI commands that take the results of "primary" cooltools commands and create plots, reports and summary statistics.

Examples:

- plot saddle plots
- plot scalings
- plot compartmentalization vs distance (by @jnuebler)
- plot insulation score statistics (that's the one I'm going to implement first)
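For illustration only, here is a minimal sketch of how such a command group could be laid out. It uses the standard library's `argparse` so it stays dependency-free; the program name, the subcommand names (`insulation`, `saddle`) and every argument are hypothetical placeholders, not existing cooltools commands.

```python
import argparse


def build_parser():
    """Build a hypothetical 'report' command group with two secondary commands."""
    parser = argparse.ArgumentParser(prog="report-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # Each secondary command only reads the output of a primary command.
    ins = sub.add_parser("insulation", help="report on an insulation score table")
    ins.add_argument("score_table", help="path to a table written by a primary command")
    ins.add_argument("--output", default="insulation_report.png")

    sad = sub.add_parser("saddle", help="report on precomputed saddle data")
    sad.add_argument("saddle_data", help="path to saddle data written by a primary command")
    sad.add_argument("--output", default="saddle_report.png")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real command would load the input and draw the plots here.
    print(f"would build a {args.command} report -> {args.output}")
    return args
```

The point of the layout is that plotting options accumulate on the secondary commands, while the primary commands keep a small, stable set of arguments.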

Pros:

- Unleashes contributions of "reporting" code. In practice, biological data analysis requires lots of plots and summaries; currently, cooltools hosts very little such code, partly because it's not clear where to put it.
- Separates "conservative" and "experimental" code. Plotting is inherently messy, poorly defined and user-dependent. On the other hand, the code that calculates "primary" scores must be stable (ideally, not change at all) and well-defined. Mixing the two creates various opportunities for confusion.
- Simplifies the API. Again, since reporting/plotting is so flexible and poorly defined, it naturally requires lots of extra arguments (e.g. see the CLI for cooltools compute-saddle). Separating the CLI for calculations from the CLI for plotting is a very natural way to simplify both.

Cons:

?..

@nvictus @sergpolly @gfudenberg @Phlya @itsameercat @mimakaev

nvictus · 2019-06-28T22:44:39Z

nvictus
Jun 28, 2019
Maintainer

Cons:

- More complexity. Changing score outputs might break reporting code.

Cleanly separating calculation from plotting code seems somewhat orthogonal to having a set of "report" commands. In any case, the way I would see such "reports" evolving is:

1. A detailed notebook going through the summary analysis step by step
2. Refactoring into summarization and plotting functions
3. Refactoring the notebook to use the new functions, explore parameterization. As a bonus, your method gets documented.
4. If a robust pattern emerges, it can then be exposed via the CLI. When the CLI is too inflexible (you can never specify enough plotting options), people can dig into the notebook and mess around.

saddle plots sort of has this...
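A rough sketch of what the "summarization and plotting functions" split might look like. The function names and the choice of summary statistics are assumptions made for the example, and matplotlib is imported lazily so the summarization half has no plotting dependency.

```python
import statistics


def summarize_scores(scores):
    """Summarization step: plain numbers in, plain dict out, no plotting."""
    values = sorted(s for s in scores if s is not None)
    if not values:
        raise ValueError("no valid scores to summarize")
    return {
        "n": len(values),
        "min": values[0],
        "median": statistics.median(values),
        "max": values[-1],
    }


def plot_summary(scores, summary, path):
    """Plotting step: consumes the summary, holds all the cosmetic choices."""
    import matplotlib  # imported here so summarize_scores works without it

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist([s for s in scores if s is not None], bins=30)
    ax.axvline(summary["median"], linestyle="--")
    ax.set_xlabel("score")
    ax.set_ylabel("count")
    fig.savefig(path)
    plt.close(fig)
```

A notebook can call both functions while exploring parameters, and a CLI wrapper added later only needs to call the same two functions.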


golobor · 2019-06-28T23:38:07Z

golobor
Jun 28, 2019
Maintainer Author

1. Which score outputs do you expect to change? Very few tools do such "reporting" now.
2. Plotting and reporting are not that separable, because two plots already make a report.
3. The suggested pathway via Jupyter is nice, but doesn't scale to batch processing of multiple datasets, which is the default use case.


Phlya · 2019-06-30T13:22:48Z

Phlya
Jun 30, 2019
Maintainer

Apart from saddles, what sort of plots are you thinking of? E.g., what do you want to implement for IS?

It's more for pairtools than cooltools, but I think the most needed thing is some simple built-in plotting of stats (after distiller). Maybe it should be integrated with MultiQC? https://multiqc.info/ Or just a matplotlib script, something like the one I shared with you a while back. E.g. HiC-Pro has iplots in its output http://nservant.github.io/HiC-Pro/RESULTS.html, but we could include more info like scaling, orientation by distance, etc. It's all in the stats files already.


golobor · 2019-07-01T15:04:30Z

golobor
Jul 1, 2019
Maintainer Author

Ilya,
re: IS, I've been working on an automated criterion for IS boundary selection, at the request of 4DN DCIC. I think I arrived at a heuristic that may work, but it involves a Gaussian Mixture Model, which doesn't always fit well. For that reason, I want to have a simple way to plot a diagnostic report for each sample.
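To make the "doesn't always fit well" concern concrete, here is a toy sketch. It is not the heuristic mentioned above, which is not spelled out in this thread. It fits a two-component 1D Gaussian mixture with plain EM and computes Ashman's D as one possible number to check before trusting the fit. Applying it to boundary strengths is an assumption of the example, as are all the function names.

```python
import math


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM (toy version)."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2
    # Initialise the components from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 0.
        resp = []
        for x in xs:
            p = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sigmas)
            ]
            total = (p[0] + p[1]) or 1e-300
            resp.append(p[0] / total)
        # M-step: re-estimate weight, mean and sigma of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = sum(r) or 1e-12
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(xs)
    return weights, means, sigmas


def ashman_d(means, sigmas):
    """Ashman's D: values above about 2 suggest two cleanly separated modes."""
    return math.sqrt(2) * abs(means[0] - means[1]) / math.sqrt(sigmas[0] ** 2 + sigmas[1] ** 2)
```

A per-sample diagnostic report could then show the histogram with both fitted components overlaid, plus a separation number like this one, so that poorly fitting samples stand out.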

Re: your suggestion, I totally agree (time is always the limitation :( ). DCIC has produced something along these lines: https://github.com/4dn-dcic/pairsqc . MultiQC seems very promising too!!

Regarding different ways to produce these plots, there is one potential approach based on @nvictus's suggestion. Specifically, we could design a collection of Jupyter notebooks for stats reporting and then run them against arbitrary input data using jupyter nbconvert --execute (potentially, feeding the paths to the data via https://github.com/nteract/papermill). The advantage of this approach is that Jupyter notebooks are very flexible and very easy to develop (less overhead than a CLI), and we can have as many of them as we want w/o polluting the CLI. It also does not require introducing (and learning!) exotic dependencies and can produce output in a variety of formats.
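A minimal sketch of the batch side of this idea, assuming papermill's command-line form `papermill INPUT OUTPUT -p NAME VALUE` and a template notebook with a cell tagged `parameters`. The parameter name `cool_path`, the file names and the helper functions are hypothetical.

```python
import subprocess
from pathlib import Path


def papermill_commands(template, cool_paths, out_dir, param_name="cool_path"):
    """Build one papermill command line per input file, without running it."""
    out_dir = Path(out_dir)
    commands = []
    for cool_path in cool_paths:
        sample = Path(cool_path).stem
        # One executed copy of the template notebook per sample.
        executed = out_dir / f"{sample}.{Path(template).name}"
        commands.append(
            ["papermill", str(template), str(executed), "-p", param_name, str(cool_path)]
        )
    return commands


def run_all(commands):
    """Run the commands one by one; stops at the first failing notebook."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them keeps the batch step easy to inspect, or to hand over to a workflow manager instead of `run_all`.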


Phlya · 2019-07-01T15:11:39Z

Phlya
Jul 1, 2019
Maintainer

re IS: That sounds cool!

re pairsqc - have you tried installing it? I have, and didn't manage to. I honestly think the first thing to do is just use the output of pairtools stats. It has all the information, so there's no need to use the pairs directly, and distiller runs stats anyway.
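As a sketch of how little is needed to start from the stats output, the snippet below assumes a plain two-column, tab-separated key/value layout and the presence of `cis` and `trans` keys. Both are assumptions to check against the actual files, and the function names are made up for the example.

```python
def _to_number(text):
    """Convert a value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_stats(lines):
    """Parse tab-separated key/value lines into a flat dict."""
    stats = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("\t")
        stats[key] = _to_number(value)
    return stats


def cis_fraction(stats):
    """Fraction of cis pairs among cis + trans pairs (keys are assumed)."""
    return stats["cis"] / (stats["cis"] + stats["trans"])
```

With a real file this would be called as `parse_stats(open(path))`, and the resulting dict could feed either a matplotlib script or a MultiQC-style summary.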

Using notebooks sounds awesome, at some point important ones can be converted into a proper CLI tool, nice.
