Add new notebook #28
base: master
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
https://github.com/open2c/open2c_examples/blob/cooler-manipulation-notebook/cooler_manipulation.ipynb
I might put this specific example later, because this is not the recommended approach: it would be easier/better just to re-balance & overwrite the divisive weights.
Also this title can be changed to something more specific, e.g. How to modify correction weights stored in bins
Also, the current example is a little confusing, as it takes divisive weights, makes them multiplicative & then discards them...
A nice example might be to take the weights of one cooler & apply them to another...
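A rough sketch of that last idea, in case it helps. It assumes both coolers share the same bin table and uses toy HDF5 files that only mimic the `bins` group of a single-resolution cooler (they are not valid coolers); `make_toy_cooler` and `copy_weights` are hypothetical helper names, not part of cooler.

```python
import os
import tempfile

import h5py
import numpy as np


def make_toy_cooler(path, weights=None, n_bins=4):
    """Write a toy file with a cooler-like 'bins' group (not a valid cooler)."""
    with h5py.File(path, "w") as f:
        bins = f.create_group("bins")
        bins.create_dataset("start", data=np.arange(n_bins) * 1_000_000)
        if weights is not None:
            bins.create_dataset("weight", data=weights)


def copy_weights(src_path, dst_path, name="weight"):
    """Copy one bin-table column from src to dst; both must have the same number of bins."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "r+") as dst:
        weights = src["bins"][name][:]
        if len(weights) != len(dst["bins"]["start"]):
            raise ValueError("bin tables differ in length")
        if name in dst["bins"]:
            del dst["bins"][name]  # replace an existing column
        dst["bins"].create_dataset(name, data=weights)


tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "source.cool")
dst_path = os.path.join(tmpdir, "target.cool")
make_toy_cooler(src_path, weights=np.array([0.5, 2.0, np.nan, 1.0]))
make_toy_cooler(dst_path)  # no weights yet

copy_weights(src_path, dst_path)
with h5py.File(dst_path, "r") as f:
    copied = f["bins"]["weight"][:]
```

The length check is only a minimal guard; for real files one would also want to confirm that chromosomes and bin coordinates match.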
I think sometimes people want to ensure that they use the same weights for visualization in juicebox and analysis with cooltools, for example, or to compare analyses of the two... And it's certainly faster than rebalancing, which might make a difference sometimes.
I just wanted to show how to remove a column, as an example... Idk I think it might be useful sometimes? Also tbh I just wanted the file to remain in the original state just in case, so it doesn't make something in some other notebook look weird.
Two concepts to introduce:
- how mcool resolutions are stored in HDF5 (from the cooler docs, which could be linked?): "The current standard for Hi-C coolers is to name multi-resolution coolers under the .mcool extension, and store different resolutions in an HDF5 group resolutions, as shown above."
- that weights are stored in the bin table (just needs to be mentioned), e.g.: "Vectors used for correcting coolers are typically stored as columns in the cooler bin table. Different tools sometimes generate weights in similar yet slightly incompatible formats. For example, cooler and cooltools assume weights from balancing are multiplicative, whereas juicer and the .hic format assume divisive weights for balancing (also termed KR normalization). The utility hic2cool [link] currently converts some of the data but does not invert weights. However, here we show that just by accessing the bin table directly we can manually invert these weights."
Could also modify the code so that the resolution grabbed from the HDF5 file is the same as that of the cooler being considered elsewhere (e.g. use resolution = clr.binsize, which is then passed as a key to f).
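To make the multiplicative vs. divisive distinction concrete, here is a small numeric sketch. The matrix and weights are made-up toy values, not taken from any cooler; the point is only that inverting the weights switches between the two conventions.

```python
import numpy as np

raw = np.array([[4.0, 2.0],
                [2.0, 8.0]])        # toy raw contact counts
w_mult = np.array([0.5, 0.25])      # multiplicative weights (cooler convention)
w_div = 1.0 / w_mult                # the same correction expressed as divisive weights

# Multiplicative: balanced[i, j] = raw[i, j] * w[i] * w[j]
balanced_mult = raw * np.outer(w_mult, w_mult)
# Divisive: balanced[i, j] = raw[i, j] / (d[i] * d[j])
balanced_div = raw / np.outer(w_div, w_div)
```

Both expressions give the same balanced matrix, which is why storing `1 / weight` is enough to convert between the conventions.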
Line #2. f['resolutions']['1000000']['bins']['weight_inverted'] = 1/f['resolutions']['1000000']['bins']['weight'][:]
should the reading mode be described?
Move to the line above for readability, e.g.: open the HDF5 file in r+ mode, which allows read/write but requires that the file already exists
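For reference, a minimal sketch of what that could look like with h5py. The file here is a toy that only mimics the .mcool group layout (it is not a valid cooler), and the `resolution` variable stands in for something like `clr.binsize` in the notebook.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a toy file that only mimics the .mcool layout (assumption for illustration).
path = os.path.join(tempfile.mkdtemp(), "toy.mcool")
resolution = 1_000_000  # in the notebook this could come from clr.binsize
with h5py.File(path, "w") as f:
    f.create_dataset(
        f"resolutions/{resolution}/bins/weight", data=np.array([0.5, 2.0, 4.0])
    )

# "r+" opens an existing file for reading and writing; it fails if the file is missing.
with h5py.File(path, "r+") as f:
    bins = f["resolutions"][str(resolution)]["bins"]
    bins["weight_inverted"] = 1 / bins["weight"][:]

with h5py.File(path, "r") as f:
    inverted = f[f"resolutions/{resolution}/bins/weight_inverted"][:]
```

Using the context manager also closes the file reliably, which matters when another tool reads the same file right afterwards.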
Maybe be more specific about what is done: Here we show an example where two chromosomes are split into 4 arms, reordered, and inverted.
If before/after could be put in side-by-side subplots with titles (e.g. original, reordered), this would be a bit clearer
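Something along these lines, as a sketch. The matrices are random toy data rather than the notebook's contact maps, and the reordering (second block first, first block inverted) is just a stand-in for the real rearrangement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((20, 20))
original = (original + original.T) / 2  # symmetric, like a contact map

# Toy rearrangement: move bins 10-19 to the front, then append bins 0-9 inverted.
order = np.concatenate([np.arange(10, 20), np.arange(9, -1, -1)])
reordered = original[np.ix_(order, order)]

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, matrix, title in zip(axes, [original, reordered], ["original", "reordered"]):
    ax.imshow(matrix, cmap="viridis")
    ax.set_title(title)
fig.tight_layout()
```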
```
# This is because chromosome lengths are unlikely to be a multiple of binsize,
# and the last bin of a chromosome is often shorter.
# If a short bin that was formerly at the end of a chromosome is reordered to
# the start or middle of a chromosome, we observe bin positions that
# are not multiples of the binsize.
```
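A tiny worked example of that effect, using made-up numbers (a 2500 bp "chromosome" and a 1000 bp binsize); `make_bins` and `relabel` are hypothetical helpers written for this illustration only.

```python
def make_bins(chrom_length, binsize):
    """Return (start, end) bins of a chromosome; the last bin may be shorter."""
    return [
        (start, min(start + binsize, chrom_length))
        for start in range(0, chrom_length, binsize)
    ]


def relabel(bins):
    """Assign new coordinates to bins laid end to end in the given order."""
    relabeled, position = [], 0
    for start, end in bins:
        length = end - start
        relabeled.append((position, position + length))
        position += length
    return relabeled


binsize = 1000
bins = make_bins(2500, binsize)   # the last bin is only 500 bp long
reordered = relabel(bins[::-1])   # inverted: the short bin now comes first
off_grid = [start for start, _ in reordered if start % binsize != 0]
```

After the inversion every bin downstream of the short one starts at an offset of 500, so `off_grid` is non-empty.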
@gfudenberg is it better now?
@gfudenberg could you have a look at this again please?
Started a new notebook to show how to manipulate cooler files based on the rearrange_cooler example notebook.
Also fixed the make_chromarms issue. Now fixed in bioframe, wait for its release to merge this. Note that after rerunning the contacts_vs_distance notebook one of the final plots is ugly and I don't understand why!