Add new notebook #28

Open: wants to merge 5 commits into base: master

Phlya
Member

@Phlya Phlya commented Nov 7, 2023

Started a new notebook to show how to manipulate cooler files based on the rearrange_cooler example notebook.

Also fixed the make_chromarms issue. Now fixed in bioframe, wait for its release to merge this.

Note that after rerunning the contacts_vs_distance notebook one of the final plots is ugly and I don't understand why!

@Phlya Phlya requested a review from agalitsyna November 7, 2023 16:15
@Phlya Phlya changed the title from "Add new notebook, fix make_chromarms" to "Add new notebook" Nov 8, 2023
@Phlya
Member Author

Phlya commented Nov 10, 2023

https://github.com/open2c/open2c_examples/blob/cooler-manipulation-notebook/cooler_manipulation.ipynb
Here is the link to the new notebook, to make it easier btw

cooler_manipulation.ipynb
Member

@gfudenberg gfudenberg Nov 13, 2023

I might put this specific example later, because this is not the recommended approach; it would be easier/better just to re-balance & overwrite the divisive weights.

Also this title can be changed to something more specific, e.g. "How to modify correction weights stored in bins".

Also, the current example is a little confusing, as it takes divisive weights, makes them multiplicative & then discards them...

A nice example might be to take the weights of one cooler & apply them to another...


Member Author

I think sometimes people want to ensure that they use the same weights for visualization in juicebox and analysis with cooltools, for example, or to compare analyses of the two... And it's certainly faster than rebalancing, which might make a difference sometimes.

I just wanted to show how to remove a column, as an example... Idk I think it might be useful sometimes? Also tbh I just wanted the file to remain in the original state just in case, so it doesn't make something in some other notebook look weird.

Member

@gfudenberg gfudenberg Nov 20, 2023

Two concepts to introduce:

  • How mcool resolutions are stored in HDF5 (from the cooler docs, which could be linked?):
    "The current standard for Hi-C coolers is to name multi-resolution coolers under .mcool extension, and store different resolutions in an HDF5 group resolutions, as shown above."
  • That weights are stored in the bin table. This just needs to be mentioned, e.g.: vectors used for correcting coolers are typically stored as columns in the cooler bin table. Different tools sometimes generate weights in similar yet slightly incompatible formats. For example, cooler and cooltools assume weights from balancing are multiplicative, whereas juicer and the .hic format assume divisive weights for balancing (also termed KR normalization). The utility hic2cool [link] currently converts some of the data but does not invert weights. However, here we show that just by accessing the bin table directly we can manually invert these weights.

Could modify the code so that the resolution grabbed from the HDF5 file is the same as that of the cooler being considered elsewhere (e.g. use resolution = clr.binsize, which is then passed, as a string, as a key to f).
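
To make the multiplicative vs. divisive distinction concrete, here is a minimal sketch. The 3x3 matrix and the weight values are made up for illustration and are not taken from any real cooler or .hic file. It assumes the convention described above: divisive weights are applied as raw / (d_i * d_j), multiplicative weights as raw * w_i * w_j.

```python
import numpy as np

# Toy symmetric raw-count matrix for three bins (made-up values).
raw = np.array([[10.0, 4.0, 2.0],
                [4.0, 8.0, 6.0],
                [2.0, 6.0, 12.0]])

# Hypothetical divisive weights, applied the way a juicer-style vector would be:
# balanced[i, j] = raw[i, j] / (d[i] * d[j])
divisive = np.array([2.0, 4.0, 5.0])
balanced_divisive = raw / np.outer(divisive, divisive)

# Inverting them gives multiplicative weights, the cooler/cooltools convention:
# balanced[i, j] = raw[i, j] * w[i] * w[j]
multiplicative = 1.0 / divisive
balanced_multiplicative = raw * np.outer(multiplicative, multiplicative)

# Both conventions describe the same balanced matrix once the weights are inverted.
same = np.allclose(balanced_divisive, balanced_multiplicative)
print(same)
```

The point of the sketch is only that inverting the vector is enough to move between the two conventions; it says nothing about how either tool computes the weights in the first place.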

Member

@gfudenberg gfudenberg Nov 13, 2023

Line #2.        f['resolutions']['1000000']['bins']['weight_inverted'] = 1/f['resolutions']['1000000']['bins']['weight'][:]

should the reading mode be described?


Member

Move to the line above for readability, e.g.: open the HDF5 file in r+ mode, which allows read/write, but the file must already exist.
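
A possible shape for that cell, as a sketch only: the file below is a tiny stand-in built on the fly that mimics the resolutions/<binsize>/bins layout, not the test mcool used in the notebook, and the weight values are made up. The `resolution` variable is set by hand here; in the notebook it could come from `clr.binsize`.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file that mimics the mcool layout
# (resolutions/<binsize>/bins/<column>); a real .mcool holds many more datasets.
path = os.path.join(tempfile.mkdtemp(), "toy.mcool")
resolution = 1000000  # in the notebook this could be taken from clr.binsize
with h5py.File(path, "w") as f:
    bins = f.create_group(f"resolutions/{resolution}/bins")
    bins["weight"] = np.array([2.0, 4.0, 5.0])  # made-up weights

# Open the HDF5 file in "r+" mode: read/write, and the file must already exist.
with h5py.File(path, "r+") as f:
    bins = f["resolutions"][str(resolution)]["bins"]  # HDF5 keys are strings
    bins["weight_inverted"] = 1 / bins["weight"][:]

# Read the new column back in read-only mode to check the result.
with h5py.File(path, "r") as f:
    inverted = f[f"resolutions/{resolution}/bins/weight_inverted"][:]
print(inverted)
```

Using `str(resolution)` as the key keeps the HDF5 access tied to whichever resolution is loaded elsewhere, instead of hard-coding '1000000'.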

Member

@gfudenberg gfudenberg Nov 13, 2023

Maybe be more specific about what is done: "Here we show an example where two chromosomes are split into 4 arms, re-ordered, and inverted."


Member

If before/after could be put in side-by-side subplots with titles (e.g. original, re-ordered), this would be a bit clearer.
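
Something along these lines, as a rough sketch: the two matrices are random placeholder data rather than the notebook's coolers, and the re-ordering is just a swap of two halves to have something to show.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: a random matrix standing in for the original contact map.
rng = np.random.default_rng(0)
original = rng.random((20, 20))

# Stand-in for the re-ordered map: swap the two halves of the bin order.
order = np.r_[10:20, 0:10]
reordered = original[np.ix_(order, order)]

# One row, two panels, each with its own title.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, mat, title in zip(axes, (original, reordered), ("original", "re-ordered")):
    ax.imshow(mat, cmap="viridis")
    ax.set_title(title)
fig.tight_layout()
```

In the notebook the two `imshow` inputs would be the matrices fetched from the original and rearranged coolers, with whatever colormap and scaling the other plots use.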

Member

@gfudenberg gfudenberg Nov 13, 2023

Line #2.    clr_new.bins()[:]

not sure I understand this explanation...


Member

@gfudenberg gfudenberg Nov 20, 2023

# This is because chromosome lengths are unlikely to be a multiple of binsize,
# and the last bin of a chromosome is often shorter.
# If a short bin that was formerly at the end of a chromosome is reordered to
# the start or middle of a chromosome, we observe bin positions that
# are not multiples of the binsize.
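
A toy illustration of that explanation, in plain Python. The chromosome names, lengths, and binsize are hypothetical, and `make_bins` is a helper written for this sketch; it is a simplified picture of the effect, not how cooltools implements the re-ordering.

```python
def make_bins(chrom, length, binsize):
    """Return (chrom, start, end) bins; the last bin may be shorter than binsize."""
    return [(chrom, s, min(s + binsize, length)) for s in range(0, length, binsize)]


binsize = 100
# Hypothetical chromosomes; chrA's length is not a multiple of binsize.
bins_a = make_bins("chrA", 250, binsize)  # last bin is only 50 long
bins_b = make_bins("chrB", 300, binsize)

# Join the chrA bins and the chrB bins into one new chromosome, so the short
# bin from the end of chrA ends up in the middle. New starts are the running
# sum of the original bin lengths.
lengths = [end - start for _, start, end in bins_a + bins_b]
new_starts = []
pos = 0
for n in lengths:
    new_starts.append(pos)
    pos += n

off_grid = [s for s in new_starts if s % binsize != 0]
print(new_starts)  # [0, 100, 200, 250, 350, 450]
print(off_grid)    # [250, 350, 450]
```

Every bin downstream of the short one is shifted off the regular grid, which is what shows up in the bin table of the new cooler.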

@Phlya
Member Author

Phlya commented Nov 14, 2023

@gfudenberg is it better now?

@Phlya
Member Author

Phlya commented Jan 25, 2024

@gfudenberg could you have a look at this again please?
