Blogpost idea: how to generate multiscale image arrays #141

GenevieveBuckley · 2022-07-27T13:35:25Z

This PR is currently in progress, but could be merged soon (for some loose value of "soon", I don't have a good idea of when) ome/ome-zarr-py#192

When it is done, I think it might be nice to have a blogpost about how to generate a multiscale image array and save it to disk, etc.

This is something that surprisingly doesn't seem to have a single, obvious, best way to do it (see discussion ome/ome-zarr-py#215). So when there is a convenience function available, it would be good to highlight that with a blogpost.
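For context, here is a minimal sketch of what "generating a multiscale image array" means. It uses plain NumPy and 2x2 mean downsampling, which is only one of several reasonable choices. The helper names `downsample_2x` and `build_pyramid` are hypothetical and are not part of ome-zarr-py or any other library mentioned in this thread.

```python
import numpy as np


def downsample_2x(image: np.ndarray) -> np.ndarray:
    """Halve both axes of a 2D image by averaging non-overlapping 2x2 blocks.

    Odd trailing rows/columns are trimmed so the shape divides evenly.
    """
    h2, w2 = image.shape[0] // 2, image.shape[1] // 2
    trimmed = image[: h2 * 2, : w2 * 2]
    # Group pixels into (row_block, 2, col_block, 2) and average each 2x2 block.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.mean(axis=(1, 3))


def build_pyramid(image: np.ndarray, n_levels: int) -> list:
    """Return [full resolution, 1/2, 1/4, ...] with at most n_levels entries."""
    levels = [image]
    for _ in range(n_levels - 1):
        if min(levels[-1].shape) < 2:
            break  # cannot downsample any further
        levels.append(downsample_2x(levels[-1]))
    return levels


base = np.arange(64, dtype=np.float64).reshape(8, 8)
pyramid = build_pyramid(base, n_levels=3)
shapes = [level.shape for level in pyramid]  # (8, 8), (4, 4), (2, 2)
```

Each level could then be written to its own array on disk. How the levels are named and described in metadata depends on the target format, which is the part a convenience function would take care of.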

Jacob, feel free to nudge me in a few months about this, if you like. (That may or may not work, I can't say for sure I'll be available to do more about it then, but it's worth a try)

TomAugspurger · 2022-07-27T13:46:11Z

https://github.com/carbonplan/ndpyramid, from the geospatial context, might be helpful / worth linking to here. Its usage of Dask is pretty hidden behind xarray (but see carbonplan/ndpyramid#10 for a more direct Dask integration).

GenevieveBuckley · 2022-07-28T01:23:07Z

Ooh, very cool

I actually hadn't heard about ndpyramid. I'm going to have to try that one out, I might actually end up using it all the time. Thanks Tom!

GenevieveBuckley · 2022-07-28T01:27:54Z

@sofroniewn & @jni have you two seen or used ndpyramid? It looks super useful, especially in a napari context

sofroniewn · 2022-07-28T02:04:19Z

no - lol @freeman-lab gotta tell me these things! @joshmoore, have you seen this?

joshmoore · 2022-07-28T02:09:11Z

> lol @freeman-lab gotta tell me these things!

😆

> @joshmoore, have you seen this?

I must admit, yes. But I must also admit to losing track of it. I think @thewtex has one as well, as does @aisenbarth: https://github.com/aeisenbarth/ngff-writer (which flowed into ome/ome-zarr-py#192). Big 👍🏽 for doing what we can to work together on faster, better, slicker libraries.

jakirkham · 2022-07-29T00:11:48Z

There was also a lot of discussion in issue pydata/xarray#4118 about how to handle this use case better. Josh likely has a better handle on where things are there than I do.

jakirkham · 2022-08-26T16:52:22Z

> This PR is currently in progress, but could be merged soon (for some loose value of "soon", I don't have a good idea of when) ome/ome-zarr-py#192

FWIW this just got merged! 🥳

will-moore · 2022-09-09T11:23:38Z

Support for dask writing to OME-NGFF (ome/ome-zarr-py#192) is now released in ome-zarr-py 0.6.0

chrisroat · 2023-04-06T05:17:29Z

There may be some code snippets that work well for smaller data, but with large datasets, using your general-purpose cluster to do the downsampling might be inefficient. It also adds extra tasks to what may otherwise be a clean analysis workflow.

For a large pipeline that is processing and dumping a lot of data, it can be cleaner and more efficient to split out the downsampling work. The dask processing cluster can store a dask array in a tensorstore dataset, using the precomputed neuroglancer driver (https://google.github.io/tensorstore/driver/neuroglancer_precomputed/index.html). Separate, dedicated resources can be specified for out-of-band downsampling: a task queue that feeds an igneous cluster that can have CPU/memory/IO tuned efficiently for the dedicated task.

Igneous is very well-developed and maintained. I currently use it locally on 10-100 GB datasets regularly, and it always works smoothly, even supporting sharded neuroglancer formats.

Shards are different from dask blocks. Sharding allows much more efficient usage because a shard is written as one large file (allowing for smaller file counts and more efficient data transfer), while the much tinier chunks used for visualization are stored within the shard in a format that works well with HTTP range requests.
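To illustrate the idea only, here is a toy sketch of a shard: many small chunks packed into one blob behind an offset index, so a reader can fetch a single chunk with two small ranged reads. The layout and the names `write_shard` and `read_chunk` are made up for this example. This is not the actual neuroglancer sharded format, and the seek/read calls stand in for HTTP range requests.

```python
import io
import struct


def write_shard(chunks: list) -> bytes:
    """Pack chunks into one blob: [count][(offset, length) per chunk][chunk bytes]."""
    header_size = 8 + 16 * len(chunks)  # 8-byte count + two 8-byte ints per chunk
    index = []
    offset = header_size
    for chunk in chunks:
        index.append((offset, len(chunk)))
        offset += len(chunk)
    header = struct.pack("<Q", len(chunks))
    header += b"".join(struct.pack("<QQ", off, length) for off, length in index)
    return header + b"".join(chunks)


def read_chunk(shard: io.BufferedIOBase, i: int) -> bytes:
    """Read chunk i with two ranged reads: its index entry, then its bytes."""
    shard.seek(8 + 16 * i)
    offset, length = struct.unpack("<QQ", shard.read(16))
    shard.seek(offset)
    return shard.read(length)


small_chunks = [b"chunk-0", b"chunk-one", b"c2"]
blob = write_shard(small_chunks)          # one file on disk instead of three
second = read_chunk(io.BytesIO(blob), 1)  # fetches only the bytes it needs
```

The writer deals with one large object per shard, while a viewer still only pulls the few bytes it needs for each small chunk.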

GenevieveBuckley · 2023-04-19T05:14:49Z

Thanks @chrisroat
Do you have an example of this I can look at? I'm not very familiar with igneous (mostly because I don't usually work with neuro datasets)

sofroniewn mentioned this issue Jul 28, 2022

Pass xarray to napari-gui for autolabeling sliders napari/napari#14

Open

jakirkham mentioned this issue Jul 29, 2022

Feature Request: Hierarchical storage and processing in xarray pydata/xarray#4118

Closed
