Zarrv3 part two (#137)

Implements Zarr v3 with the [sharding storage transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html). (Not the [sharding codec](https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/v1.0.html), which is WIP, but I'm keeping an eye on it.)

Supersedes #101. Supersedes #125. Closes #76. Closes #111.

## Changes

### Added

- Support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html).
- Support for the [sharding storage transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html) in Zarr v3.

## Testing

Since napari doesn't yet support Zarr v3, we can use zarr-python with matplotlib to do a sanity check. Write out a Zarr v3 dataset using either `write-zarr-v3-raw` or `write-zarr-v3-compressed` in the `tests` folder, then run the following Python script:

```python
import os

import numpy as np
import matplotlib.pyplot as plt

# these MUST come before importing zarr
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
os.environ["ZARR_V3_SHARDING"] = "1"

import zarr


def plot_array(input_zarr):
    store3 = zarr.DirectoryStoreV3(input_zarr)
    z3 = zarr.open(store=store3, mode="r")
    for (k3, a3) in z3.arrays():
        for i in range(a3.shape[0]):
            plt.imshow(a3[i, 0, :, :])
            plt.show()


if __name__ == "__main__":
    plot_array("C:/testing/acquire-driver-zarr-write-zarr-v3-compressed.zarr")  # change this to point to the dataset you wrote out
```

For best results, you'll want to change the simulated camera used to something more intelligible, e.g., radial sin.
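The README changes in this commit pick `chunk_planes` by dividing a byte budget by the tile area, and they express shard dimensions in units of chunks rather than pixels. Below is a minimal Python sketch of that arithmetic for picking values before configuring the driver. The helper names `chunk_planes_for_budget` and `chunks_per_frame` are hypothetical and not part of the driver, and rounding up for partial tiles is an assumption (the README example divides evenly).

```python
def chunk_planes_for_budget(tile_width, tile_height, bytes_per_pixel, max_chunk_bytes):
    """Largest number of tile planes whose raw size fits in max_chunk_bytes.

    Hypothetical helper: mirrors the README arithmetic, not a driver API.
    """
    bytes_per_plane = tile_width * tile_height * bytes_per_pixel
    return max_chunk_bytes // bytes_per_plane


def chunks_per_frame(frame_width, frame_height, tile_width, tile_height):
    """Number of chunks across and down one frame.

    Uses ceiling division, assuming a partial tile at the edge still
    occupies a chunk (an assumption, not confirmed by the README).
    """
    across = -(-frame_width // tile_width)
    down = -(-frame_height // tile_height)
    return across, down


if __name__ == "__main__":
    # README example: 1920 x 1080 frames of u8 pixels, 640 x 360 tiles, 32 MiB budget.
    planes = chunk_planes_for_budget(640, 360, 1, 32 * 2**20)
    shard_width, shard_height = chunks_per_frame(1920, 1080, 640, 360)
    print(planes, shard_width, shard_height)
```

With 145 planes, each raw chunk is 640 * 360 * 145 = 33,408,000 bytes, just under the 32 MiB budget, and a shard of 3 x 3 x 1 chunks covers the whole frame, matching the sharding example in the README.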
acquire-project · Nov 15, 2023 · 558efc9
1 parent cf4044e
commit 558efc9

Showing 38 changed files with 2,128 additions and 900 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- Support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html).
+- Support for
+  the [sharding storage transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html)
+  in Zarr v3.
 - Ship debug libs for C-Blosc on Linux and Mac.
 
 ### Changed
@@ -34,7 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
-- `ChunkWriter`s need to specify which multiscale layer they write to.
+- `ZarrV2Writer`s need to specify which multiscale layer they write to.
 - The Zarr writer now validates that image and tile shapes are set and compatible with each other before the first
   append.
 

diff --git a/README.md b/README.md
@@ -13,6 +13,9 @@ This is an Acquire Driver that supports chunked streaming to [zarr][].
 - **Zarr**
 - **ZarrBlosc1ZstdByteShuffle**
 - **ZarrBlosc1Lz4ByteShuffle**
+- **ZarrV3**
+- **ZarrV3Blosc1ZstdByteShuffle**
+- **ZarrV3Blosc1Lz4ByteShuffle**
 
 ## Using the Zarr storage device
 
@@ -24,67 +27,108 @@ Chunking is configured using `storage_properties_set_chunking_props()` when conf
 Multiscale storage can be enabled or disabled by calling `storage_properties_set_enable_multiscale()` when configuring
 the video stream.
 
+For the [Zarr v3] version of each device, you can use the `ZarrV3*` devices.
+**Note:** Zarr v3 is not [yet](https://github.com/ome/ngff/pull/206) supported
+by [ome-zarr-py](https://github.com/ome/ome-zarr-py), so you
+will not be able to read multiscale metadata from the resulting dataset.
+
+Zarr v3 *is* supported by [zarr-python](https://github.com/zarr-developers/zarr-python), but you will need to set two
+environment variables to work with it:
+
+```bash
+export ZARR_V3_EXPERIMENTAL_API=1
+export ZARR_V3_SHARDING=1
+```
+
+You can also set these variables in your Python script:
+
+```python
+import os
+
+# these MUST come before importing zarr
+os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
+os.environ["ZARR_V3_SHARDING"] = "1"
+
+import zarr
+```
+
 ### Configuring chunking
 
 You can configure chunking by calling `storage_properties_set_chunking_props()` on your `StorageProperties` object
 _after_ calling `storage_properties_init()`.
-There are 4 parameters you can set to determine the chunk size, namely `tile_width`, `tile_height`, `tile_planes`,
-and `bytes_per_chunk`:
+There are 3 parameters you can set to determine the chunk size, namely `chunk_width`, `chunk_height`,
+and `chunk_planes`:
 
 ```c
 int
 storage_properties_set_chunking_props(struct StorageProperties* out,
-                                      uint32_t tile_width,
-                                      uint32_t tile_height,
-                                      uint32_t tile_planes,
-                                      uint64_t max_bytes_per_chunk)
+                                      uint32_t chunk_width,
+                                      uint32_t chunk_height,
+                                      uint32_t chunk_planes)
 ```
 
 | ![frames](https://github.com/aliddell/acquire-driver-zarr/assets/844464/3510d468-4751-4fa0-b2bf-0e29a5f3ea1c) |
-|:--:|
-| A collection of frames. |
+|:-------------------------------------------------------------------------------------------------------------:|
+|                                            A collection of frames.                                            |
 
 A _tile_ is a contiguous section, or region of interest, of a _frame_.
 
 | ![tiles](https://github.com/aliddell/acquire-driver-zarr/assets/844464/f8d16139-e0ac-44db-855f-2f5ef305c98b) |
-|:--:|
-| A collection of frames, divided into tiles. |
+|:------------------------------------------------------------------------------------------------------------:|
+|                                 A collection of frames, divided into tiles.                                  |
 
 A _chunk_ is nothing more than some number of stacked tiles from subsequent frames, with each tile in a chunk having
 the same ROI in its respective frame.
 
-|  ![chunks](https://github.com/aliddell/acquire-driver-zarr/assets/844464/653e4d82-363e-4e04-9a42-927b052fb6e7) |
-|:--:|
-| A collection of frames, divided into tiles. A single chunk has been highlighted in red. |
+| ![chunks](https://github.com/aliddell/acquire-driver-zarr/assets/844464/653e4d82-363e-4e04-9a42-927b052fb6e7) |
+|:-------------------------------------------------------------------------------------------------------------:|
+|            A collection of frames, divided into tiles. A single chunk has been highlighted in red.            |
 
-You can specify the width and height, in pixels, of each tile, and if your frame size has more than one plane, you can
-specify the number of planes you want per tile as well.
+You can specify the width and height, in pixels, of each tile.
 If any of these values are unset (equivalently, set to 0), or if they are set to a value larger than the frame size,
 the full value of the frame size along that dimension will be used instead.
 You should take care that the values you select won't result in tile sizes that are too small or too large for your
 application.
-
-The `max_bytes_per_chunk` parameter can be used to cap the size of a chunk.
-A minimum of 16 MiB is enforced, but no maximum, so if you are compressing you must ensure that you have sufficient
-memory for all your chunks to be stored in memory at once.
+You can also set the number of tile *planes* to concatenate into a chunk.
+If this value is unset (or set to 0), it will default to a prescribed minimum value of 32.
 
 #### Example
 
-Suppose your frame size is 1920 x 1080 x 1, with a pixel type of unsigned 8-bit integer.
-You can use a tile size of 640 x 360 x 1, which will divide your frame evenly into 9 tiles.
-You want chunk sizes of at most 64 MiB.
+Suppose your frame size is 1920 x 1080, with a pixel type of unsigned 8-bit integer.
+You can use a tile size of 640 x 360, which will divide your frame evenly into 9 tiles.
+You want chunk sizes of at most 32 MiB. At one byte per pixel, this works out to
+32 * 2^20 / (640 * 360) = 145.63555555555556 tile planes, so you round down and select 145 chunk planes.
 You would configure your storage properties as follows:
 
 ```c
 storage_properties_set_chunking_props(&storage_props,
                                       640,
                                       360,
-                                      1,
-                                      64 * 1024 * 1024);
+                                      145);
 ```
 
-Note that 64 * 1024 * 1024 / (640 * 360) = 291.2711111111111, so each chunk will contain 291 tiles, or about 63.94 MiB
-raw, before compression.
+### Configuring sharding
+
+Configuring sharding is similar to configuring chunking.
+You can configure sharding by calling `storage_properties_set_sharding_props()` on your `StorageProperties` object
+_after_ calling `storage_properties_init()`.
+There are 3 parameters you can set to determine the shard size, namely `shard_width`, `shard_height`,
+and `shard_planes`.
+**Note:** whereas the unit for the width, height, and plane values when chunking is *pixels*, when sharding, the unit is
+*chunks*.
+So in the previous example, if you wanted to combine all your chunks together into a single shard, you would set your shard
+properties like so:
+
+```c
+storage_properties_set_sharding_props(&storage_props,
+                                      3, // width: 1920 / 640
+                                      3, // height: 1080 / 360
+                                      1);
+```
+
+This would result in all 9 chunks being combined into a single shard.
+
 
 ### Compression
 
@@ -120,3 +164,5 @@ Then the sequence of levels will have dimensions 1920 x 1080, 960 x 540, 480 x 2
 [Blosc]: https://github.com/Blosc/c-blosc
 
 [Blosc docs]: https://www.blosc.org/
+
+[Zarr v3]: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html
diff --git a/examples/no-striping.cpp b/examples/no-striping.cpp
@@ -1,8 +1,8 @@
 /// @file
 /// @brief Generate a Zarr dataset with a single chunk using the simulated
-/// radial sine pattern with a u16 sample type. This example was used to generate
-/// data for a visual EXAMPLE of a fix for a striping artifact observed when
-/// writing to a Zarr dataset with multibyte samples.
+/// radial sine pattern with a u16 sample type. This example was used to
+/// generate data for a visual EXAMPLE of a fix for a striping artifact observed
+/// when writing to a Zarr dataset with multibyte samples.
 
 #include "device/hal/device.manager.h"
 #include "acquire.h"
@@ -88,7 +88,7 @@ reporter(int is_error,
 
 const static uint32_t frame_width = 1280;
 const static uint32_t frame_height = 720;
-const static uint32_t expected_frames_per_chunk = 30;
+const static uint32_t frames_per_chunk = 30;
 
 void
 acquire(AcquireRuntime* runtime, const char* filename)
@@ -120,16 +120,18 @@ acquire(AcquireRuntime* runtime, const char* filename)
                             sizeof(external_metadata),
                             sample_spacing_um);
 
-    storage_properties_set_chunking_props(
-      &props.video[0].storage.settings, frame_width, frame_height, 1, 64 << 20);
+    storage_properties_set_chunking_props(&props.video[0].storage.settings,
+                                          frame_width,
+                                          frame_height,
+                                          frames_per_chunk);
 
     props.video[0].camera.settings.binning = 1;
     props.video[0].camera.settings.pixel_type = SampleType_u16;
     props.video[0].camera.settings.shape = { .x = frame_width,
                                              .y = frame_height };
     // we may drop frames with lower exposure
     props.video[0].camera.settings.exposure_time_us = 2e5;
-    props.video[0].max_frame_count = expected_frames_per_chunk;
+    props.video[0].max_frame_count = frames_per_chunk;
 
     OK(acquire_configure(runtime, &props));
     OK(acquire_start(runtime));
@@ -164,13 +166,13 @@ main()
     ASSERT_STREQ("<u2", zarray["dtype"].get<std::string>());
 
     auto shape = zarray["shape"];
-    ASSERT_EQ(int, "%d", expected_frames_per_chunk, shape[0]);
+    ASSERT_EQ(int, "%d", frames_per_chunk, shape[0]);
     ASSERT_EQ(int, "%d", 1, shape[1]);
     ASSERT_EQ(int, "%d", frame_height, shape[2]);
     ASSERT_EQ(int, "%d", frame_width, shape[3]);
 
     auto chunks = zarray["chunks"];
-    ASSERT_EQ(int, "%d", expected_frames_per_chunk, chunks[0]);
+    ASSERT_EQ(int, "%d", frames_per_chunk, chunks[0]);
     ASSERT_EQ(int, "%d", 1, chunks[1]);
     ASSERT_EQ(int, "%d", frame_height, chunks[2]);
     ASSERT_EQ(int, "%d", frame_width, chunks[3]);

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
@@ -6,19 +6,22 @@ endif ()
 
 set(tgt acquire-driver-zarr)
 add_library(${tgt} MODULE
-        prelude.h
         common.hh
         common.cpp
         writers/writer.hh
         writers/writer.cpp
-        writers/chunk.writer.hh
-        writers/chunk.writer.cpp
+        writers/zarrv2.writer.hh
+        writers/zarrv2.writer.cpp
+        writers/zarrv3.writer.hh
+        writers/zarrv3.writer.cpp
         writers/blosc.compressor.hh
         writers/blosc.compressor.cpp
         zarr.hh
         zarr.cpp
         zarr.v2.hh
         zarr.v2.cpp
+        zarr.v3.hh
+        zarr.v3.cpp
         zarr.driver.c
 )
 target_enable_simd(${tgt})

diff --git a/src/README.md b/src/README.md
@@ -2,40 +2,44 @@
 
 ## Components
 
-### The `StorageInterface` class.
-
-Defines the interface that all Acquire `Storage` devices must implement, namely
-
-- `set`: Set the storage properties.
-- `get`: Get the storage properties.
-- `get_meta`: Get metadata for the storage properties.
-- `start`: Signal to the `Storage` device that it should start accepting frames.
-- `stop`: Signal to the `Storage` device that it should stop accepting frames.
-- `append`: Write incoming frames to the filesystem or other storage layer.
-- `reserve_image_shape`: Set the image shape for allocating chunk writers.
-
 ### The `Zarr` class
 
-An abstract class that implements the `StorageInterface`.
-Zarr is "[a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.](https://zarr.readthedocs.io/en/stable/index.html)"
+An abstract class that implements the `Storage` device interface.
+Zarr
+is "[a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.](https://zarr.readthedocs.io/en/stable/index.html)"
 
 ### The `ZarrV2` class
 
 Subclass of the `Zarr` class.
 Implements abstract methods for writer allocation and metadata.
-Specifically, `ZarrV2` allocates one writer of type `ChunkWriter` for each multiscale level-of-detail
+Specifically, `ZarrV2` allocates one writer of type `ZarrV2Writer` for each multiscale level-of-detail
 and writes metadata in the format specified by the [Zarr V2 spec](https://zarr.readthedocs.io/en/stable/spec/v2.html).
 
+### The `ZarrV3` class
+
+Subclass of the `Zarr` class.
+Implements abstract methods for writer allocation and metadata.
+Specifically, `ZarrV3` allocates one writer of type `ZarrV3Writer` for each multiscale level-of-detail
+and writes metadata in the format specified by
+the [Zarr V3 spec](https://zarr-specs.readthedocs.io/en/latest/specs.html).
+
 ### The `Writer` class
 
 An abstract class that writes frames to the filesystem or other storage layer.
 In general, frames are chunked and potentially compressed.
 The `Writer` handles chunking, chunk compression, and writing.
 
-### The `ChunkWriter` class
+### The `ZarrV2Writer` class
 
 Subclass of the `Writer` class.
 Implements abstract methods relating to writing and flushing chunk buffers.
+Chunk buffers, whether raw or compressed, are written to individual chunk files.
+
+### The `ZarrV3Writer` class
+
+Subclass of the `Writer` class.
+Implements abstract methods relating to writing, sharding, and flushing chunk buffers.
+Chunk buffers, whether raw or compressed, are concatenated into shards, which are written out to individual shard files.
 
 ### The `BloscCompressionParams` struct
 

diff --git a/src/acquire-core-libs b/src/acquire-core-libs
+11 −0		CHANGELOG.md
+4 −2		src/acquire-core-platform/linux/platform.c
+3 −1		src/acquire-core-platform/osx/platform.c
+2 −2		src/acquire-core-platform/win32/platform.c
+52 −21		src/acquire-device-properties/device/props/storage.c
+47 −30		src/acquire-device-properties/device/props/storage.h
+2 −0		tests/unit-tests.cpp