Zarrv3 part two (#137)
Implements Zarr v3 with the [sharding storage
transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html).
(Not the [sharding
codec](https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/v1.0.html),
which is WIP, but I'm keeping an eye on it.)

Supersedes #101. Supersedes #125.
Closes #76. Closes #111.

## Changes

### Added

- Support for [Zarr
v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html).
- Support for the [sharding storage
transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html)
in Zarr v3.

## Testing

Since napari doesn't yet support Zarr v3, we can use zarr-python with
matplotlib to do a sanity check. Write out a Zarr v3 dataset using
either `write-zarr-v3-raw` or `write-zarr-v3-compressed` in the `tests`
folder, then run the following Python script:

```python
import os
import numpy as np
import matplotlib.pyplot as plt

# these MUST come before importing zarr
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
os.environ["ZARR_V3_SHARDING"] = "1"

import zarr

def plot_array(input_zarr):
    store3 = zarr.DirectoryStoreV3(input_zarr)
    z3 = zarr.open(store=store3, mode="r")

    for (k3, a3) in z3.arrays():
        for i in range(a3.shape[0]):
            plt.imshow(a3[i, 0, :, :])
            plt.show()

if __name__ == "__main__":
    plot_array("C:/testing/acquire-driver-zarr-write-zarr-v3-compressed.zarr") # change this to point to the dataset you wrote out
```

For best results, switch the simulated camera to one that produces a
recognizable pattern, e.g., radial sin.
aliddell authored Nov 15, 2023
1 parent cf4044e commit 558efc9
Showing 38 changed files with 2,128 additions and 900 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html).
- Support for
the [sharding storage transformer](https://web.archive.org/web/20230213221154/https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html)
in Zarr v3.
- Ship debug libs for C-Blosc on Linux and Mac.

### Changed
@@ -34,7 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- `ChunkWriter`s need to specify which multiscale layer they write to.
- `ZarrV2Writer`s need to specify which multiscale layer they write to.
- The Zarr writer now validates that image and tile shapes are set and compatible with each other before the first
append.

98 changes: 72 additions & 26 deletions README.md
@@ -13,6 +13,9 @@ This is an Acquire Driver that supports chunked streaming to [zarr][].
- **Zarr**
- **ZarrBlosc1ZstdByteShuffle**
- **ZarrBlosc1Lz4ByteShuffle**
- **ZarrV3**
- **ZarrV3Blosc1ZstdByteShuffle**
- **ZarrV3Blosc1Lz4ByteShuffle**

## Using the Zarr storage device

@@ -24,67 +27,108 @@ Chunking is configured using `storage_properties_set_chunking_props()` when conf
Multiscale storage can be enabled or disabled by calling `storage_properties_set_enable_multiscale()` when configuring
the video stream.

To write [Zarr v3] datasets, use the `ZarrV3*` variant of each device.
**Note:** Zarr v3 is not [yet](https://github.com/ome/ngff/pull/206) supported
by [ome-zarr-py](https://github.com/ome/ome-zarr-py), so you
will not be able to read multiscale metadata from the resulting dataset.

Zarr v3 *is* supported by [zarr-python](https://github.com/zarr-developers/zarr-python), but you will need to set two
environment variables to work with it:

```bash
export ZARR_V3_EXPERIMENTAL_API=1
export ZARR_V3_SHARDING=1
```

You can also set these variables in your Python script:

```python
import os

# these MUST come before importing zarr
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
os.environ["ZARR_V3_SHARDING"] = "1"

import zarr
```
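
Because the flags only take effect when they are set before `zarr` is imported, a small guard can make an ordering mistake fail loudly instead of silently. The following is a minimal sketch, not part of this driver or of zarr-python: the helper name `enable_zarr_v3_flags` and its parameters are hypothetical, and it relies only on the ordering requirement stated above.

```python
import os
import sys


def enable_zarr_v3_flags(environ=None, modules=None):
    """Set both Zarr v3 flags, refusing if zarr was already imported.

    `environ` and `modules` default to os.environ and sys.modules; they are
    parameters only so the helper can be exercised without touching globals.
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    if "zarr" in modules:
        # Hypothetical guard: setting the flags now would be too late.
        raise RuntimeError("zarr is already imported; set the flags before importing it")
    environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
    environ["ZARR_V3_SHARDING"] = "1"
    return environ
```

Call `enable_zarr_v3_flags()` at the very top of a script, then `import zarr` afterwards.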

### Configuring chunking

You can configure chunking by calling `storage_properties_set_chunking_props()` on your `StorageProperties` object
_after_ calling `storage_properties_init()`.
There are 4 parameters you can set to determine the chunk size, namely `tile_width`, `tile_height`, `tile_planes`,
and `bytes_per_chunk`:
There are 3 parameters you can set to determine the chunk size, namely `chunk_width`, `chunk_height`,
and `chunk_planes`:

```c
int
storage_properties_set_chunking_props(struct StorageProperties* out,
uint32_t tile_width,
uint32_t tile_height,
uint32_t tile_planes,
uint64_t max_bytes_per_chunk)
uint32_t chunk_width,
uint32_t chunk_height,
uint32_t chunk_planes)
```
| ![frames](https://github.com/aliddell/acquire-driver-zarr/assets/844464/3510d468-4751-4fa0-b2bf-0e29a5f3ea1c) |
|:--:|
| A collection of frames. |
|:-------------------------------------------------------------------------------------------------------------:|
| A collection of frames. |
A _tile_ is a contiguous section, or region of interest, of a _frame_.
| ![tiles](https://github.com/aliddell/acquire-driver-zarr/assets/844464/f8d16139-e0ac-44db-855f-2f5ef305c98b) |
|:--:|
| A collection of frames, divided into tiles. |
|:------------------------------------------------------------------------------------------------------------:|
| A collection of frames, divided into tiles. |
A _chunk_ is nothing more than some number of stacked tiles from subsequent frames, with each tile in a chunk having
the same ROI in its respective frame.
| ![chunks](https://github.com/aliddell/acquire-driver-zarr/assets/844464/653e4d82-363e-4e04-9a42-927b052fb6e7) |
|:--:|
| A collection of frames, divided into tiles. A single chunk has been highlighted in red. |
| ![chunks](https://github.com/aliddell/acquire-driver-zarr/assets/844464/653e4d82-363e-4e04-9a42-927b052fb6e7) |
|:-------------------------------------------------------------------------------------------------------------:|
| A collection of frames, divided into tiles. A single chunk has been highlighted in red. |
You can specify the width and height, in pixels, of each tile, and if your frame size has more than one plane, you can
specify the number of planes you want per tile as well.
You can specify the width and height, in pixels, of each tile.
If any of these values are unset (equivalently, set to 0), or if they are set to a value larger than the frame size,
the full value of the frame size along that dimension will be used instead.
You should take care that the values you select won't result in tile sizes that are too small or too large for your
application.
The `max_bytes_per_chunk` parameter can be used to cap the size of a chunk.
A minimum of 16 MiB is enforced, but no maximum, so if you are compressing you must ensure that you have sufficient
memory for all your chunks to be stored in memory at once.
You can also set the number of tile *planes* to concatenate into a chunk.
If this value is unset (or set to 0), it will default to a prescribed minimum value of 32.
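
The fallback rules for the three chunk parameters can be summarized in a short Python sketch. This models only the wording of the text above; the driver's actual validation is in C++ and is not shown here, and `resolve_chunk_dims` and `DEFAULT_CHUNK_PLANES` are hypothetical names. The text does not say whether a nonzero plane count below 32 is raised to 32, so the sketch leaves such values untouched.

```python
DEFAULT_CHUNK_PLANES = 32  # the default quoted in the text above


def resolve_chunk_dims(frame_width, frame_height,
                       chunk_width=0, chunk_height=0, chunk_planes=0):
    """Apply the documented fallbacks to the requested chunk dimensions."""
    def clamp(requested, full):
        # 0 means "unset"; a value larger than the frame is also replaced
        # by the full frame size along that dimension.
        return full if requested == 0 or requested > full else requested

    width = clamp(chunk_width, frame_width)
    height = clamp(chunk_height, frame_height)
    # Assumption: only an unset (0) plane count falls back to the default.
    planes = chunk_planes if chunk_planes != 0 else DEFAULT_CHUNK_PLANES
    return width, height, planes
```

For a 1920 x 1080 frame, `resolve_chunk_dims(1920, 1080)` falls back to the full frame and 32 planes, while `resolve_chunk_dims(1920, 1080, 640, 360, 145)` keeps the requested values.
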
#### Example
Suppose your frame size is 1920 x 1080 x 1, with a pixel type of unsigned 8-bit integer.
You can use a tile size of 640 x 360 x 1, which will divide your frame evenly into 9 tiles.
You want chunk sizes of at most 64 MiB.
Suppose your frame size is 1920 x 1080, with a pixel type of unsigned 8-bit integer.
You can use a tile size of 640 x 360, which will divide your frame evenly into 9 tiles.
You want chunk sizes of at most 32 MiB and this works out to 32 * 2^20 / (640 * 360) = 145.63555555555556, so you select
145 chunk planes.
You would configure your storage properties as follows:
```c
storage_properties_set_chunking_props(&storage_props,
640,
360,
1,
64 * 1024 * 1024);
145);
```
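
The arithmetic behind the plane count in this example can be checked with a few lines of Python. This is a sketch only: `planes_per_chunk` is a hypothetical helper, not part of the driver API, and it assumes 1 byte per pixel for the unsigned 8-bit pixel type.

```python
def planes_per_chunk(tile_width, tile_height, bytes_per_pixel, max_chunk_bytes):
    """Largest whole number of tile planes that fits in max_chunk_bytes."""
    bytes_per_plane = tile_width * tile_height * bytes_per_pixel
    return max_chunk_bytes // bytes_per_plane


# 1920 x 1080 frame split into 640 x 360 tiles
tiles_per_frame = (1920 // 640) * (1080 // 360)

# 32 MiB budget per chunk, u8 pixels
planes = planes_per_chunk(640, 360, 1, 32 * 2**20)

# raw (uncompressed) size of one chunk, in MiB
raw_chunk_mib = planes * 640 * 360 / 2**20

print(tiles_per_frame, planes, round(raw_chunk_mib, 2))  # prints: 9 145 31.86
```

With a 64 MiB budget the same helper returns 291 planes, matching the figure used in the earlier version of this example.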

Note that 64 * 1024 * 1024 / (640 * 360) = 291.2711111111111, so each chunk will contain 291 tiles, or about 63.94 MiB
raw, before compression.
### Configuring sharding

Configuring sharding is similar to configuring chunking.
You can configure sharding by calling `storage_properties_set_sharding_props()` on your `StorageProperties` object
_after_ calling `storage_properties_init()`.
There are 3 parameters you can set to determine the shard size, namely `shard_width`, `shard_height`,
and `shard_planes`.
**Note:** whereas the unit for the width, height, and plane values when chunking is *pixels*, when sharding, the unit is
*chunks*.
So in the previous example, if you wanted to combine all your chunks together into a single shard, you would set your shard
properties like so:

```c
storage_properties_set_sharding_props(&storage_props,
3, // width: 1920 / 640
3, // height: 1080 / 360
1);
```
This would result in all 9 chunks being combined into a single shard.
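
Since shard dimensions are counted in chunks rather than pixels, the values 3, 3, and 1 follow directly from dividing the frame size by the chunk size. A minimal Python sketch of that calculation, using a hypothetical helper `shard_dims_in_chunks` that is not part of the driver API and that only handles chunks dividing the frame evenly:

```python
def shard_dims_in_chunks(frame_width, frame_height,
                         chunk_width, chunk_height, shard_planes=1):
    """Shard size, in chunks, needed to cover one full frame extent."""
    if frame_width % chunk_width or frame_height % chunk_height:
        raise ValueError("this sketch only handles chunks that tile the frame evenly")
    return frame_width // chunk_width, frame_height // chunk_height, shard_planes


width, height, planes = shard_dims_in_chunks(1920, 1080, 640, 360)
chunks_per_shard = width * height * planes
print(width, height, planes, chunks_per_shard)  # prints: 3 3 1 9
```
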
### Compression
@@ -120,3 +164,5 @@ Then the sequence of levels will have dimensions 1920 x 1080, 960 x 540, 480 x 2
[Blosc]: https://github.com/Blosc/c-blosc

[Blosc docs]: https://www.blosc.org/

[Zarr v3]: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html
20 changes: 11 additions & 9 deletions examples/no-striping.cpp
@@ -1,8 +1,8 @@
/// @file
/// @brief Generate a Zarr dataset with a single chunk using the simulated
/// radial sine pattern with a u16 sample type. This example was used to generate
/// data for a visual EXAMPLE of a fix for a striping artifact observed when
/// writing to a Zarr dataset with multibyte samples.
/// radial sine pattern with a u16 sample type. This example was used to
/// generate data for a visual EXAMPLE of a fix for a striping artifact observed
/// when writing to a Zarr dataset with multibyte samples.

#include "device/hal/device.manager.h"
#include "acquire.h"
@@ -88,7 +88,7 @@ reporter(int is_error,

const static uint32_t frame_width = 1280;
const static uint32_t frame_height = 720;
const static uint32_t expected_frames_per_chunk = 30;
const static uint32_t frames_per_chunk = 30;

void
acquire(AcquireRuntime* runtime, const char* filename)
Expand Down Expand Up @@ -120,16 +120,18 @@ acquire(AcquireRuntime* runtime, const char* filename)
sizeof(external_metadata),
sample_spacing_um);

storage_properties_set_chunking_props(
&props.video[0].storage.settings, frame_width, frame_height, 1, 64 << 20);
storage_properties_set_chunking_props(&props.video[0].storage.settings,
frame_width,
frame_height,
frames_per_chunk);

props.video[0].camera.settings.binning = 1;
props.video[0].camera.settings.pixel_type = SampleType_u16;
props.video[0].camera.settings.shape = { .x = frame_width,
.y = frame_height };
// we may drop frames with lower exposure
props.video[0].camera.settings.exposure_time_us = 2e5;
props.video[0].max_frame_count = expected_frames_per_chunk;
props.video[0].max_frame_count = frames_per_chunk;

OK(acquire_configure(runtime, &props));
OK(acquire_start(runtime));
@@ -164,13 +166,13 @@ main()
ASSERT_STREQ("<u2", zarray["dtype"].get<std::string>());

auto shape = zarray["shape"];
ASSERT_EQ(int, "%d", expected_frames_per_chunk, shape[0]);
ASSERT_EQ(int, "%d", frames_per_chunk, shape[0]);
ASSERT_EQ(int, "%d", 1, shape[1]);
ASSERT_EQ(int, "%d", frame_height, shape[2]);
ASSERT_EQ(int, "%d", frame_width, shape[3]);

auto chunks = zarray["chunks"];
ASSERT_EQ(int, "%d", expected_frames_per_chunk, chunks[0]);
ASSERT_EQ(int, "%d", frames_per_chunk, chunks[0]);
ASSERT_EQ(int, "%d", 1, chunks[1]);
ASSERT_EQ(int, "%d", frame_height, chunks[2]);
ASSERT_EQ(int, "%d", frame_width, chunks[3]);
9 changes: 6 additions & 3 deletions src/CMakeLists.txt
@@ -6,19 +6,22 @@ endif ()

set(tgt acquire-driver-zarr)
add_library(${tgt} MODULE
prelude.h
common.hh
common.cpp
writers/writer.hh
writers/writer.cpp
writers/chunk.writer.hh
writers/chunk.writer.cpp
writers/zarrv2.writer.hh
writers/zarrv2.writer.cpp
writers/zarrv3.writer.hh
writers/zarrv3.writer.cpp
writers/blosc.compressor.hh
writers/blosc.compressor.cpp
zarr.hh
zarr.cpp
zarr.v2.hh
zarr.v2.cpp
zarr.v3.hh
zarr.v3.cpp
zarr.driver.c
)
target_enable_simd(${tgt})
36 changes: 20 additions & 16 deletions src/README.md
@@ -2,40 +2,44 @@

## Components

### The `StorageInterface` class.

Defines the interface that all Acquire `Storage` devices must implement, namely

- `set`: Set the storage properties.
- `get`: Get the storage properties.
- `get_meta`: Get metadata for the storage properties.
- `start`: Signal to the `Storage` device that it should start accepting frames.
- `stop`: Signal to the `Storage` device that it should stop accepting frames.
- `append`: Write incoming frames to the filesystem or other storage layer.
- `reserve_image_shape`: Set the image shape for allocating chunk writers.

### The `Zarr` class

An abstract class that implements the `StorageInterface`.
Zarr is "[a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.](https://zarr.readthedocs.io/en/stable/index.html)"
An abstract class that implements the `Storage` device interface.
Zarr
is "[a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.](https://zarr.readthedocs.io/en/stable/index.html)"

### The `ZarrV2` class

Subclass of the `Zarr` class.
Implements abstract methods for writer allocation and metadata.
Specifically, `ZarrV2` allocates one writer of type `ChunkWriter` for each multiscale level-of-detail
Specifically, `ZarrV2` allocates one writer of type `ZarrV2Writer` for each multiscale level-of-detail
and writes metadata in the format specified by the [Zarr V2 spec](https://zarr.readthedocs.io/en/stable/spec/v2.html).

### The `ZarrV3` class

Subclass of the `Zarr` class.
Implements abstract methods for writer allocation and metadata.
Specifically, `ZarrV3` allocates one writer of type `ZarrV3Writer` for each multiscale level-of-detail
and writes metadata in the format specified by
the [Zarr V3 spec](https://zarr-specs.readthedocs.io/en/latest/specs.html).

### The `Writer` class

An abstract class that writes frames to the filesystem or other storage layer.
In general, frames are chunked and potentially compressed.
The `Writer` handles chunking, chunk compression, and writing.

### The `ChunkWriter` class
### The `ZarrV2Writer` class

Subclass of the `Writer` class.
Implements abstract methods relating to writing and flushing chunk buffers.
Chunk buffers, whether raw or compressed, are written to individual chunk files.

### The `ZarrV3Writer` class

Subclass of the `Writer` class.
Implements abstract methods relating to writing, sharding, and flushing chunk buffers.
Chunk buffers, whether raw or compressed, are concatenated into shards, which are written out to individual shard files.
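
To make the contrast with one-file-per-chunk writing concrete, here is a minimal Python sketch of the concatenation idea. It is not the `ZarrV3Writer` implementation, which is C++ and not shown in this excerpt; `build_shard` and `read_chunk` are hypothetical names. Because compressed chunks vary in size, a reader needs to know where each chunk starts, so the sketch keeps an in-memory list of offsets and lengths. The on-disk encoding of that index is defined by the sharding spec and is deliberately left out.

```python
def build_shard(chunk_buffers):
    """Concatenate chunk buffers and record where each one landed.

    Returns (payload, index), where index[i] is the (offset, nbytes)
    of chunk i inside payload.
    """
    payload = bytearray()
    index = []
    for buf in chunk_buffers:
        index.append((len(payload), len(buf)))
        payload.extend(buf)
    return bytes(payload), index


def read_chunk(payload, index, i):
    """Recover chunk i from a shard produced by build_shard."""
    offset, nbytes = index[i]
    return payload[offset:offset + nbytes]
```

For example, `build_shard([b"aaaa", b"bb"])` yields one 6-byte payload plus the index `[(0, 4), (4, 2)]`, and `read_chunk` slices each chunk back out.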

### The `BloscCompressionParams` struct
