add to_flat, from_flat, like, and better handling for existing arrays / groups #25

Merged
merged 6 commits into from
Feb 28, 2024
24 changes: 24 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,24 @@
name: Linux Testing

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        shell: "bash -l {0}"
        run: |
          pip install poetry
          poetry install
      - name: Test
        run: |
          poetry run pytest
1 change: 1 addition & 0 deletions docs/api/v2.md
@@ -0,0 +1 @@
::: pydantic_zarr.v2
1 change: 1 addition & 0 deletions docs/api/v3.md
@@ -0,0 +1 @@
::: pydantic_zarr.v3
188 changes: 186 additions & 2 deletions docs/usage_zarr_v2.md
@@ -100,6 +100,190 @@ print(ArraySpec.from_array(np.arange(10)).model_dump())
}
"""
```
### Flattening and unflattening Zarr hierarchies

In the previous section we built a model of a Zarr hierarchy by defining `GroupSpec` and `ArraySpec`
instances, then providing those objects as `members` to the constructor of another `GroupSpec`. In
other words, with this approach we create "child nodes" and give those nodes to the "parent node",
recursively.

Constructing deeply nested hierarchies this way can be tedious.
For this reason, `pydantic-zarr` supports an alternative representation of the Zarr
hierarchy in the form of a dictionary with `str` keys and `ArraySpec` / `GroupSpec` values, and
methods to convert to / from these dictionaries.

#### Making a `GroupSpec` object from a flat hierarchy

This example demonstrates how to create a `GroupSpec` from a `dict` representation of a Zarr hierarchy.

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec

# other than the key representing the root path "",
# the keys must be valid paths in the Zarr storage hierarchy
# note that the `members` attribute is `None` for the `GroupSpec` instances in this `dict`.
tree = {
    "": GroupSpec(members=None, attributes={"root": True}),
    "/a": GroupSpec(members=None, attributes={"root": False}),
    "/a/b": ArraySpec(shape=(10, 10), dtype="uint8", chunks=(1, 1)),
}

print(GroupSpec.from_flat(tree).model_dump())
"""
{
    'zarr_version': 2,
    'attributes': {'root': True},
    'members': {
        'a': {
            'zarr_version': 2,
            'attributes': {'root': False},
            'members': {
                'b': {
                    'zarr_version': 2,
                    'attributes': {},
                    'shape': (10, 10),
                    'chunks': (1, 1),
                    'dtype': '|u1',
                    'fill_value': 0,
                    'order': 'C',
                    'filters': None,
                    'dimension_separator': '/',
                    'compressor': None,
                }
            },
        }
    },
}
"""
```

#### Flattening `GroupSpec` objects

This is similar to the example above, except that we are working in reverse -- we are making the
flat `dict` from the `GroupSpec` object.

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec

# build a nested hierarchy from the leaves up, then flatten it
# note that the `members` attribute is `None` for the `GroupSpec` instances in the resulting `dict`.
a_b = ArraySpec(shape=(10, 10), dtype="uint8", chunks=(1, 1))
a = GroupSpec(members={'b': a_b}, attributes={"root": False})
root = GroupSpec(members={'a': a}, attributes={"root": True})

print(root.to_flat())
"""
{
    '': GroupSpec(zarr_version=2, attributes={'root': True}, members=None),
    '/a': GroupSpec(zarr_version=2, attributes={'root': False}, members=None),
    '/a/b': ArraySpec(
        zarr_version=2,
        attributes={},
        shape=(10, 10),
        chunks=(1, 1),
        dtype='|u1',
        fill_value=0,
        order='C',
        filters=None,
        dimension_separator='/',
        compressor=None,
    ),
}
"""
```
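
To make the mapping between the nested and flat forms concrete, here is a minimal sketch of the flattening idea using only plain `dict` objects. It is not how `pydantic-zarr` implements `to_flat`; the `flatten` helper and the dict layout (groups carry a `members` key, arrays do not) are assumptions made for illustration.

```python
# Illustrative only: plain dicts stand in for GroupSpec / ArraySpec, and
# `flatten` is a hypothetical helper, not part of pydantic-zarr.
from typing import Any


def flatten(node: dict[str, Any], path: str = "") -> dict[str, Any]:
    """Map every node of a nested hierarchy to its absolute path."""
    members = node.get("members")
    if members is None:
        # An array-like node (or a group with no recorded members): store as-is.
        return {path: node}
    # Store the group itself without its children, mirroring `members=None`.
    flat = {path: {**node, "members": None}}
    for name, child in members.items():
        flat.update(flatten(child, f"{path}/{name}"))
    return flat


nested = {
    "attributes": {"root": True},
    "members": {
        "a": {
            "attributes": {"root": False},
            "members": {"b": {"shape": (10, 10), "dtype": "|u1"}},
        }
    },
}

flat = flatten(nested)
print(sorted(flat))
#> ['', '/a', '/a/b']
```

The root is keyed by the empty string and each child path is the parent path plus `/` and the member name, which matches the key layout shown in the `to_flat` output above.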

#### Implicit groups
`zarr-python` supports creating Zarr arrays or groups deep in the
hierarchy without explicitly creating the intermediate groups first.
`from_flat` models this behavior. For example, `{'/a/b/c': ArraySpec(...)}` implicitly defines two groups: `a`, and `b` (which is contained in `a`). `from_flat` will create the expected `GroupSpec` object from such `dict` instances.

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec

tree = {'/a/b/c': ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))}
print(GroupSpec.from_flat(tree).model_dump())
"""
{
    'zarr_version': 2,
    'attributes': {},
    'members': {
        'a': {
            'zarr_version': 2,
            'attributes': {},
            'members': {
                'b': {
                    'zarr_version': 2,
                    'attributes': {},
                    'members': {
                        'c': {
                            'zarr_version': 2,
                            'attributes': {},
                            'shape': (1,),
                            'chunks': (1,),
                            'dtype': '|u1',
                            'fill_value': 0,
                            'order': 'C',
                            'filters': None,
                            'dimension_separator': '/',
                            'compressor': None,
                        }
                    },
                }
            },
        }
    },
}
"""
```
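
The implicit-group behavior can be sketched with plain `dict` objects as well. The `unflatten` helper below is hypothetical and is not the library's `from_flat` implementation; it only shows one way that missing intermediate groups could be created while walking each path, assuming that group nodes carry a `members` key and array nodes do not.

```python
# Illustrative only: plain dicts stand in for GroupSpec / ArraySpec, and
# `unflatten` is a hypothetical helper, not part of pydantic-zarr.
from typing import Any


def unflatten(flat: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Rebuild a nested hierarchy, creating missing intermediate groups."""
    # Start from the explicit root if one is given, else from an empty group.
    root = {**flat.get("", {"attributes": {}}), "members": {}}
    # Visit shallow paths first so explicit groups exist before their children.
    for path in sorted(flat, key=lambda p: p.count("/")):
        if path == "":
            continue
        node = flat[path]
        *parents, name = path.lstrip("/").split("/")
        cursor = root
        for parent in parents:
            # Create an implicit, attribute-free group the first time a path
            # passes through a name that was never declared explicitly.
            cursor = cursor["members"].setdefault(
                parent, {"attributes": {}, "members": {}}
            )
        if "members" in node:
            # Explicit group: give it an empty member table for deeper paths.
            cursor["members"][name] = {**node, "members": {}}
        else:
            cursor["members"][name] = node
    return root


tree = unflatten({"/a/b/c": {"shape": (1,), "dtype": "|u1"}})
print(list(tree["members"]))
#> ['a']
print(list(tree["members"]["a"]["members"]))
#> ['b']
```

Sorting by path depth is a design choice of this sketch: it guarantees that an explicitly declared group is placed before any of its descendants, so it is never overwritten by an implicit placeholder.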

## Comparing `GroupSpec` and `ArraySpec` models

`GroupSpec` and `ArraySpec` both have `like` methods that take another `GroupSpec` or `ArraySpec` as an argument and return `True` (the models are like each other) or `False` (the models are not like each other).

The `like` method works by converting both input models to `dict` via `pydantic.BaseModel.model_dump`, and comparing the resulting `dict` representations. This means that instances of two different subclasses of `GroupSpec`, which would not be considered equal according to the `==` operator, will be considered `like` if and only if they serialize to identical `dict` instances.

The `like` method also takes the keyword arguments `include` and `exclude`, which explicitly include attributes in, or exclude them from, the model comparison. For example, you can check whether two `ArraySpec` instances have the same `shape` and `dtype` by calling `array_a.like(array_b, include={'shape', 'dtype'})`. This is useful if you don't care about the compressor or filters and just want to ensure that you can safely write an in-memory array to a Zarr array.

```python
from pydantic_zarr.v2 import ArraySpec, GroupSpec
import zarr
arr_a = ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))
arr_b = ArraySpec(shape=(2,), dtype='uint8', chunks=(1,)) # array with different shape

print(arr_a.like(arr_b)) # False, because of mismatched shape
#> False

print(arr_a.like(arr_b, exclude={'shape'})) # True, because we exclude shape.
#> True

# `ArraySpec.like` will convert a zarr.Array to ArraySpec
store = zarr.MemoryStore()
arr_a_stored = arr_a.to_zarr(store, path='arr_a') # this is a zarr.Array

print(arr_a.like(arr_a_stored)) # arr_a is like the zarr.Array version of itself
#> True

print(arr_b.like(arr_a_stored)) # False, because of mismatched shape
#> False

print(arr_b.like(arr_a_stored, exclude={'shape'})) # True, because we exclude shape.
#> True

# the same thing for groups
g_a = GroupSpec(attributes={'foo': 10}, members={'a': arr_a, 'b': arr_b})
g_b = GroupSpec(attributes={'foo': 11}, members={'a': arr_a, 'b': arr_b})

print(g_a.like(g_a)) # g_a is like itself
#> True

print(g_a.like(g_b)) # False, because of mismatched attributes
#> False

print(g_a.like(g_b, exclude={'attributes'})) # True, because we ignore attributes
#> True

print(g_a.like(g_a.to_zarr(store, path='g_a'))) # g_a is like its zarr.Group counterpart
#> True
```
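
The comparison described above (dump both objects to `dict`, optionally filter the keys, then compare) can be sketched without `pydantic`. The `ToyArray` and `TaggedArray` classes and the free function `like` below are hypothetical stand-ins built on standard-library dataclasses. They are meant to illustrate the idea, not to reproduce the library's implementation.

```python
# Illustrative only: dataclasses stand in for pydantic models, and `like`
# is a hypothetical free function, not the pydantic-zarr method.
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ToyArray:
    shape: tuple[int, ...]
    dtype: str
    chunks: tuple[int, ...]
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class TaggedArray(ToyArray):
    """A subclass with identical fields."""


def like(
    a: Any,
    b: Any,
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> bool:
    """Compare two objects by their (optionally filtered) dict dumps."""

    def dump(obj: Any) -> dict[str, Any]:
        data = asdict(obj)
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data

    return dump(a) == dump(b)


a = ToyArray(shape=(10,), dtype="uint8", chunks=(1,))
b = TaggedArray(shape=(10,), dtype="uint8", chunks=(1,))
c = ToyArray(shape=(20,), dtype="uint8", chunks=(1,))

print(a == b)  # different classes, so `==` is False
#> False
print(like(a, b))  # identical dict dumps
#> True
print(like(a, c))  # mismatched shape
#> False
print(like(a, c, exclude={"shape"}))
#> True
print(like(a, c, include={"dtype", "chunks"}))
#> True
```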

## Using generic types

@@ -132,7 +316,7 @@ except ValidationError as exc:
1 validation error for GroupSpec[GroupAttrs, ~TItem]
attributes.b
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='foo', input_type=str]
- For further information visit https://errors.pydantic.dev/2.4/v/int_parsing
+ For further information visit https://errors.pydantic.dev/2.6/v/int_parsing
"""

# this passes validation
@@ -151,7 +335,7 @@ except ValidationError as exc:
1 validation error for GroupSpec[~TAttr, ArraySpec]
members.foo
Input should be a valid dictionary or instance of ArraySpec [type=model_type, input_value=GroupSpec(zarr_version=2,...tributes={}, members={}), input_type=GroupSpec]
- For further information visit https://errors.pydantic.dev/2.4/v/model_type
+ For further information visit https://errors.pydantic.dev/2.6/v/model_type
"""

# this passes validation
13 changes: 11 additions & 2 deletions mkdocs.yaml
@@ -30,14 +30,23 @@ nav:
- About: index.md
- Usage (Zarr V3): usage_zarr_v3.md
- Usage (Zarr V2): usage_zarr_v2.md
- API: api/core.md
- API:
- core: api/core.md
- v2: api/v2.md
- v3: api/v3.md

plugins:
- mkdocstrings:
handlers:
python:
options:
show_signature_annotations: true
docstring_style: numpy
members_order: source
separate_signature: true
filters: ["!^_"]
docstring_options:
ignore_init_summary: true
merge_init_into_class: true

markdown_extensions:
- pymdownx.highlight: