toward storage of arbitrary data types #93

Open
bogovicj opened this issue Feb 24, 2023 · 2 comments

Comments

@bogovicj
Contributor

@minnerbe helped with this issue.

N5 core does not support non-native and non-numeric types well (i.e., all DataBlock implementations are native and numeric). For example, the API cannot currently implement or wrap HDF5's string I/O (see saalfeldlab/n5-hdf5#22).

related

For string support, @minnerbe has an implementation on the feature/vlstring-io branch: https://github.com/minnerbe/n5/tree/feature/vlstring-io

zarr

Zarr supports writing object arrays via codecs from numcodecs. Existing options include JSON, MsgPack, and Pickle.
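Since JSON is one of the object codecs Zarr can use, a minimal, dependency-free Java sketch of a JSON codec for a block of strings may help gauge the effort involved. The class and method names are hypothetical, it only handles a flat array of strings, and it does not attempt to reproduce the exact layout that numcodecs' JSON codec writes; a real implementation would more likely delegate to an existing JSON library.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: encodes String[] as a JSON array of strings and
 * decodes only that shape. Not byte-compatible with numcodecs' JSON codec.
 */
final class JsonStringArrayCodec {

    static String encode(String[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"');
            for (char c : values[i].toCharArray()) {
                if (c == '"' || c == '\\') {
                    sb.append('\\').append(c);                      // escape quote and backslash
                } else if (c < 0x20) {
                    sb.append(String.format("\\u%04x", (int) c));   // escape control characters
                } else {
                    sb.append(c);
                }
            }
            sb.append('"');
        }
        return sb.append(']').toString();
    }

    /** Decodes a JSON array of strings; whitespace between tokens is skipped. */
    static String[] decode(String json) {
        List<String> out = new ArrayList<>();
        int i = skipWs(json, 0);
        expect(json, i++, '[');
        i = skipWs(json, i);
        if (json.charAt(i) == ']') return new String[0];
        while (true) {
            expect(json, i++, '"');
            StringBuilder sb = new StringBuilder();
            char c;
            while ((c = json.charAt(i++)) != '"') {
                if (c != '\\') { sb.append(c); continue; }
                char e = json.charAt(i++);
                switch (e) {
                    case 'n': sb.append('\n'); break;
                    case 't': sb.append('\t'); break;
                    case 'r': sb.append('\r'); break;
                    case 'b': sb.append('\b'); break;
                    case 'f': sb.append('\f'); break;
                    case 'u':                                       // four hex digits follow
                        sb.append((char) Integer.parseInt(json.substring(i, i + 4), 16));
                        i += 4;
                        break;
                    default: sb.append(e);                          // quote, backslash, slash
                }
            }
            out.add(sb.toString());
            i = skipWs(json, i);
            if (json.charAt(i) == ']') break;
            expect(json, i++, ',');
            i = skipWs(json, i);
        }
        return out.toArray(new String[0]);
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    private static void expect(String s, int i, char c) {
        if (s.charAt(i) != c)
            throw new IllegalArgumentException("expected '" + c + "' at " + i);
    }
}
```

For example, encoding `{"n5", "zarr"}` yields the text `["n5","zarr"]`, and `decode` turns that back into the same array.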

proposal

  1. Include a generic Object DataBlock in n5
    • consider a special-case String DataBlock?
  2. Add a codec interface
    • could be similar to N5's existing Compression interface
    • similar in idea to numcodecs
    • For Zarr interop, we should add JSON and MsgPack encoders
      • MsgPack may be a good default for Zarr interop (see msgpack-java)
      • JSON is easy to implement
    • protobuf is another option to consider
    • Java's object serialization would be easy to add
      • but we probably shouldn't use it
    • Pickle: let's ignore it
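To make points 1 and 2 concrete, here is a minimal sketch of how a generic object block could delegate its serialization to a pluggable codec. Everything here is hypothetical: `ObjectCodec`, `ObjectDataBlock`, `ModifiedUtf8StringCodec`, and the `"string"` type tag are illustrative names, not existing N5 API, and the block does not implement N5's actual `DataBlock` interface. The example codec uses the JDK's `DataOutputStream.writeUTF` only because it needs no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical codec interface: converts an element array to bytes and back. */
interface ObjectCodec<T> {
    /** Tag that could be stored in dataset metadata to select this codec. */
    String getType();

    byte[] encode(T[] values);

    T[] decode(byte[] bytes, int numElements);
}

/** Hypothetical generic block: block size, grid position, and element array. */
final class ObjectDataBlock<T> {
    final int[] size;
    final long[] gridPosition;
    final T[] data;

    ObjectDataBlock(int[] size, long[] gridPosition, T[] data) {
        this.size = size;
        this.gridPosition = gridPosition;
        this.data = data;
    }

    static int numElements(int[] size) {
        int n = 1;
        for (int s : size) n *= s;
        return n;
    }

    /** The block knows nothing about the byte format; the codec decides. */
    byte[] toBytes(ObjectCodec<T> codec) {
        return codec.encode(data);
    }

    static <T> ObjectDataBlock<T> fromBytes(
            int[] size, long[] gridPosition, byte[] bytes, ObjectCodec<T> codec) {
        return new ObjectDataBlock<>(size, gridPosition, codec.decode(bytes, numElements(size)));
    }
}

/**
 * Example string codec: writeUTF emits a 2-byte length prefix followed by
 * "modified UTF-8", so each string is limited to 65535 encoded bytes.
 */
final class ModifiedUtf8StringCodec implements ObjectCodec<String> {
    @Override
    public String getType() {
        return "string"; // hypothetical metadata tag
    }

    @Override
    public byte[] encode(String[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String v : values) out.writeUTF(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public String[] decode(byte[] bytes, int numElements) {
        String[] values = new String[numElements];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < numElements; i++) values[i] = in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

A JSON, MsgPack, or protobuf codec would implement the same interface, so the block class would not need to know which format is in use.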
@minnerbe
Contributor

minnerbe commented Feb 24, 2023

Thanks for bringing this up! I would add to point 1 of the proposal that this seems quite similar to the Object data type that is already implemented. However, Objects

  • rely on the native serialization of Java and thus cannot use a custom serialization format;
  • have a fixed data-type metadata tag ("object"), which may not comply with external standards (e.g., "String(-1)" for variable-length string arrays in HDF5).

It's probably worth considering merging these two concepts and providing a codec that simply uses native Java serialization.
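A minimal sketch of such a codec, using only `ObjectOutputStream` and `ObjectInputStream` from the JDK. The class name and method signatures are hypothetical and not part of N5; the sketch serializes the whole element array as a single object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/**
 * Hypothetical codec backed by native Java serialization.
 * Caution: deserializing untrusted bytes with ObjectInputStream is a known
 * security risk, one reason to avoid this as a default format.
 */
final class JavaSerializationCodec {

    static <T extends Serializable> byte[] encode(T[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values); // an array of Serializable elements is itself serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T[] decode(byte[] bytes, Class<T[]> arrayType) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return arrayType.cast(in.readObject()); // fails fast if the stored type differs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the concepts were merged this way, Java serialization would become one codec among several, and dataset metadata could name the codec in use rather than relying on a fixed "object" tag.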

@bogovicj
Contributor Author

see also #87
