Separating array metadata from attribute metadata #24

Open
jakirkham opened this issue Jan 26, 2018 · 8 comments

Comments

@jakirkham
Contributor

As a suggestion, it might be nice to separate array metadata (e.g. dimensions, blockSize, etc.) from attribute metadata. IOW having a separate JSON file for each one. Admittedly this would be a breaking change as it is currently proposed. Unsure if there is a smoother way to handle this.

@axtimwalde
Collaborator

That would make it more complex, and I would like to keep it as simple as possible at this time. It's the extreme on the simple side of many possible more complex solutions (we could imagine structured, typed attributes, compressed, text, binary, ...). Are you worried about not being able to use these keys for attributes?

@hanslovsky
Contributor

I am in favor of refactoring the attributes structure (in the long run). As a minimal solution, the footprint of the n5-specific metadata within the attributes.json file should be as small as possible, e.g. storing all n5-specific attributes under a separate key ("n5").
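The single-key idea above can be sketched as follows. This is only an illustration: the nested `"n5"` layout is the hypothetical proposal from this comment, not part of the current N5 spec, and `read_n5_metadata` is an invented helper name. The sketch shows that a reader could accept both the current flat layout and the proposed nested one.

```python
import json

# Keys the N5 spec uses for dataset metadata in attributes.json.
N5_DATASET_KEYS = ("dimensions", "blockSize", "dataType", "compression")


def read_n5_metadata(attributes: dict) -> dict:
    """Return the spec metadata from either layout (hypothetical helper)."""
    nested = attributes.get("n5")
    # Proposed layout: all spec keys live under one "n5" object.
    if isinstance(nested, dict):
        return dict(nested)
    # Current layout: spec keys sit next to user attributes at the top level.
    return {k: attributes[k] for k in N5_DATASET_KEYS if k in attributes}


flat = json.loads(
    '{"dimensions": [100, 200], "blockSize": [10, 20], "name": "raw"}'
)
nested = json.loads(
    '{"n5": {"dimensions": [100, 200], "blockSize": [10, 20]}, "name": "raw"}'
)

flat_meta = read_n5_metadata(flat)
nested_meta = read_n5_metadata(nested)
```

With the nested layout, only the single key `"n5"` is off-limits to user attributes; with the flat layout, every spec key is.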

@axtimwalde
Collaborator

I am not convinced. Meta-formats on top of N5 would then have the n5-internal metadata separated from their custom metadata, which is weird. I can imagine separating things into an open number of 'name-spaces', each with their own JSON attributes, but I can also do the same thing in a single JSON block, so I am not convinced by that either. It would be a one-level structure on top of structured elements; I cannot see what this would be good for. Example:

{
  n5 : {
    dimensions : [...],
    blockSize : [...],
    ... },
  custom : {
    name : "...",
    ... }
}

vs.

n5.json:

{
  dimensions : [...],
  blockSize : [...],
  ... }

custom.json:

{
  name : "...",
  ... }

Where is the advantage other than that we need another discovery mechanism for how many 'name-spaces' there are?
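The discovery question can be made concrete with a small sketch. Everything here is assumed for illustration: the file names `n5.json` and `custom.json` come from the example above, the helper names are invented, and treating every object-valued top-level key as a 'name-space' is just one possible convention.

```python
import json
import tempfile
from pathlib import Path


def namespaces_single_file(dataset_dir: Path) -> list[str]:
    """One read: name-spaces are the object-valued top-level keys."""
    attrs = json.loads((dataset_dir / "attributes.json").read_text())
    return sorted(k for k, v in attrs.items() if isinstance(v, dict))


def namespaces_multi_file(dataset_dir: Path) -> list[str]:
    """Directory listing: every *.json file is treated as a name-space."""
    return sorted(p.stem for p in dataset_dir.glob("*.json"))


n5_meta = {"dimensions": [100, 200], "blockSize": [10, 20]}
custom_meta = {"name": "raw"}

with tempfile.TemporaryDirectory() as tmp:
    single = Path(tmp) / "single"
    multi = Path(tmp) / "multi"
    single.mkdir()
    multi.mkdir()

    # Layout 1: everything in one attributes.json.
    (single / "attributes.json").write_text(
        json.dumps({"n5": n5_meta, "custom": custom_meta})
    )
    # Layout 2: one JSON file per name-space.
    (multi / "n5.json").write_text(json.dumps(n5_meta))
    (multi / "custom.json").write_text(json.dumps(custom_meta))

    single_namespaces = namespaces_single_file(single)
    multi_namespaces = namespaces_multi_file(multi)
```

Both layouts yield the same two name-spaces here. The difference is how they are found: the single-file layout needs one read of a known file, while the multi-file layout needs a directory listing, which is the extra discovery mechanism the comment refers to.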

@jakirkham
Contributor Author

When I raised this issue (and a couple others), there were two things I had in mind. These are basically the same things that I'm interested in today. They are as follows:

  1. Avoiding collisions between user-defined keys and spec keys
  2. Closing the gap between the N5 and Zarr specs

As to item 1, there has been a fair bit of discussion in Zarr about the ways we might expand the spec to address needs of users in our community and where those extensions sit in the stack (zarr-developers/zarr-python#276, zarr-developers/zarr-python#280). So this has practical applications and is not a purely theoretical discussion. This may indirectly affect the viability of item 2.
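The collision in item 1 can be illustrated with a minimal sketch. It assumes a flat attributes.json in which the spec keys (`dimensions`, `blockSize`, `dataType`, `compression`) share one object with user attributes; the helper names and the reserved-key check are hypothetical and do not describe any existing N5 or Zarr API.

```python
# Spec keys that share the top level with user attributes in a flat layout.
N5_RESERVED_KEYS = frozenset(
    {"dimensions", "blockSize", "dataType", "compression"}
)


def set_attribute_unchecked(attributes: dict, key: str, value) -> dict:
    """Return a copy with key set, with no protection for spec keys."""
    updated = dict(attributes)
    updated[key] = value
    return updated


def set_attribute_checked(attributes: dict, key: str, value) -> dict:
    """Return a copy with key set, refusing keys the spec has claimed."""
    if key in N5_RESERVED_KEYS:
        raise ValueError(f"'{key}' is reserved for array metadata")
    return set_attribute_unchecked(attributes, key, value)


dataset_attrs = {"dimensions": [100, 200], "blockSize": [10, 20]}

# A user who wants to store axis labels under "dimensions" silently
# replaces the array shape.
clobbered = set_attribute_unchecked(dataset_attrs, "dimensions", ["x", "y"])

# A guarded writer rejects the same call instead.
try:
    set_attribute_checked(dataset_attrs, "dimensions", ["x", "y"])
    rejected = False
except ValueError:
    rejected = True
```

A guard like this avoids silent corruption, but it also means any key added to the spec later can invalidate user attributes that already exist, which is why keeping the two kinds of metadata apart keeps coming up.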

Item 2 has already been discussed in other issues. Linking them here though for context (https://github.com/zarr-developers/zarr/issues/231, https://github.com/zarr-developers/zarr/issues/291). This has practical value as well, particularly as new language implementations emerge (https://github.com/zarr-developers/zarr/issues/286). Am eager to see consolidation between the two communities to better leverage the work already done. Given the close similarity between N5 and Zarr, the work already done to better integrate the two, and the good communication between the two communities, remain hopeful that item 2 is achievable.

Side note: While these seem to be two distinct concerns, it is worth noting there is some crossover between them.

cc @alimanfoo

@axtimwalde
Collaborator

Waiting for a convincing solution. After implementing the zarr approach in n5-zarr, I am not a big fan of the solution there, which does the same thing in three places instead of one. I also like that the core attributes are just attributes, which is similar to how I would access them in HDF5.
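For context, a rough sketch of the two access patterns. It assumes the "three places" are the Zarr v2 metadata files `.zarray`, `.zgroup`, and `.zattrs`, as opposed to N5's single `attributes.json`. The reader functions are hypothetical, and the `.zarray` content is abbreviated (a real one carries more fields, such as `dtype` and `compressor`).

```python
import json
import tempfile
from pathlib import Path

# Zarr v2 keeps array metadata, group metadata, and user attributes apart.
ZARR_V2_FILES = (".zarray", ".zgroup", ".zattrs")


def read_n5_style(node: Path) -> dict:
    """One file holds both spec metadata and user attributes."""
    path = node / "attributes.json"
    return json.loads(path.read_text()) if path.exists() else {}


def read_zarr_v2_style(node: Path) -> dict:
    """Up to three files have to be probed for a single node."""
    found = {}
    for name in ZARR_V2_FILES:
        path = node / name
        if path.exists():
            found[name] = json.loads(path.read_text())
    return found


with tempfile.TemporaryDirectory() as tmp:
    n5_node = Path(tmp) / "n5_dataset"
    zarr_node = Path(tmp) / "zarr_array"
    n5_node.mkdir()
    zarr_node.mkdir()

    (n5_node / "attributes.json").write_text(
        json.dumps({"dimensions": [100, 100], "blockSize": [10, 10], "name": "raw"})
    )
    # Abbreviated .zarray, for illustration only.
    (zarr_node / ".zarray").write_text(
        json.dumps({"zarr_format": 2, "shape": [100, 100], "chunks": [10, 10]})
    )
    (zarr_node / ".zattrs").write_text(json.dumps({"name": "raw"}))

    n5_meta = read_n5_style(n5_node)
    zarr_meta = read_zarr_v2_style(zarr_node)
```

The N5-style reader returns one dictionary in which `name` sits beside `dimensions`; the Zarr v2-style reader returns separate dictionaries per file, which prevents key collisions at the cost of more files to probe.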

@jakirkham
Contributor Author

Do we have a spec issue for this? 🙂

@axtimwalde
Collaborator

No, because I do not have a better idea, so I would rather stay silent :).

@jakirkham
Contributor Author

Better to start the conversation I think. Even if we don’t know the answer 😉
