Question about per-frame metadata #30

Open
tlambert03 opened this issue Dec 14, 2024 · 15 comments

@tlambert03 (Collaborator)

I've been digging a bit deeper into the codebase as I work through @go2scope's storage-device proposal for Micro-Manager. Very excited about the potential.

I have a question about how we should be handling "per-frame" metadata. I see ZarrStreamSettings_s.custom_metadata, and it works fine for adding one-time additional metadata at the creation of the StreamSettings (which I presume will almost always be at the beginning of an acquisition). But I'm curious where/how any additional metadata accumulated during the sequence can be added (this will inevitably be needed for data that can't be known a priori, like timestamps, etc.).

It looks like the primary mechanism for writing additional data is ZarrStream_append, which doesn't take external metadata, and I also don't see a mechanism for "rewriting" external metadata after stream creation.
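
For concreteness, the flow I'm describing looks roughly like this (just a sketch from my reading of the headers; exact names and signatures may differ):

ZarrStreamSettings settings = { 0 };   /* i.e., struct ZarrStreamSettings_s */
settings.custom_metadata = "{\"experiment\": \"...\"}";  /* fixed at creation time */

ZarrStream* stream = ZarrStream_create(&settings);

/* per-frame: pixel data only; no parameter for, e.g., a timestamp */
ZarrStream_append(stream, frame_data, bytes_of_frame, &bytes_written);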

Any thoughts on how that might look?

@go2scope

go2scope commented Dec 14, 2024 via email

@tlambert03 (Collaborator, Author)

tlambert03 commented Dec 14, 2024

> The metadata in addImage() and appendImage() is optional.

Yeah, I understand that it's currently optional as far as the MMCore spec is concerned ... but I need it 😂. And when I went to look inside the acquire-zarr source to see how it might be implemented, I didn't find anything. (I saw that big TIFF works; that's fine, but it's acquire-zarr that I'm excited about.)

In any case, this is less of a question about the MMCore implementation and more a question of how acquire-zarr itself could support adding anything beyond deterministic pre-acquisition metadata. If the general answer is "we don't support that", then sure, an MMCore storage device would need to devise its own workaround, like storing metadata and then overwriting what ZarrStream.custom_meta originally wrote.

@go2scope

go2scope commented Dec 14, 2024 via email

@tlambert03 (Collaborator, Author)

> I am sure Nathan and Alan will be able to provide a solution

Yep, that's why I opened this issue: to hear how they're thinking about it, to see which solutions they expect to prefer, and which they see as out of scope, etc. :)

> A more difficult problem is the max-efficiency streaming setup, where we attach storage directly to the circular buffer and the camera controls the acquisition.

Those are MMCore-specific considerations, right? Again, I really wasn't trying to get into MMCore stuff here; I mentioned it in my post to provide context. This is really a more abstract question about how acquire-zarr intends (or doesn't intend) to handle metadata that isn't defined at stream-definition time; more generically, outside of MMCore, etc.

@aliddell (Member)

Nenad (@go2scope) and I spoke about this yesterday, and a first step could be allowing the user to read in or overwrite the custom metadata in full at any point during streaming.

Depending on what you're saving, per-frame metadata can get quite large for a long acquisition, so a JSON encoding may not be the best way to go. Another approach might be to modify the append function to take one or more instances of a struct like this:

struct FrameMetadata {
    const char* array_name;  /* name of the 1D metadata array, e.g. "timestamps" */
    ZarrDataType type;       /* element type of the metadata values */
    void* data;              /* pointer to the value(s) for this frame */
    size_t nbytes_data;      /* size of data, in bytes */
};

and save per-frame metadata to 1D arrays within the group. So your Zarr group structure might look like

benchmark.zarr/
├── data
│   └── root              # group name
│       ├── 0             # full-resolution LOD array
│       │   └── ...
│       ├── 1             # downsampled LOD array
│       │   └── ...
│       └── timestamps    # 1D timestamp array
└── ...

with metadata to match. Both of these solutions together would work pretty well, I think. @tlambert03
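
For illustration, a call with the extended append might look something like this (the function name and enum value here are hypothetical; nothing like this exists in the API yet):

double acquisition_timestamp = 1734195600.001;  /* e.g., seconds since epoch */

struct FrameMetadata timestamp_meta = {
    .array_name = "timestamps",     /* 1D array inside the group, as in the tree above */
    .type = ZarrDataType_float64,   /* assumed enum value */
    .data = &acquisition_timestamp,
    .nbytes_data = sizeof(double),
};

/* hypothetical variant of append that also writes per-frame metadata */
ZarrStream_append_with_metadata(stream, frame_data, bytes_of_frame,
                                &bytes_written, &timestamp_meta, 1);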

@nclack (Member)

nclack commented Dec 17, 2024

I'm not aware of a standard built on top of Zarr that provides for per-frame metadata. It's a gap in the OME-Zarr standard.

We could create a de facto one, and we can certainly think about API design. But since it's not part of Zarr, we never added any support.

To me, the most important thing is timestamps and other telemetry (scalars). Next might be keeping track of state changes on the instrument (event-driven structured data).

If I needed to solve this with Zarr today, I'd save those files on the side (possibly in the Zarr root directory). I'd use different formats for the "telemetry" and the "event-driven" data, but both would need some way of correlating measurements with indices in the n-dimensional array (that seems straightforward). So it's possible for people to start playing with this problem without doing anything to acquire-zarr.
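
As a sketch of what I mean (all names here are illustrative; this is plain C with no acquire-zarr involvement):

#include <stdio.h>
#include <stdint.h>

/* Append one (frame_index, timestamp) record to a CSV kept next to the
   Zarr root. frame_index is what correlates each record with a slice of
   the n-dimensional array. */
void log_frame_timestamp(FILE* telemetry_csv, uint64_t frame_index,
                         double timestamp_s)
{
    fprintf(telemetry_csv, "%llu,%.9f\n",
            (unsigned long long)frame_index, timestamp_s);
}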

@tlambert03 (Collaborator, Author)

Can certainly do it outside of acquire-zarr; I was mostly curious whether you see this as in scope or not 👍. I can definitely understand not wanting to get into it if OME-Zarr itself doesn't comment on it.

@nclack (Member)

nclack commented Dec 17, 2024

It's important. I wouldn't say it's out of scope, but without more information it looks like an open-ended problem. I'm looking for examples of use cases that might inform how we can bound the solution. Micro-Manager has a more defined use case around logging its metadata, for example. That could be solved at the level of the Zarr storage device there (using Nenad's PR).

@tlambert03 (Collaborator, Author)

Yep, I definitely recognize that this can be solved over there. Part of me imagines that Nenad's storage-device PR won't be the only way I'll ever want to interact with acquire-zarr, which is why I keep bringing this back to a slightly higher-level discussion. I very much recognize that we can do whatever we want over there to solve this :)

> I'm looking for examples of use cases that might inform how we can bound the solution.

For me, the most natural thing acquire-zarr could do (without making too many assumptions) would be to follow the general stream.append format and stick with a flat list of custom metadata somewhere; essentially, the per-frame equivalent of custom_metadata. It wouldn't need any terribly complex association with higher-dimensional structure. It would just acknowledge that the primary API of acquire-zarr (if I understand it correctly at this point) is incremental: the data comes in increments, and each of those increments may have associated custom metadata. So right next to custom_metadata could be a frame_metadata: [] list of JSON objects, as sketched below.
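
To sketch the layout (hypothetical; this is not anything acquire-zarr writes today):

{
  "custom_metadata": { "experiment": "..." },
  "frame_metadata": [
    { "frame": 0, "timestamp": 1734195600.001 },
    { "frame": 1, "timestamp": 1734195600.034 }
  ]
}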

I also agree with @aliddell that JSON is probably not the best format for performance (I've currently been using msgpack instead), and to the extent that that's incompatible with Zarr (is it?), I think it would be reasonable for you to punt and instruct users to roll their own metadata on the side.

@go2scope

go2scope commented Dec 17, 2024 via email

@tlambert03 (Collaborator, Author)

Yeah, making that mutable definitely opens up a lot of flexibility without adding a new spec.

@nclack (Member)

nclack commented Dec 18, 2024

> If you just added an API call to write (or overwrite) custom metadata at any time, the problem is solved.

Ooh, that's a good idea. What's the proposal here? Something like update_external_metadata?
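
For instance, something with a shape like this (entirely hypothetical; the name comes from the quote above, and the return and parameter types are guesses modeled on the rest of the C API):

/* overwrite the stream's external/custom metadata at any point during streaming */
ZarrStatusCode ZarrStream_update_external_metadata(ZarrStream* stream,
                                                   const char* metadata_json,
                                                   size_t bytes_of_json);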

@go2scope

go2scope commented Dec 18, 2024 via email

@nclack (Member)

nclack commented Dec 19, 2024

@aliddell what do you think?

@aliddell (Member)

@nclack That'll work.
