Question about per-frame metadata #30

Open
tlambert03 opened this issue Dec 14, 2024 · 15 comments

@tlambert03 (Collaborator)

I've been digging a bit deeper into the codebase as I work through @go2scope's storage-device proposal for Micro-Manager. Very excited about the potential.

I have a question about how we should be handling "per-frame" metadata. I see ZarrStreamSettings_s.custom_metadata, and it works fine for adding one-time additional metadata at the creation of the StreamSettings (which I presume will almost always be at the beginning of an acquisition). But I'm curious where/how any additional metadata accumulated during the sequence can be added (this will inevitably be needed for data that can't be known a priori, like timestamps, etc.).

It looks like the primary mechanism for writing additional data is ZarrStream_append, which doesn't take external metadata, and I also don't see a mechanism for "rewriting" external metadata after stream creation.
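
For concreteness, the flow I'm describing looks roughly like this (just a sketch from my reading of the headers; exact names and signatures may differ):

ZarrStreamSettings settings = { 0 };   /* i.e., struct ZarrStreamSettings_s */
settings.custom_metadata = "{\"experiment\": \"...\"}";  /* fixed at creation time */

ZarrStream* stream = ZarrStream_create(&settings);

/* per-frame: pixel data only; no parameter for, e.g., a timestamp */
ZarrStream_append(stream, frame_data, bytes_of_frame, &bytes_written);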

Any thoughts on how that might look?

@go2scope

go2scope commented Dec 14, 2024 via email

@tlambert03 (Collaborator, Author)

tlambert03 commented Dec 14, 2024

> The metadata in addImage() and appendImage() is optional.

Yeah, I understand that it's currently optional as far as the MMCore spec is concerned ... but I need it 😂. And when I went to look inside the acquire-zarr source to see how it might be implemented, I didn't find anything. (I saw that big TIFF works; that's fine, but it's acquire-zarr that I'm excited about.)

In any case, this is less of a question about the MMCore implementation and more a question of how acquire-zarr itself could support adding anything beyond deterministic pre-acquisition metadata. If the general answer is "we don't support that", then sure, an MMCore storage device would need to devise its own workaround, like storing metadata and then overwriting what ZarrStream.custom_meta originally wrote.

@go2scope

go2scope commented Dec 14, 2024 via email

@tlambert03 (Collaborator, Author)

> I am sure Nathan and Alan will be able to provide a solution

Yep, that's why I opened this issue: to hear how they're thinking about it, to see which solutions they expect to prefer, and which they see as out of scope, etc. :)

> A more difficult problem is the max-efficiency streaming setup, where we attach storage directly to the circular buffer and the camera controls the acquisition.

Those are MMCore-specific considerations, right? Again, I really wasn't trying to get into MMCore stuff here; I mentioned it in my post to provide context. This is really a more abstract question about how acquire-zarr intends (or doesn't intend) to handle metadata that isn't defined at stream-definition time; more generically, outside of MMCore, etc.

@aliddell (Member)

Nenad (@go2scope) and I spoke about this yesterday, and a first step could be allowing the user to read in or overwrite the custom metadata in full at any point during streaming.

Depending on what you're saving, per-frame metadata can get quite large for a long acquisition, so a JSON encoding may not be the best way to go. Another approach might be to modify the append function to take one or more instances of a struct like this:

struct FrameMetadata {
    const char* array_name;  /* name of the 1D metadata array, e.g. "timestamps" */
    ZarrDataType type;       /* element type of the metadata values */
    void* data;              /* pointer to the value(s) for this frame */
    size_t nbytes_data;      /* size of data, in bytes */
};

and save per-frame metadata to 1D arrays within the group. So your Zarr group structure might look like

benchmark.zarr/
├── data
│   └── root              # group name
│       ├── 0             # full-resolution LOD array
│       │   └── ...
│       ├── 1             # downsampled LOD array
│       │   └── ...
│       └── timestamps    # 1D timestamp array
└── ...

with metadata to match. Both of these solutions together would work pretty well, I think. @tlambert03
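
For illustration, a call with the extended append might look something like this (the function name and enum value here are hypothetical; nothing like this exists in the API yet):

double acquisition_timestamp = 1734195600.001;  /* e.g., seconds since epoch */

struct FrameMetadata timestamp_meta = {
    .array_name = "timestamps",     /* 1D array inside the group, as in the tree above */
    .type = ZarrDataType_float64,   /* assumed enum value */
    .data = &acquisition_timestamp,
    .nbytes_data = sizeof(double),
};

/* hypothetical variant of append that also writes per-frame metadata */
ZarrStream_append_with_metadata(stream, frame_data, bytes_of_frame,
                                &bytes_written, &timestamp_meta, 1);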

@nclack (Member)

nclack commented Dec 17, 2024

I'm not aware of a standard built on top of Zarr that provides for per-frame metadata. It's a gap in the OME-Zarr standard.

We could create a de facto one, and we can certainly think about API design. But since it's not part of Zarr, we never added any support.

To me, the most important thing is timestamps and other telemetry (scalars). Next might be keeping track of state changes on the instrument (event-driven structured data).

If I needed to solve this with Zarr today, I'd save those files on the side (possibly in the Zarr root directory). I'd use different formats for the "telemetry" and the "event-driven" data, but both would need some way of correlating measurements with indices in the n-dimensional array (that seems straightforward). So it's possible for people to start playing with this problem without doing anything to acquire-zarr.
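
As a sketch of what I mean (all names here are illustrative; this is plain C with no acquire-zarr involvement):

#include <stdio.h>
#include <stdint.h>

/* Append one (frame_index, timestamp) record to a CSV kept next to the
   Zarr root. frame_index is what correlates each record with a slice of
   the n-dimensional array. */
void log_frame_timestamp(FILE* telemetry_csv, uint64_t frame_index,
                         double timestamp_s)
{
    fprintf(telemetry_csv, "%llu,%.9f\n",
            (unsigned long long)frame_index, timestamp_s);
}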

@tlambert03 (Collaborator, Author)

Can certainly do it outside of acquire-zarr; I was mostly curious whether you see this as in scope or not 👍. I can definitely understand not wanting to get into it if OME-Zarr itself doesn't comment on it.

@nclack (Member)

nclack commented Dec 17, 2024

It's important. I wouldn't say it's out of scope, but without more information it looks like an open-ended problem. I'm looking for examples of use cases that might inform how we can bound the solution. Micro-Manager has a more defined use case around logging its metadata, for example. That could be solved at the level of the Zarr storage device there (using Nenad's PR).

@tlambert03 (Collaborator, Author)

Yep, I definitely recognize that this can be solved over there. Part of me imagines that Nenad's storage-device PR won't be the only way I'll ever want to interact with acquire-zarr, which is why I keep bringing this back to a slightly higher-level discussion. I very much recognize that we can do whatever we want over there to solve this :)

> I'm looking for examples of use cases that might inform how we can bound the solution.

For me, the most natural thing acquire-zarr could do (without making too many assumptions) would be to follow the general stream.append format and stick with a flat list of custom metadata somewhere; essentially, the per-frame equivalent of custom_metadata. It wouldn't need any terribly complex association with higher-dimensional structure. It would just acknowledge that the primary API of acquire-zarr (if I understand it correctly at this point) is incremental: the data comes in increments, and each of those increments may have associated custom metadata. So right next to custom_metadata could be a frame_metadata: [] list of JSON objects, as sketched below.
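
To sketch the layout (hypothetical; this is not anything acquire-zarr writes today):

{
  "custom_metadata": { "experiment": "..." },
  "frame_metadata": [
    { "frame": 0, "timestamp": 1734195600.001 },
    { "frame": 1, "timestamp": 1734195600.034 }
  ]
}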

I also agree with @aliddell that JSON is probably not the best format for performance (I've currently been using msgpack instead), and to the extent that that's incompatible with Zarr (is it?), I think it would be reasonable for you to punt and instruct users to roll their own metadata on the side.

@go2scope

go2scope commented Dec 17, 2024 via email

@tlambert03 (Collaborator, Author)

Yeah, making that mutable definitely opens up a lot of flexibility without adding a new spec.

@nclack (Member)

nclack commented Dec 18, 2024

> If you just added an API call to write (or overwrite) custom metadata at any time, the problem is solved.

Ooh, that's a good idea. What's the proposal here? Something like update_external_metadata?
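
For instance, something with a shape like this (entirely hypothetical; the name comes from the quote above, and the return and parameter types are guesses modeled on the rest of the C API):

/* overwrite the stream's external/custom metadata at any point during streaming */
ZarrStatusCode ZarrStream_update_external_metadata(ZarrStream* stream,
                                                   const char* metadata_json,
                                                   size_t bytes_of_json);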

@go2scope

go2scope commented Dec 18, 2024 via email

@nclack (Member)

nclack commented Dec 19, 2024

@aliddell what do you think?

@aliddell (Member)

@nclack That'll work.
