How to handle metadata #2
Totally agree. I haven't had time to respond more fully yet (though I generally feel positive about this initiative!), but metadata handling is also my first question. After taking various stabs at metadata, I haven't hit on anything I particularly like, nor do I think the current MM approach(es) are necessarily worth emulating.
This indeed becomes a big issue. One thing I have begun to think is that there will be almost no "universally" good solution. Some will want comprehensive metadata, possibly at the expense of performance; others may want bare-minimum metadata that takes no time to fetch. I have played a bit with a GraphQL-like pattern and will "think about it aloud" here, not particularly advocating for or against it.

```graphql
{
  thing {
    name
    id
  }
}
```

I'm not suggesting we actually use GraphQL for anything other than inspiration, in the sense that "we (MM) have a schema/API of all the state that we could retrieve" and "they (the storage implementation) declare what things they need to populate whatever metadata scheme they intend to write". For example, see these type definitions:

```python
from __future__ import annotations

from typing import Any, TypedDict

# SystemInfoDict, ImageDict, etc. are further TypedDicts (not shown); the __future__
# import keeps those annotations unevaluated, so this class defines without them.


class StateDict(TypedDict, total=False):
    Devices: dict[str, dict[str, str]]
    SystemInfo: SystemInfoDict
    SystemStatus: SystemStatusDict
    ConfigGroups: dict[str, dict[str, Any]]
    Image: ImageDict
    Position: PositionDict
    AutoFocus: AutoFocusDict
    PixelSizeConfig: dict[str, str | PixelSizeConfigDict]
    DeviceTypes: dict[str, DeviceTypeDict]
```

A storage backend could, for example, give us this string (here as GraphQL, but it could be anything):

```graphql
{
  Devices {
    Camera {
      Binning
      Offset
      Exposure
    }
    Dichroic {
      Label
    }
  }
  ConfigGroups {
    Channel {
      current
    }
  }
  SystemInfo {
    VersionInfo
  }
}
```

and then a fast function could be prepared that would retrieve and return only what is needed:

```json
{
  "Devices": {
    "Camera": {
      "Binning": "1",
      "Offset": "0",
      "Exposure": "100"
    },
    "Dichroic": {
      "Label": "400DCLP"
    }
  },
  "ConfigGroups": {
    "Channel": {
      "current": "DAPI"
    }
  },
  "SystemInfo": { "VersionInfo": "MMCore version 11.0.0" }
}
```

There could also be both a fast per-frame query and a slower start/finish query (with more information if desired). This leaves the question of what data is necessary up to the storage device: if it wants to write OME-XML, fine; if it doesn't need all that, also fine.
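The "declare what you need, then prepare a fast function" idea could look roughly like the Python sketch below. It rests on assumptions: the request is written as nested dicts with lists at the leaves instead of a GraphQL string, and `prepare_query` and the `fetch` callable are hypothetical names. In a real implementation `fetch` would dispatch each path to the appropriate core call (for device properties, something like the core's `getProperty(label, propName)`); here a plain dict stands in for the live microscope state.

```python
from typing import Any, Callable

# A fetcher resolves one leaf path, e.g. ("Devices", "Camera", "Exposure").
Fetcher = Callable[[tuple[str, ...]], Any]


def _leaf_paths(request: dict[str, Any], prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Flatten a nested request into the list of leaf paths it asks for."""
    paths: list[tuple[str, ...]] = []
    for key, sub in request.items():
        path = prefix + (key,)
        if isinstance(sub, dict):
            paths.extend(_leaf_paths(sub, path))
        else:  # a list of field names at this level
            paths.extend(path + (name,) for name in sub)
    return paths


def prepare_query(request: dict[str, Any]) -> Callable[[Fetcher], dict[str, Any]]:
    """Walk the request once, up front, and return a function that fetches
    only the requested values and rebuilds them into a nested dict."""
    paths = _leaf_paths(request)

    def run(fetch: Fetcher) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for path in paths:
            node = result
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = fetch(path)
        return result

    return run


# The request from the GraphQL example, as nested dicts with lists at the leaves.
request = {
    "Devices": {"Camera": ["Binning", "Offset", "Exposure"], "Dichroic": ["Label"]},
    "ConfigGroups": {"Channel": ["current"]},
    "SystemInfo": ["VersionInfo"],
}

# Stand-in for the live microscope state.
fake_state = {
    ("Devices", "Camera", "Binning"): "1",
    ("Devices", "Camera", "Offset"): "0",
    ("Devices", "Camera", "Exposure"): "100",
    ("Devices", "Camera", "Gain"): "2",  # present in the state, but never fetched
    ("Devices", "Dichroic", "Label"): "400DCLP",
    ("ConfigGroups", "Channel", "current"): "DAPI",
    ("SystemInfo", "VersionInfo"): "MMCore version 11.0.0",
}

per_frame = prepare_query(request)
print(per_frame(fake_state.__getitem__))
```

The slower start/finish query would simply be a second `prepare_query` call with a larger request; a backend could hold both and call the per-frame one on every image.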
I guess it's also possible that this is way too complicated, and just letting them use the core API directly could be better :)
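For contrast, the "just use the core API" alternative might look like the minimal Python sketch below. `CoreLike` is a hypothetical protocol modeled on the core's `getProperty(label, propName)` call, and `MinimalTiffWriter` and `FakeCore` are illustrative names, not real classes. The appeal is that there is no schema or query language at all; the trade-off is that every backend is coupled to the core API, and MM cannot see in advance what a backend will fetch.

```python
from typing import Protocol


class CoreLike(Protocol):
    """The slice of the core API this sketch relies on (getProperty-style access)."""

    def getProperty(self, label: str, propName: str) -> str: ...


class MinimalTiffWriter:
    """Hypothetical storage backend that pulls exactly what it needs, when it needs it."""

    def frame_metadata(self, core: CoreLike) -> dict[str, str]:
        # No declared request: the backend just calls the core directly.
        return {
            "Exposure": core.getProperty("Camera", "Exposure"),
            "Filter": core.getProperty("Dichroic", "Label"),
        }


class FakeCore:
    """Stand-in for a real core object, so the sketch runs without hardware."""

    _props = {("Camera", "Exposure"): "100", ("Dichroic", "Label"): "400DCLP"}

    def getProperty(self, label: str, propName: str) -> str:
        return self._props[(label, propName)]


print(MinimalTiffWriter().frame_metadata(FakeCore()))
# {'Exposure': '100', 'Filter': '400DCLP'}
```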
I agree with everything above, and here is my comment. The metadata strings are supposed to be JSON-encoded data structures. Metadata handling is indeed a hard problem. I do not believe we can develop a universal schema for metadata, and I welcome any ideas in this direction. We can postulate that StorageDevice and MMCore must be able to automatically generate a minimal set of metadata to make a dataset readable. Metadata strings in the API are supposed to be optional. The dataset must be readable even if metadata passed through the API is incomprehensible. In short:
If the general idea of having the storage implemented in MMCore is worth developing further, we can pick a couple of widely used formats today and imagine how the client code would look for each. For example, the simple MMCore API would work for writing generic Zarr datasets, but if we say it must be an OME-Zarr dataset, it becomes more interesting. If we don't pass any metadata, or if the passed metadata does not contain all the required information (or cannot be interpreted), the API would have to auto-generate all the necessary fields. Therefore, to write a perfect OME-Zarr dataset, significant cooperation is required between the StorageDevice/MMCore and the calling application. This cooperation can be achieved only through metadata strings. Of course, that is not great. I don't have any good suggestions. Something like what Tally mentioned might be a way to go.
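A minimal Python sketch of the fallback described above, assuming the metadata string is JSON: the storage device always has its own auto-generated values for the fields the format requires, caller-supplied values override them when the string parses, and an incomprehensible string is kept verbatim instead of breaking the dataset. `complete_metadata`, the `unparsed_metadata` key, and the field names are hypothetical, not the OME-Zarr (OME-NGFF) schema. The returned `filled` list shows exactly where the caller's cooperation fell short.

```python
import json
from typing import Any, Optional


def complete_metadata(
    raw: Optional[str], generated: dict[str, Any]
) -> tuple[dict[str, Any], list[str]]:
    """Merge optional caller-supplied JSON metadata with auto-generated fields.

    `generated` holds every field the target format requires, filled in by the
    storage device from what it knows on its own. Returns the merged metadata
    and the names of required fields the caller did not supply.
    """
    try:
        supplied = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        supplied = None
    extra: dict[str, Any] = {}
    if not isinstance(supplied, dict):
        # Incomprehensible metadata must not make the dataset unreadable:
        # keep the raw string verbatim and fall back to generated fields only.
        extra = {"unparsed_metadata": raw}
        supplied = {}
    filled = [field for field in generated if field not in supplied]
    # Caller-supplied values win; extra caller fields are kept as well.
    merged = {**generated, **supplied, **extra}
    return merged, filled


# Hypothetical required fields, as the device might auto-generate them.
generated = {
    "axes": ["t", "c", "y", "x"],
    "pixel_size_um": 1.0,
    "channel_names": ["Channel 0", "Channel 1"],
}

merged, filled = complete_metadata(
    '{"pixel_size_um": 0.65, "channel_names": ["DAPI", "FITC"]}', generated
)
print(filled)  # ['axes']

_, filled = complete_metadata("not json at all", generated)
print(filled)  # ['axes', 'pixel_size_um', 'channel_names']
```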
Interesting ideas! I've only started looking at this, so I hope to create more issues as I think about it in more detail, but the first question I have is what the strategy will be for handling metadata, including application-generated metadata.
It looks like you have a `std::string`, which is certainly quite generic, but it is not clear to me how this is intended to work with different file formats. At least on the surface, it would seem that (1) if each file-format device interprets the string differently, it will be rather unusable for an application programmer, whereas (2) if every file-format device uses a common data format for the string, then (aside from the need to propose such a format) serializing it to a string would incur unnecessary overhead and, depending on the chosen format, could be error-prone.
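To make option (2) concrete, here is a small Python sketch assuming JSON as the common format (which another comment in this thread says the metadata strings are meant to be). The record's field names are hypothetical. It shows both concerns: JSON silently changes some Python types on the way through (tuples become lists, non-string keys become strings), and every frame pays for a serialize/parse round trip. The timing covers only that round trip, not the cost of collecting the metadata, and will vary by machine.

```python
import json
import timeit

# A hypothetical per-frame record in some agreed "common format".
frame_meta = {
    "Camera": {"Exposure": 100.0, "Binning": 1},
    "Position": {"x": 12.5, "y": -3.0, "z": 1.25},
    "Channel": "DAPI",
}


def round_trip(meta: dict) -> dict:
    """Serialize to a string (as the API would) and parse it back (as a device would)."""
    return json.loads(json.dumps(meta))


# Plain numbers, strings and nested dicts survive the trip unchanged...
print(round_trip(frame_meta) == frame_meta)  # True

# ...but JSON is not a lossless container for arbitrary Python data:
# tuples come back as lists and non-string keys come back as strings.
lossy = {"roi": (0, 0, 512, 512), 7: "well"}
print(round_trip(lossy))  # {'roi': [0, 0, 512, 512], '7': 'well'}

# Rough cost of one serialize/parse round trip; the number depends on the machine.
n = 10_000
seconds = timeit.timeit(lambda: round_trip(frame_meta), number=n)
print(f"{seconds / n * 1e6:.1f} microseconds per frame")
```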