Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-3: more dimensions for thee #239

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open

RFC-3: more dimensions for thee #239

wants to merge 23 commits into from

Conversation

jni
Copy link

@jni jni commented May 21, 2024

To make sure we have consensus, I'm opening this RFC in the style of RFC-2. (I'm aware that RFC-1 has some pending issues to be resolved, but when consensus is possible 🤞 this is a useful way to document the history of past decisions.) Please add a thumbs up if you want to be listed as an endorser. Please reply if you have concerns.

@d-v-b @joshmoore @normanrz @bogovicj @will-moore @ziw-liu @tlambert03

Please add pings for authors of libraries implementing ome-ngff readers and writers, as the main effect here is not on existing files but on implementations that may implement too-restrictive a spec.

My goal is to get this and #235 merged before the upcoming 0.5 release. 🤞 (I think that is being targeted for late June/early July? @joshmoore?)

Review URL: https://ngff--239.org.readthedocs.build/rfc/3/

Copy link
Contributor

github-actions bot commented May 21, 2024

Automated Review URLs

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/5

@jni
Copy link
Author

jni commented May 21, 2024

PS: I asked @joshmoore whether whimsy was allowed and he said yes, hence the title. (This comes after I realised I couldn't have "RFC-2: dimensional hullabaloo" because @normanrz had taken that number already. 😂)

@jni jni changed the title RFC-3: More dimensions for thee RFC-3: more dimensions for thee May 21, 2024
@tlambert03
Copy link

full endorsement. While i absolutely recognize the significant challenge that lifting the strict dimensionality model may pose for mapping arbitrary future usage onto legacy code bases that have been built around XYZCT, I fully agree that a true next-generation format is going to have to lift it. I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption.

I agree with @jni that concerns around communicating the semantics of specific axes (i.e. formally named "X", "Y" and "Z") are better addressed by additional keys in the axis metadata, such as "type" and "space".

@joshmoore
Copy link
Member

jni commented 4 minutes ago
I asked @joshmoore whether whimsy was allowed and he said yes, hence the title.

For comparison, https://datatracker.ietf.org/doc/html/rfc2549 ("IP over Avian Carriers")

@joshmoore
Copy link
Member

tlambert03 commented 1 minute ago
I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption.

Would you be able/willing to contribute those, perhaps even for a section in the RFC?

@tlambert03
Copy link

tlambert03 commented May 21, 2024

Would you be able/willing to contribute those, perhaps even for a section in the RFC?

Sure, the most direct stories I can share are from implementing writers for data coming off microscopes (code in pymmcore-plus/mda/handlers). There I essentially have a class OMEZarrWriter(_5DWriterBase) that is able to accept acquisition definitions that explicitly adhere to the 5D model, but which will simply fail otherwise. As a result, I find myself using a more general zarr format more often, and have kind of punted for now on outputting data to something that I know users will be able to open in a variety of downstream applications (not that I love that either)

@tlambert03
Copy link

it's possible that @nclack and/or @aliddell would have opinions here as well, as I know they've spent a fair amount of time thinking about how to map a variety of custom experiment types to the ngff format in the acquire-python schema

@jni
Copy link
Author

jni commented May 22, 2024

@tlambert03 thanks for the links! I'll add these to the background section, but could you point me to where in the code

is able to accept acquisition definitions that explicitly adhere to the 5D model, but which will simply fail otherwise.

would fail? The smoking gun would be:

  • an example acquisition definition that cannot be handled by ome-ngff v0.4
  • the line(s) of code that would attempt to match the acquisition to the ome-ngff handler and fail

Maybe it's not as easy as that to define these things compactly, but if it is, I think it would be worthwhile detail for this RFC's motivation.

@joshmoore
Copy link
Member

joshmoore commented May 22, 2024

A few quick clarifications, @jni:

  • can we change @d-v-b's role to "spec updates"? In my mind, the "implementation" work will require (if necessary) changes to the implementations themselves whereas remove axis restrictions #235 takes care of "S1. ... AUTHORS ... update the spec"
  • Are you looking to clarify whether or not to remove the 3 space dimensions restriction in this RFC or is this something for a future RFC?
  • Perhaps let's change "Reviewers" to "proposed reviewers" and drop the "Your name here" before we merge, but 👍 for reaching out to interested parties at this stage.
  • Thinking about a future compatibility matrix (see Add initial compatibility matrix #59) would it make sense to define designations for the support provided by implementations?

@will-moore
Copy link
Member

Re: NGFF readers:

cc @manzt - https://github.com/hms-dbmi/vizarr - Any idea how much work it would be to support n-dimensional NGFF data?

cc @dgault - https://github.com/ome/ZarrReader/ - Since the OME data model is very much 5D, this is going to take a bit of thought on how to handle n-dimensional NGFF data?

@d-v-b
Copy link
Contributor

d-v-b commented May 22, 2024

  • Are you looking to clarify whether or not to remove the 3 space dimensions restriction in this RFC or is this something for a future RFC?

The space restrictions, and all other axis restrictions (other than the requirement that axes have unique names) are removed in #235

@normanrz
Copy link
Contributor

normanrz commented May 22, 2024

Re: NGFF readers:

Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools what to do with >3 space dimensions.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/request-for-info-ngff-use-cases-that-are-currently-blocked/96641/1

rfc/3/index.md Outdated
Comment on lines 141 to 144
As part of the [proposed implementation][implementation], Davis Bennett has
created pydantic models that validate the proposed schema. These are actually
new additions to the NGFF specification, surfaced pre-existing errors in the
schema, and should prevent new errors from appearing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this text. Those pydantic models are merely a convenient way to write JSON schema. They don't express anything that's not already written in the prose of the spec. Also, I am planning on removing those models from the PR, because they add an undocumented build step that I don't have the energy to document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as of aa5c953 those models are gone

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-v-b I really loved the models! 😭

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they will live here https://github.com/janeliascicomp/pydantic-ome-ngff if 0.5 comes out

@will-moore
Copy link
Member

The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions.

I imagine that supporting arbitrary dimension order is a fair bit more work for implementers that n-dimensions, so that endorsement of this proposal may not signal endorsement of #235?

@d-v-b
Copy link
Contributor

d-v-b commented May 22, 2024

regarding advice for partial implementations (e.g., implementations that only support a fixed number of dimensions, or a fixed order), I included the following section in the PR: https://github.com/ome/ngff/pull/235/files#diff-ffe6148e5d9f47acc4337bb319ed4503a810214933e51f5f3e46a227b10e3fcdR565-R580, please let me know if this guidance is sufficient or if we should say more (and lets have that conversation over in #235 instead of here, so that we can keep synchronized with the actual changes to the spec)

rfc/3/index.md Outdated Show resolved Hide resolved
rfc/3/index.md Outdated Show resolved Hide resolved
rfc/3/index.md Outdated Show resolved Hide resolved
@jni
Copy link
Author

jni commented May 22, 2024

@will-moore

The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions.

I probably need to update the summary at the top, but under "proposal" I write:

Proposal

This document proposes removing any restrictions on the number of dimensions
stored in NGFF arrays. Additionally, it removes restrictions on the names and
types of included dimensions.

If the names are arbitrary, the ordering must also be arbitrary, surely? But I can make it explicit.

rfc/3/index.md Show resolved Hide resolved
Comment on lines +111 to +117
A draft proposal for [coordinate transformations][trafo spec] already includes
most of the changes proposed here, so we envision that this RFC is compatible
with future plans for the format. The proposal does currently limit the number
of dimensions of type "space" to at most 3, but that limit [could be
removed][space dims comment]. If this RFC is approved, the transformation
specification would need to be updated to reflect this. However, that is an easy
change and there seems to be sufficient support in the community for this idea.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to talk about that (stalled) PR at all? I don't see why it's relevant here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's relevant because it speaks to the forward compatibility of this RFC — ie it is in line with existing proposals for the format. That the PR is stalled is not really relevant — it's stalled because of minor details (e.g. array order) that don't have a bearing on this PR. Based on the discussion, other aspects, and certainly the ones relevant to this RFC, have quite broad consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case isn't it sufficient to just state that there are no known conflicts with other active proposals?

@jni
Copy link
Author

jni commented May 22, 2024

@normanrz

Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools what to do with >3 space dimensions.

in my opinion the spec should leave this question undefined. The mapping can be direct (x=x, y=y, z=z), user defined (give users options of how to map axes), or arbitrary (x=foo, y=bar, z=baz). In practice, I think it is not going to be an issue, I am just wary of restricting the number for the same reason that I was wary of restricting the total number of dimensions, which indeed caused problems.

If it helps this RFC move forward, I can bring back the "maximum three spatial dimensions" limit from #138, and we can have the discussion in a later RFC. The unlike the other changes in this RFC, the removal of the maximum number of "space" dimensions is purely speculative on my end, and not motivated from a concrete use case.

Action requested:

  • if you would like me to limit the number of spatial dimensions to a maximum of 3, please 👍 this post.
  • if you would like the number of spatial dimensions to remain arbitrary (string theory ftw!), 👎 this post, and ideally please provide suggestions below for what guidance you would offer implementers.

@d-v-b
Copy link
Contributor

d-v-b commented May 22, 2024

  • if you would like the number of spatial dimensions to remain arbitrary (string theory ftw!), 👎 this post, and ideally please provide suggestions below for what guidance you would offer implementers.

Implementations that do not support some aspect of user data should clearly communicate that to users. Users can then decide which implementation use, given the data they have stored. We should not try to limit the data that users can store, simply because some implementations cannot represent that data.

This is a broader issue: as of 0.4, there are lots of OME-NGFF tools that don't support big images on cloud storage (of which I have plenty). Should we change the spec to limit the size, or location of images, just because some implementations can't load my big ones? I don't think so. So for the same reason, we should not restrict what axes users have, just because some implementations are opinionated about axes.

@jni
Copy link
Author

jni commented Jun 5, 2024

Thanks for the input @sebi06! Would you be happy to be listed as an endorser of the RFC? (If so, please 👍 the original post at the top.) I use libczi as an example in the RFC, but if it were endorsed by you directly that example would probably hold more weight. 😊

@jni
Copy link
Author

jni commented Sep 3, 2024

@joshmoore

Thinking about a future compatibility matrix (see #59) would it make sense to define designations for the support provided by implementations?

I don't understand this point.

I think I've addressed all other points in my recent commits. To summarise:

  • clarify the RFC
  • change "reviewers" to "proposed reviewers"
  • propose guidance (SHOULD) to maintain TZYX as the main spatiotemporal axes if they make sense for the dataset.
  • explain that partial implementations are ok and it's up to them to decide how to handle "weird" datasets
  • various minor wording updates

@jni
Copy link
Author

jni commented Sep 3, 2024

Overall, to recap my last message, I think this bit:

My reading of the discussion above is "loosen restrictions, but offer guidance with SHOULD as to how tczyx SHOULD be stored".

is the most important part of the discussion that I think lets the proposal move forward, by providing the flexibility that is clearly needed by many relevant parties, while assuaging concerns about unnecessary fragmentation. (ie I think most datasets that folks will want to deal with will still have tzyx in some order.)

I have tried to capture this idea in the most recent revision. Sorry about the radio silence in the meantime. 😅 But I hope we can move this forward now and get on with implementations!

@joshmoore
Copy link
Member

Thinking about a future compatibility matrix (see #59) would it make sense to define designations for the support provided by implementations?

I don't understand this point.

i.e. that we introduce nomenclature for what is and what is not supported. Sorry, that's outside your PR specifically. In this case perhaps it suffices to use "RFC-3 supported" but that may eventually become unwieldy in which case we could have "profiles" or sets of features which can be clearly advertised by libraries and clients.

@jni
Copy link
Author

jni commented Sep 5, 2024

What's the next step here @joshmoore? Do we merge and then enter review? How are reviewers decided and how much review is "enough" review?

@joshmoore
Copy link
Member

  • If we're out of the "clarification" phase, then yes, we get this merged and move to reviews.
  • If you have suggestions on reviewers, let me know. Otherwise, I'll find 'em.
  • We'll know "enough" when we see it. 😉

@joshmoore
Copy link
Member

Thanks for the update, @jni! ❤️ A few TODOs from my side (or even for me) before merging:

@jni
Copy link
Author

jni commented Oct 31, 2024

review nomenclature (OME-NGFF, NGFF, OME-Zarr, etc.)

what's the latest recommendation here?

the updated roadmap

link?

@joshmoore
Copy link
Member

what's the latest recommendation here?

The format is OME-Zarr. The process/community is NGFF. I'm happy to read through and take care of that though.

link?

Doesn't yet exist. It should really be a RFC in its own right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.