Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OME-NGFF metadata when writing flattened layout #131

Merged

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Feb 15, 2022

See https://forum.image.sc/t/dropping-the-top-layer-in-ome-zarr-data-without-losing-multiscales-metadata/63286

As originally suggested in #108 (comment) and captured in the README, setting the --scale-format-string to %2$d/ allows to flatten the hierarchy created bioformats2raw to match the layout of OME-NGFF multiscale images. Using bioformat2raw 0.4.0, this command results in the absence of the multiscales metadata from the top-level group:

(zarr_dev) [sbesson@pilot-zarr2-dev ~]$ bioformats2raw test.fake test.zarr --scale-format-string '%2$d/' &&  cat test.zarr/.zattrs
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp8729792479941275324/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
{
  "bioformats2raw.layout" : 3
}

After investigation, this is due to the logic in https://github.com/glencoesoftware/bioformats2raw/blob/master/src/main/java/com/glencoesoftware/bioformats2raw/Converter.java#L1483 which fails to handle this particular resolutionString and lead to the following resolution:

2022-02-15 09:50:46,281 [main] DEBUG c.g.bioformats2raw.Converter -   seriesString = 0
2022-02-15 09:50:46,281 [main] DEBUG c.g.bioformats2raw.Converter -   resolutionString = 0/

instead of

2022-02-15 10:30:22,690 [main] DEBUG c.g.bioformats2raw.Converter -   seriesString = 
2022-02-15 10:30:22,690 [main] DEBUG c.g.bioformats2raw.Converter -   resolutionString = 0

As noted on the forum post, using the current release of the tool, dropping the trailing slash produces a different output which preservesx the top-level Zarr/OME-NGFF metadata.

(zarr_dev) [sbesson@pilot-zarr2-dev ~]$ bioformats2raw test.fake test2.zarr --scale-format-string '%2$d' &&  cat test2.zarr/.zattrs
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp5978412340746156458/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
{
  "multiscales" : [
    {
      "metadata" : {
        "method" : "loci.common.image.SimpleImageScaler",
        "version" : "Bio-Formats 6.8.0"
      },
      "datasets" : [
        {
          "path" : "0"
        },
        {
          "path" : "1"
        }
      ],
      "version" : "0.2"
    }
  ]
}

However, this value conflicts with the recommendation at the bottom of https://github.com/glencoesoftware/bioformats2raw/tree/v0.4.0#--scale-format-string.

At minimum 8fb2637 expands the existing unit test (without the trailing slash) to capture the current expectation for the second command and check the presence of a Zarr group with multiscales metadata.

This PR will need at least one extra commit but there are two options:

  • either %2$d becomes the recommended format for this layout, update the README and review the warning about trailing slashes in the README
  • or Converter will need updates to match the data generated by %2$d/ and %2$d

@melissalinkert @chris-allan let me know what you are happy to support moving forward and I can push updates accordingly

@melissalinkert
Copy link
Member

Thanks @sbesson. I think the second option plus README updates makes most sense long-term.

@sbesson
Copy link
Member Author

sbesson commented Feb 15, 2022

👍 regarding the warning in the readme, do you roughly remember under which conditions the absence of trailing slash in the scale format string would cause an error ?

@sbesson
Copy link
Member Author

sbesson commented Feb 15, 2022

Also one related question regarding the semantics of bioformats2raw.layout. My assumption is that it should match the default hierarchy created by bioformats2raw. So in case --scale-format-string '%2$d/' is passed, this key/value can safely be omitted from the top-level group metadata? The alternative would be to also store it alongside the multiscales metadata.

@melissalinkert
Copy link
Member

Kind of wishing I'd been more descriptive in ef95be9. I think the context was running the commands in #108 and #109?

bioformats2raw.layout is just a version number; raw2ometiff will complain if it is missing, but it should complain no matter what if --scale-format-string is used. I don't think it's a problem to just omit it.

@sbesson
Copy link
Member Author

sbesson commented Feb 21, 2022

Pushed two commits adding coverage for both forms of the scale format string in the tests and handling the case of trailing slash.

@joshmoore
Copy link
Contributor

Likely outwit this PR, I wonder if a short-cut for the flattening would be in scope.

@melissalinkert
Copy link
Member

Last 2 commits are fine with me. Extra option captured as #132. Think this just needs final review and merge from @chris-allan.

@chris-allan chris-allan merged commit 37cde33 into glencoesoftware:master Feb 25, 2022
@sbesson sbesson deleted the singledirectory_multiscales branch February 25, 2022 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants