# Deprecating hardcoded `OFFSET` constant #1
Writing out an HDF5 file would be enormously helpful and would basically solve all implementation problems on the read side. As I understand it, the scope fills up a memory buffer as it goes and then writes out to a file at the end, which gives quite a lot of latitude in terms of juggling the numbers around before the write (e.g. splitting channels into separate datasets, writing valid metadata as it pertains to both the group and the channels, etc.). If giving a flexible offset is the first step, I'm all for it.
Hi @clbarnes, my understanding is that "2D scan Tclk.vi" is the main image-writer component of the software. The top half puts data into a queue (a memory buffer), and the bottom half reads from that queue and writes to disk. This happens concurrently, but not synchronously: we do not have to wait for data to be written to disk before acquiring new data. The bottom component has two hard-coded values, one of which is the byte offset.

The way we currently convert the .dat file to an HDF5 file is simply to read in the .dat file with a reader and then write out an equivalent HDF5 file, perhaps with the image data chunked and compressed. An alternative would be to reopen the file and write an HDF5 header into it. For example, if we moved the DAT header somewhere else, we could overwrite the DAT header with an HDF5 header at the beginning of the file, and then add the attributes to the HDF5 file. The only advantage of this approach is that the image data does not need to be rewritten to obtain an HDF5 file. This could also be done during transmission of the file off the acquisition computer. We will likely proceed with the current method of resaving the entire file in the near term.

-Mark

[Image: Zoom-in of the bottom file-writing component of "2D scan Tclk.vi"]
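The "read the .dat, write an equivalent HDF5" path described above can be sketched in a few lines. This is a minimal illustration, not the project's actual converter: the dtype, the shape arguments, the dataset name, and the header handling are all assumptions; only the fixed 1024-byte offset comes from the discussion.

```python
# Hedged sketch of the "resave the entire file" conversion path.
# Assumptions: big-endian uint16 samples, caller-supplied shape,
# dataset name "image". Only OFFSET = 1024 comes from the thread.
import numpy as np
import h5py

OFFSET = 1024  # current hard-coded start of image data in .dat readers


def dat_to_hdf5(dat_path, h5_path, shape, dtype=">u2"):
    """Read the image array from a .dat file and resave it as HDF5,
    chunked and compressed, keeping the raw DAT header as an attribute."""
    with open(dat_path, "rb") as f:
        header = f.read(OFFSET)
        # Explicit count tolerates a trailer (e.g. CSV recipe data)
        # after the array data.
        image = np.frombuffer(f.read(), dtype=dtype, count=int(np.prod(shape)))
    with h5py.File(h5_path, "w") as h5:
        ds = h5.create_dataset(
            "image", data=image.reshape(shape), chunks=True, compression="gzip"
        )
        ds.attrs["dat_header"] = np.frombuffer(header, dtype=np.uint8)
```

The reshape and dtype would in practice come from the DAT attributes (`Xresolution`, `Yresolution`, `Number_of_channels`) rather than being passed in by hand.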
Got it, thank you! I suppose both cases, rewriting the file or just writing HDF5 metadata into it, require a reader to be co-maintained with the microscope software, which needs to be robust, scalable, relatively standalone, etc. In which case we may as well just use that tool to do whatever conversion we need: HDF5/zarr/N5/TIFFs/npy/whatever. The value of having the scope software generate a valid HDF5 to begin with is that everyone's starting point looks the same, but if that's not an option, it's not an option.
Is it really not an option, though? Is there centralized maintenance of the acquisition software? If not, then someone could go ahead and just add the HDF5-writing functionality, and the problem is solved (for that person/group).
## Proposal: Making `OFFSET`, the start of the array image data, a variable

Currently, .dat readers typically code `OFFSET` as a fixed constant, `1024`. This value encodes the start of the array data and the end of the attribute metadata. For forward compatibility, in anticipation of the need for additional metadata to support multiple microscopes, I propose making `Offset` an independent variable rather than a fixed constant. The variable could either be a directly encoded attribute or be calculated from other existing attributes.

## Proposal: An attribute for `Offset` at byte offset 992

An independent attribute for `Offset` would be a robust solution, as it can uniquely delineate the separation between metadata and array data. I propose byte offset 992 as the location for an unsigned 64-bit integer in big-endian format, for consistency.

- `0x0000000000000000` indicates that the `Offset` value should be calculated, or assumed to be `1024` for backwards compatibility.
- `0xffffffffffffffff` indicates that there is no meaningful `Offset` for contiguous array data; for example, the array data may be chunked and/or compressed.

## Proposal: Calculate `Offset` from `FileLength` and other attributes

Alternatively, `Offset` could be calculated from the `FileLength` attribute, which currently indicates the end of the array data and is stored at byte offset 1000 as a 64-bit big-endian integer. Since the length of the array can be calculated as the product of `Xresolution`, `Yresolution`, `Number_of_channels`, and the size of the datatype, the `Offset` value can be calculated from `FileLength` as follows:

`Offset = FileLength - (Xresolution * Yresolution * Number_of_channels * sizeof(datatype))`

A new special value for `FileLength` is `0xffffffffffffffff`, which indicates that `FileLength` should be interpreted as the actual end of the file and may not be a reliable value from which to calculate the array data offset. For example, this value should be used if the array data is chunked and/or compressed, or if the CSV recipe data is no longer present in the trailer of the file.

## Example application: Proposed hybrid DAT/HDF5 file
If this proposal is implemented, a hybrid DAT/HDF5 file becomes possible, in which the extra metadata space is used to contain HDF5 metadata according to the HDF5 file format.

A simple contiguous HDF5 file can accommodate a user block of 1 KB or some doubling thereof. This user block can accommodate the existing DAT file metadata. The HDF5 metadata header can be contained within a subsequent 2 KB written by the HDF5 library with an early-allocation flag. A hybrid HDF5/DAT file could therefore be made if `OFFSET` were shifted to 3 KB (3072 bytes).

A potential modification to the LabVIEW writer consists of changing a single constant from `1024` to `3072`. The details of the potential writer modification are out of scope for this proposal. To clarify, the scope of this proposal applies solely to DAT file readers.
## Reader References