A post-processing solution #7
-
On second thoughts, I've removed the channel-dropping feature. The intention is for this to be a lossless and verifiable process, which is not the case if you drop a channel. As channels can be disabled at acquisition time, disabling them there seems the more faithful approach.
-
Could you create an hdf2dat command line utility? Basically just write out
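For what it's worth, here is a minimal sketch of what such a utility might look like in Python with h5py, assuming the layout described further down the thread (per-channel datasets plus raw header/footer byte-array attributes); the attribute and dataset names are illustrative, not the real ones used by the converter:

```python
import sys

import h5py
import numpy as np


def hdf2dat(hdf5_path, dat_path):
    """Dump a converted HDF5 file back into .dat bytes (illustrative only)."""
    with h5py.File(hdf5_path, "r") as f, open(dat_path, "wb") as out:
        # Raw header bytes stored verbatim at conversion time (assumed attribute name).
        out.write(bytes(f.attrs["header"]))

        # Stack the per-channel datasets; the real byte layout/interleaving
        # has to follow the Jeiss .dat spec, this is only a stand-in.
        channels = [f[name][()] for name in sorted(f) if name.startswith("channel")]
        out.write(np.stack(channels, axis=-1).tobytes())

        # Raw footer bytes appended verbatim (assumed attribute name).
        out.write(bytes(f.attrs["footer"]))


if __name__ == "__main__":
    hdf2dat(sys.argv[1], sys.argv[2])
```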
-
I added an issue gathering information about HDF5 (compression) filters here:
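For context, the built-in filters can be applied per dataset at creation time; a minimal h5py sketch, where the dataset name, chunk shape and compression level are arbitrary:

```python
import h5py
import numpy as np

data = np.random.randint(0, 2**15, size=(1024, 1024), dtype=np.int16)

with h5py.File("example.hdf5", "w") as f:
    # "gzip" (deflate, levels 0-9) and "lzf" ship with h5py; third-party
    # filters have to be registered on every machine that reads the file.
    f.create_dataset(
        "channel0",
        data=data,
        chunks=(256, 256),
        compression="gzip",
        compression_opts=4,
    )
```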
-
Overall, thank you for this. This looks very nice. @trautmane was just about to start working on this.
-
I am curious about the layout of the datasets without chunking or compression. The attributes might be scattered throughout the file in 2 KB chunks. After h5py/h5py#2106 we should be able to consolidate these by using a larger
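For illustration, a sketch of what that might look like, assuming the meta_block_size keyword that PR exposes (available in recent h5py releases); the 64 KB value is arbitrary:

```python
import h5py

# Group HDF5 metadata (attributes, object headers, heaps) into larger blocks
# instead of the default small allocations, so it stays contiguous on disk.
# Assumes the meta_block_size keyword exposed by the linked h5py PR.
with h5py.File("converted.hdf5", "w", meta_block_size=2**16) as f:
    f.attrs["example"] = "this metadata lands in the larger meta block"
```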
-
Incidentally, the HDF5 User Group Europe meeting is happening at the moment:
-
I split my post-acquisition conversion prototype into 2 repos: dat2hdf5 and dat2hdf-verify.

The first converts to HDF5, without any userblock or external-file shenanigans: it splits channels into datasets, turns all the metadata fields into HDF5 attributes, optionally adds chunking and compression, and lets you select channels. The latter generates the bytes of a .dat file from the HDF5, takes their md5sum, compares it to the md5sum of an actual .dat file, and returns a status code of 1 if they differ. This should work regardless of chunking and compression (although it obviously won't if you've dropped a channel), and it optionally deletes the .dat if the verification succeeds.

The converter stores both the raw header and the footer as byte arrays in the HDF5 attributes. When verifying, the footer is just added blindly to the end, but the header is re-serialised from the metadata split across the HDF5 attributes (no cheating!).
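To make the round trip concrete, here is a rough sketch of the verification step under those assumptions; rebuilt_dat_bytes is a placeholder for the real re-serialisation, and the attribute/dataset names are made up:

```python
import hashlib
import sys

import h5py


def rebuilt_dat_bytes(f):
    """Placeholder re-serialisation. The real tool rebuilds the header from the
    individual metadata attributes and re-interleaves the channel datasets per
    the Jeiss spec; names here are assumed, not the converter's actual ones."""
    header = bytes(f.attrs["header"])   # stand-in for header rebuilt from split attrs
    data = b"".join(
        f[name][()].tobytes() for name in sorted(f) if name.startswith("channel")
    )
    footer = bytes(f.attrs["footer"])   # footer really is appended blindly
    return header + data + footer


def verify(hdf5_path, dat_path):
    with h5py.File(hdf5_path, "r") as f:
        rebuilt = hashlib.md5(rebuilt_dat_bytes(f)).hexdigest()
    with open(dat_path, "rb") as dat:
        original = hashlib.md5(dat.read()).hexdigest()
    return rebuilt == original


if __name__ == "__main__":
    # Exit status 1 on mismatch, as described above; deleting the .dat on
    # success is omitted from this sketch.
    sys.exit(0 if verify(sys.argv[1], sys.argv[2]) else 1)
```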
New header versions just need a new TSV in jeiss-specs and nothing should need to change in jeiss-convert other than bumping the version of the submodule.
I'd plan to use this when moving data off the acquisition machine and onto the primary storage server.