Document h5 files #7

elcorto · 2024-02-19T18:44:59Z

In the main branch, we have two density files:

Be_snapshot0.dens.npy with shape (18, 18, 27, 1)

Be_snapshot.dens.h5 with one dataset /data/0/meshes/Density/0 and shape (18, 18, 27)

$ h5ls -r Be_snapshot.dens.h5
/                        Group
/data                    Group
/data/0                  Group
/data/0/meshes           Group
/data/0/meshes/Density   Group
/data/0/meshes/Density/0 Dataset {18, 18, 27}

The code below compares the data and finds that the arrays are the same:

import h5py
from icecream import ic
import numpy as np


# From https://github.com/elcorto/pwtools/blob/master/src/pwtools/io.py
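def read_h5(fn):
    """Read all datasets of an HDF5 file into a dict keyed by absolute path."""
    dct = {}

    def get(name, obj):
        if isinstance(obj, h5py.Dataset):
            _name = name if name.startswith("/") else "/" + name
            val = obj[()]
            # Decode byte strings; keep numeric arrays as they are.
            dct[_name] = obj.asstr()[()] if isinstance(val, bytes) else val

    # The context manager closes the file even if reading fails.
    with h5py.File(fn, mode="r") as fh:
        fh.visititems(get)
    return dct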


d_h5 = read_h5("Be_snapshot.dens.h5")
ic(list(d_h5.keys()))
data_h5 = list(d_h5.values())[0]
ic(data_h5.shape)

data_np = np.load("Be_snapshot0.dens.npy")
ic(data_np.shape)

ic(data_h5.dtype)
ic(data_np.dtype)

# data_np[..., -1] : (18, 18, 27, 1) -> (18, 18, 27)
assert (data_h5 == data_np[..., -1]).all()

which prints

ic| list(d_h5.keys()): ['/data/0/meshes/Density/0']
ic| data_h5.shape: (18, 18, 27)
ic| data_np.shape: (18, 18, 27, 1)
ic| data_h5.dtype: dtype('float64')
ic| data_np.dtype: dtype('float64')

and the assert passes, so apart from the extra dimension in the .npy file, the data is equal.
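The shape handling can also be checked without the repository files. The following is a minimal sketch that uses a synthetic random array in place of the real density data (the values are an assumption for illustration only; only the shapes match the files above). It shows that dropping the trailing length-1 axis with np.squeeze gives the same array as the data_np[..., -1] indexing used in the comparison.

```python
import numpy as np

# Synthetic stand-in for Be_snapshot0.dens.npy: same shape, made-up values.
rng = np.random.default_rng(0)
data_np = rng.random((18, 18, 27, 1))

# Stand-in for the h5 dataset: the same values without the trailing axis.
data_h5 = data_np[..., 0].copy()

# Remove the trailing length-1 axis explicitly. np.squeeze raises an error
# if that axis has a length other than 1, which guards against silently
# discarding data.
squeezed = np.squeeze(data_np, axis=-1)

print(squeezed.shape)                      # (18, 18, 27)
print(np.array_equal(data_h5, squeezed))   # True
```

Compared with data_np[..., -1], which would silently pick the last entry of a longer axis, np.squeeze(data_np, axis=-1) fails loudly when the last axis is not of length 1.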

Should Be_snapshot.dens.h5 be renamed to Be_snapshot0.dens.h5 then?


franzpoeschel · 2024-02-19T19:11:33Z

The latest release adds experimental support for openPMD to MALA, as a more scalable and more descriptive alternative to pure numpy files. For testing purposes, the repo contains both the "old" data representation (Be_snapshot0.dens.npy) and the "new" one (Be_snapshot.dens.h5); their contents are the same.
openPMD has several built-in ways to represent a series of data snapshots (time steps, checkpoints, iterations, call them whatever you like):

file-based: each iteration is a separate file
group-based: each iteration is a group within the same file
(specific for ADIOS: variable-based, irrelevant here)

As your h5ls output shows, the iteration index 0 is encoded within the HDF5 file (the group /data/0), so it does not need to be encoded in the filename as well. Users can select file-based encoding by setting a filename pattern such as Be_snapshot_%T.h5.
tl;dr: openPMD manages the numbering of filenames, so it need not be hardcoded in MALA.
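To make the two encodings concrete, here is a minimal sketch in plain Python. It does not use the openPMD API: iteration_from_path and expand_pattern are hypothetical helpers written only for this illustration, and the sketch assumes the iteration index is the group directly under /data, as in the h5ls output above.

```python
import re


def iteration_from_path(path: str) -> int:
    """Hypothetical helper: read the iteration index from a group-based path.

    Assumes the layout /data/<iteration>/..., as in the h5ls output above.
    """
    match = re.match(r"^/data/(\d+)(/|$)", path)
    if match is None:
        raise ValueError(f"not an iteration path: {path!r}")
    return int(match.group(1))


def expand_pattern(pattern: str, iteration: int) -> str:
    """Hypothetical helper: fill %T in a file-based filename pattern."""
    return pattern.replace("%T", str(iteration))


# Group-based encoding: the iteration index is part of the path in the file.
dataset_path = "/data/0/meshes/Density/0"
iteration = iteration_from_path(dataset_path)

# File-based encoding: the iteration index is part of the filename instead.
filename = expand_pattern("Be_snapshot_%T.h5", iteration)

print(iteration, filename)  # 0 Be_snapshot_0.h5
```

In both cases the index comes from openPMD's own layout rather than from a name chosen by hand, which is why Be_snapshot.dens.h5 carries no number in its filename.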

elcorto · 2024-02-20T10:52:53Z

Thanks a lot, that was helpful. I have created #8 to document this as best as I could.

elcorto mentioned this issue Feb 20, 2024

Update README: add h5 files, expand array shape explanation #8

Merged

elcorto changed the title ~~Which snapshot represents Be_snapshot.dens.h5?~~ Document h5 files Feb 20, 2024

elcorto closed this as completed in #8 Mar 11, 2024
