Document h5 files #7

Closed
elcorto opened this issue Feb 19, 2024 · 2 comments · Fixed by #8

Comments

@elcorto
Member

elcorto commented Feb 19, 2024

In the main branch, we have two density files:

  • Be_snapshot0.dens.npy with shape (18, 18, 27, 1)

  • Be_snapshot.dens.h5 with one dataset /data/0/meshes/Density/0 and shape (18, 18, 27)

    $ h5ls -r Be_snapshot.dens.h5
    /                        Group
    /data                    Group
    /data/0                  Group
    /data/0/meshes           Group
    /data/0/meshes/Density   Group
    /data/0/meshes/Density/0 Dataset {18, 18, 27}

The code below compares the data and finds that the arrays are the same:

import h5py
from icecream import ic
import numpy as np


# From https://github.com/elcorto/pwtools/blob/master/src/pwtools/io.py
def read_h5(fn):
    fh = h5py.File(fn, mode="r")
    dct = {}

    def get(name, obj):
        if isinstance(obj, h5py.Dataset):
            _name = name if name.startswith("/") else "/" + name
            val = obj[()]
            dct[_name] = obj.asstr()[()] if isinstance(val, bytes) else val

    fh.visititems(get)
    fh.close()
    return dct


d_h5 = read_h5("Be_snapshot.dens.h5")
ic(list(d_h5.keys()))
data_h5 = list(d_h5.values())[0]
ic(data_h5.shape)

data_np = np.load("Be_snapshot0.dens.npy")
ic(data_np.shape)

ic(data_h5.dtype)
ic(data_np.dtype)

# data_np[..., -1] : (18, 18, 27, 1) -> (18, 18, 27)
assert (data_h5 == data_np[..., -1]).all()

which prints

ic| list(d_h5.keys()): ['/data/0/meshes/Density/0']
ic| data_h5.shape: (18, 18, 27)
ic| data_np.shape: (18, 18, 27, 1)
ic| data_h5.dtype: dtype('float64')
ic| data_np.dtype: dtype('float64')

and the assert passes, so apart from the extra trailing dimension in the .npy file, the data is identical.
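
The same check can be written a bit more defensively. This is a minimal sketch, not code from the repo: it drops the trailing axis with `np.squeeze(..., axis=-1)`, which raises a `ValueError` if that axis is not of length 1, and compares with `np.array_equal`, which returns False on a shape mismatch instead of broadcasting. `same_density` is a hypothetical helper name, and synthetic arrays stand in for the two real files.

```python
import numpy as np


def same_density(data_h5: np.ndarray, data_np: np.ndarray) -> bool:
    """Compare an HDF5 density grid with its .npy counterpart.

    Hypothetical helper: assumes the .npy array carries one extra trailing
    axis of length 1, e.g. (18, 18, 27, 1) vs. (18, 18, 27).
    """
    # An explicit axis makes np.squeeze raise ValueError if that axis is not
    # of length 1, so a layout change cannot slip through silently
    squeezed = np.squeeze(data_np, axis=-1)
    # array_equal returns False (rather than broadcasting) on shape mismatch
    return np.array_equal(data_h5, squeezed)


# Synthetic stand-ins for Be_snapshot.dens.h5 and Be_snapshot0.dens.npy
rng = np.random.default_rng(0)
grid = rng.random((18, 18, 27))
print(same_density(grid, grid[..., np.newaxis]))        # True
print(same_density(grid, grid[..., np.newaxis] + 1.0))  # False
```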

Should Be_snapshot.dens.h5 be renamed to Be_snapshot0.dens.h5 then?

@franzpoeschel

The latest release adds experimental support for openPMD to MALA, as a more scalable and more descriptive alternative to plain NumPy files. For testing purposes, the repo contains both the "old" data representation (Be_snapshot0.dens.npy) and the "new" one (Be_snapshot.dens.h5); their contents are the same.
openPMD has several built-in ways to represent a series of data snapshots (time steps, checkpoints, iterations, call them whatever you like):

  • file-based: each iteration is a separate file
  • group-based: each iteration is a group within the same file
  • (specific to ADIOS: variable-based, irrelevant here)
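
To make the difference between the first two encodings concrete, here is a minimal sketch in plain Python (no openPMD-api involved) of where iteration n ends up. It assumes, as openPMD-api does, that a `%T` in the filename (optionally zero-padded, e.g. `%06T`) selects file-based encoding, and that each iteration lives below `/data/<n>/` inside the file in either case. `expand_pattern` and `locate` are hypothetical helpers; the record path is the one from the h5ls output above.

```python
import re

# Record path below /data/<n>/, as listed by h5ls for the MALA test file
RECORD = "meshes/Density/0"

# Matches %T and zero-padded variants such as %06T (assumed openPMD-api syntax)
_PATTERN = re.compile(r"%(\d*)T")


def expand_pattern(pattern: str, iteration: int) -> str:
    """Replace %T (or %0NT) in a filename pattern by the iteration number."""
    def repl(m):
        width = int(m.group(1)) if m.group(1) else 0
        return str(iteration).zfill(width)
    return _PATTERN.sub(repl, pattern)


def locate(filename: str, iteration: int) -> tuple:
    """Return (file, dataset path) holding `iteration`.

    A %T in the filename means file-based encoding (one file per
    iteration); otherwise all iterations are groups in a single file.
    """
    path = f"/data/{iteration}/{RECORD}"
    if _PATTERN.search(filename):
        return expand_pattern(filename, iteration), path  # file-based
    return filename, path  # group-based


for it in (0, 1):
    print(locate("Be_snapshot_%T.h5", it))  # file-based
    print(locate("Be_snapshot.h5", it))     # group-based
```

With file-based encoding the filename changes per iteration, while with group-based encoding only the group path does; in both cases the iteration number is part of the path inside the file.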

As your h5ls output shows, the iteration number 0 is encoded within the HDF5 file (in the group path /data/0), so it is not necessary to encode it in the filename as well. The user can select file-based encoding by setting a filename pattern such as Be_snapshot_%T.h5.
tl;dr: The numbering of filenames is managed by openPMD and need not be hardcoded inside MALA.
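
Conversely, the iteration a dataset belongs to can be read off its HDF5 path alone, without looking at the filename. A minimal sketch, assuming the openPMD base path `/data/%T/` (fixed in the 1.x standard); `iteration_of` is a hypothetical helper, and the key list mimics what `read_h5` returns for Be_snapshot.dens.h5.

```python
import re

# Assumed openPMD base path "/data/%T/": the iteration number is the
# first path component below /data
_BASE = re.compile(r"^/data/(\d+)/")


def iteration_of(path: str) -> int:
    """Return the iteration number encoded in an openPMD dataset path."""
    m = _BASE.match(path)
    if m is None:
        raise ValueError(f"not an openPMD iteration path: {path!r}")
    return int(m.group(1))


# Keys as returned by read_h5() for the group-based test file
keys = ["/data/0/meshes/Density/0"]
print({k: iteration_of(k) for k in keys})  # {'/data/0/meshes/Density/0': 0}
```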

@elcorto
Member Author

elcorto commented Feb 20, 2024

Thanks a lot, that was helpful. I have created #8 to document this as best as I could.

elcorto changed the title from "Which snapshot represents Be_snapshot.dens.h5?" to "Document h5 files" on Feb 20, 2024