HDF5-UDF 1.1

@lucasvr lucasvr released this 04 Jan 19:01

Here comes the second release of HDF5-UDF! The major improvements over version 1.0 are:

  • Support for input datasets with variable- and fixed-sized string datatypes
  • Support for input datasets with a compound datatype
  • Ability to handle Python bytecode from Python versions 3.1 through 3.8
  • C/C++ backend: enable building UDFs with Clang and link to the math library by default
  • Python backend: fixed a bug triggered when lib.getDims() was called on an input dataset

Please refer to the installation guide for instructions on how to install this package from source code or binary form.

Notes

Support for strings and compounds is enabled by automatically converting the corresponding datasets into named C structures that can be used to iterate over the input data. The conversion renames compound members where needed, since they have to be valid C identifiers. For instance, the following compound:

GROUP "/" {
 DATASET "Dataset1" {
    DATATYPE  H5T_COMPOUND {
       H5T_STD_I64LE "Serial number";
       H5T_IEEE_F64LE "Temperature (F)";
       H5T_IEEE_F64LE "Pressure (inHg)";
    }
    DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
 }
}

is converted into this named structure:

struct dataset1_t {
    int64_t serial_number;
    double temperature;
    double pressure;
};
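As an illustration of how such a structure can be consumed, here is a minimal sketch in plain C. It assumes the records are already available as a contiguous array; how a UDF obtains that pointer and the element count from HDF5-UDF is not shown. The helper name mean_temperature_celsius is hypothetical and is not part of HDF5-UDF.

```c
#include <stddef.h>
#include <stdint.h>

/* Structure as shown in these notes for "Dataset1". */
struct dataset1_t {
    int64_t serial_number;
    double temperature;   /* originally "Temperature (F)" */
    double pressure;      /* originally "Pressure (inHg)" */
};

/* Hypothetical helper: mean temperature, converted from Fahrenheit
 * to Celsius, over n records. Returns 0.0 for an empty input. */
double mean_temperature_celsius(const struct dataset1_t *records, size_t n)
{
    if (n == 0)
        return 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (records[i].temperature - 32.0) * 5.0 / 9.0;
    return sum / (double) n;
}
```

Members are accessed by their sanitized names, so the loop body never has to deal with the original names containing spaces or parentheses.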

Moreover, if the compound's memory layout differs from its storage layout, hdf5-udf produces a named structure with the padding required to match, so the UDF can still access the structure members by name without computing memory offsets by hand.
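To make the padding idea concrete, the sketch below uses an invented layout in which the third member sits at byte offset 24 rather than 16. The structure name padded_dataset1_t, the filler member _pad0, and the offsets themselves are assumptions made for illustration only; the names and layout that hdf5-udf actually generates may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: "pressure" is stored at byte offset 24,
 * leaving an 8-byte gap after "temperature". */
struct padded_dataset1_t {
    int64_t serial_number;   /* bytes 0..7   */
    double temperature;      /* bytes 8..15  */
    char _pad0[8];           /* bytes 16..23, hypothetical filler */
    double pressure;         /* bytes 24..31 */
};

/* Reports where the compiler placed "pressure" within the structure. */
size_t pressure_offset(void)
{
    return offsetof(struct padded_dataset1_t, pressure);
}
```

With the filler in place, code can read records[i].pressure directly and the compiler resolves the offset.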

A similar arrangement is used to store string-based datasets. For instance, a dataset named Foo declared as a string with a fixed size of 32 bytes would be converted into the following structure for easier iteration using simple pointer arithmetic (again, padding is properly introduced into the structure definition if needed):

struct foo_t {
    char value[32];
};
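A minimal sketch of the pointer-arithmetic iteration mentioned above, assuming the elements are laid out back to back with no extra padding. The helper count_matches is hypothetical. Comparisons are bounded by the field size because a fixed-size string that fills all 32 bytes may not be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Structure as shown in these notes for a 32-byte fixed-size string. */
struct foo_t {
    char value[32];
};

/* Hypothetical helper: count the elements whose string equals needle.
 * strncmp() never reads past the 32-byte field. */
size_t count_matches(const struct foo_t *items, size_t n, const char *needle)
{
    size_t hits = 0;
    for (const struct foo_t *p = items; p != items + n; ++p) {
        if (strncmp(p->value, needle, sizeof p->value) == 0)
            hits++;
    }
    return hits;
}
```

Note that a needle longer than 32 bytes would match any element sharing its first 32 bytes, which is acceptable for a sketch but worth checking in real code.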

For variable-sized strings, the structure holds a pointer to the memory region where the string is actually allocated. Once again, padding is automatically inserted into the generated structure if the memory layout differs from the disk layout:

struct foo_t {
   char *value;
};
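A comparable sketch for the variable-sized case, assuming each value pointer refers to a NUL-terminated string or is NULL. The helper total_length is hypothetical, and the NULL check is a defensive assumption rather than documented HDF5-UDF behavior.

```c
#include <stddef.h>
#include <string.h>

/* Structure as shown in these notes for a variable-sized string. */
struct foo_t {
    char *value;
};

/* Hypothetical helper: total number of characters across n strings,
 * skipping elements whose pointer is NULL. */
size_t total_length(const struct foo_t *items, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (items[i].value != NULL)
            total += strlen(items[i].value);
    }
    return total;
}
```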

For convenience, a new high-level API lib.string() can be used so that UDFs don't have to explicitly access the value member of that structure.

Examples

Please visit the examples and test directories for UDFs that take input from compounds and strings.