Releases: lucasvr/hdf5-udf
HDF5-UDF 2.1
Here comes the first release of User-Defined Functions for HDF5 supporting Windows and macOS platforms!
Windows is a tricky platform to support, as there are many ways to install the runtime libraries and compilers needed to create UDFs. Our approach is to rely on MINGW64, which provides ready-to-use GCC, Python (with pip), and LuaJIT packages. The graphical installer takes care of all the steps needed to get HDF5-UDF up and running. The macOS port does not feature a graphical installer, as it's straightforward to compile the project from source on that OS.
The missing piece for the Windows build (and for the macOS port) is sandboxing of the UDF process. Please do get in touch if you'd like to volunteer to write such a feature, as it's not at the top of my priority list.
HDF5-UDF 2.0
This is a new major release of User-Defined Functions for HDF5! Here are some of the good things this version brings:
- UDF signing: we now extract the public key behind a given UDF and let you associate that key with a trust profile that limits which system calls UDFs signed by that key can execute (and which file system paths they can access)
- UDF library: HDF5-UDF is now available as a library and comes with Python bindings so you can create UDFs using Jupyter Notebooks and regular Python scripts
- Source code storage: it's now possible to include the UDF source code in the target file so the UDF can be modified and recompiled in the future
- New build system based on Meson + Ninja that works like a charm
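A trust profile pairs a signing key with limits on system calls and filesystem paths. The minimal C sketch below models the kind of data such a profile carries: an allowlist of system call names and an allowlist of path prefixes. The `struct trust_profile` type and both `profile_allows_*` functions are hypothetical. They are not HDF5-UDF's actual profile format or enforcement mechanism.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trust profile: allowlists of system call names and of
 * filesystem path prefixes. Not HDF5-UDF's real profile format. */
struct trust_profile {
    const char *const *syscalls;   /* allowed system call names */
    size_t num_syscalls;
    const char *const *paths;      /* allowed path prefixes */
    size_t num_paths;
};

/* Returns true if the named system call appears in the allowlist. */
static bool profile_allows_syscall(const struct trust_profile *p, const char *name)
{
    for (size_t i = 0; i < p->num_syscalls; i++)
        if (strcmp(p->syscalls[i], name) == 0)
            return true;
    return false;
}

/* Returns true if path equals an allowed prefix or lies beneath it.
 * "/data" allows "/data/x.h5" but not "/database". */
static bool profile_allows_path(const struct trust_profile *p, const char *path)
{
    for (size_t i = 0; i < p->num_paths; i++) {
        const char *prefix = p->paths[i];
        size_t n = strlen(prefix);
        if (strncmp(path, prefix, n) != 0)
            continue;
        /* accept an exact match, a '/' boundary, or a prefix ending in '/' */
        if (path[n] == '\0' || path[n] == '/' || (n > 0 && prefix[n - 1] == '/'))
            return true;
    }
    return false;
}
```

The prefix check is purely textual, so a path such as `/data/../etc/passwd` would pass. A real implementation would canonicalize paths (for example with `realpath()`) before comparing them.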
The project now has a homepage that documents HDF5-UDF's API and its configuration, too. Do check it out!
Please refer to the installation page for details on how to build it from the source code or to install it from binary packages.
HDF5-UDF 1.2
We're happy to announce the availability of HDF5-UDF version 1.2, which comes with exciting new features over its predecessor:
- Support for outputting string datatypes with a user-configurable size
- Support for outputting compound datatypes, including string elements and native datatypes
- Use 1-based indexing on the Lua API to conform with the language's conventions
- Allow reading from CSV files, as long as they are not symlinks to other objects in the filesystem and they are placed in the same directory as the HDF5 file
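The two CSV restrictions can be expressed with POSIX calls. The sketch below shows how such a check might look; it is not HDF5-UDF's implementation. `lstat()` inspects the path itself without following links, so a symlink is rejected, and the directory part of each path is then compared textually. `csv_is_allowed()` and `dir_len()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Length of the directory part of a path, including the trailing '/'
 * (0 when the path has no slash, i.e. it is relative to the cwd). */
static size_t dir_len(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Hypothetical check mirroring the 1.2 rule: the CSV file must exist,
 * must not be a symbolic link, and must sit in the same directory as
 * the HDF5 file. */
static bool csv_is_allowed(const char *csv_path, const char *hdf5_path)
{
    struct stat st;
    if (lstat(csv_path, &st) != 0)   /* missing file or no permission */
        return false;
    if (S_ISLNK(st.st_mode))         /* lstat() does not follow links */
        return false;
    size_t a = dir_len(csv_path), b = dir_len(hdf5_path);
    return a == b && strncmp(csv_path, hdf5_path, a) == 0;
}
```

Because the directory comparison is textual, `data.csv` and `./file.h5` would be treated as different directories. Resolving both directories with `realpath()` first would remove that limitation.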
The most interesting aspect of outputting compounds and strings is that it's now possible to translate CSV to HDF5 on the fly.
Compound datasets are described using a variation of the original command-line syntax:
```
name:{member:type[,member:type...]}:resolution
```
For instance, an array with 1000 measurements of a generic sensor could be described as:
```
SensorData:{timestamp:int64,value:float}:1000
```
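The descriptor syntax is simple enough to parse by hand. The following C sketch splits `name:{member:type[,member:type...]}:resolution` into its parts. Several details are assumptions made for illustration: the `parse_compound_spec()` function, the fixed `MAX_MEMBERS`/`MAX_NAME` limits, and the restriction to a single integer resolution. This is not HDF5-UDF's own parser.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MEMBERS 16
#define MAX_NAME 64

struct member { char name[MAX_NAME]; char type[MAX_NAME]; };

/* Parsed form of name:{member:type[,member:type...]}:resolution */
struct compound_spec {
    char name[MAX_NAME];
    struct member members[MAX_MEMBERS];
    int num_members;
    long resolution;
};

/* Copies s[0..len) into dst as a C string; fails if empty or too long. */
static bool copy_field(char *dst, const char *s, size_t len)
{
    if (len == 0 || len >= MAX_NAME)
        return false;
    memcpy(dst, s, len);
    dst[len] = '\0';
    return true;
}

/* Hypothetical parser; returns true on success. */
static bool parse_compound_spec(const char *spec, struct compound_spec *out)
{
    const char *open = strstr(spec, ":{");
    const char *close = open ? strstr(open, "}:") : NULL;
    if (!open || !close || !copy_field(out->name, spec, (size_t)(open - spec)))
        return false;

    /* Walk the comma-separated member:type pairs between the braces. */
    out->num_members = 0;
    const char *p = open + 2;
    while (p < close) {
        const char *comma = memchr(p, ',', (size_t)(close - p));
        const char *end = comma ? comma : close;
        const char *colon = memchr(p, ':', (size_t)(end - p));
        if (!colon || out->num_members == MAX_MEMBERS)
            return false;
        struct member *m = &out->members[out->num_members++];
        if (!copy_field(m->name, p, (size_t)(colon - p)) ||
            !copy_field(m->type, colon + 1, (size_t)(end - colon - 1)))
            return false;
        p = end + 1;
    }

    /* The resolution must be a positive integer and nothing else. */
    char *rest;
    out->resolution = strtol(close + 2, &rest, 10);
    return out->num_members > 0 && rest != close + 2 && *rest == '\0' &&
           out->resolution > 0;
}
```

Parsing the SensorData example yields the name `SensorData` and a resolution of 1000. It also yields two members: `timestamp` of type `int64` and `value` of type `float`.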
String elements (as well as string datasets) use the `string` datatype. By default HDF5-UDF allows up to 32 characters for that datatype. That can be overridden using the `(N)` modifier, as in `string(48)`, which allows string elements to have up to 48 characters.
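The `string(N)` rule maps onto a small lookup. The hypothetical `string_type_size()` helper below returns three things: the default of 32 characters for a bare `string`, the explicit size for `string(N)`, and -1 for anything it cannot parse. It only illustrates the rule described above and is not part of HDF5-UDF.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_STRING_SIZE 32   /* default maximum length for "string" */

/* Hypothetical helper: maximum string length for a type token such as
 * "string" or "string(48)", or -1 if the token is malformed. */
static long string_type_size(const char *type)
{
    if (strcmp(type, "string") == 0)
        return DEFAULT_STRING_SIZE;
    if (strncmp(type, "string(", 7) != 0)
        return -1;
    char *end;
    long n = strtol(type + 7, &end, 10);
    /* require at least one digit, a positive size and a closing ')' */
    if (end == type + 7 || n <= 0 || strcmp(end, ")") != 0)
        return -1;
    return n;
}
```

With this helper, `string_type_size("string(48)")` returns 48.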
Please refer to the updated documentation for further details on how to make the most of this release.
Instructions on how to install the package in source code form or on Debian, Ubuntu, and Fedora derivatives are available here.
HDF5-UDF 1.1
Here comes the second release of HDF5-UDF! The major improvements over version 1.0 are:
- Support for input datasets with variable- and fixed-sized string datatypes
- Support for input datasets with a compound datatype
- Ability to handle Python bytecode from versions 3.1 through 3.8
- C/C++ backend: enable building UDFs with Clang and link to the math library by default
- Python backend: fixed a bug when `lib.getDims()` was called on an input dataset
Please refer to the installation guide for instructions on how to install this package from source code or binary form.
Notes
The introduction of strings and compounds is enabled by the automatic conversion of the corresponding datasets into named C structures that can be used to iterate over the input data. That conversion process takes care of renaming compound members if needed, as their names have to be valid C identifiers. For instance, the following compound:
```
GROUP "/" {
   DATASET "Dataset1" {
      DATATYPE  H5T_COMPOUND {
         H5T_STD_I64LE "Serial number";
         H5T_IEEE_F64LE "Temperature (F)";
         H5T_IEEE_F64LE "Pressure (inHg)";
      }
      DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
   }
}
```
is converted into this named structure:
```c
struct dataset1_t {
    int64_t serial_number;
    double temperature;
    double pressure;
};
```
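The renaming in this example can be reproduced by a small sanitizer. The sketch below applies four rules: lowercase everything, drop parenthesized text such as units, collapse other non-alphanumeric runs into a single underscore, and prefix names that start with a digit. These rules are inferred from this one example. `to_c_identifier()` is a hypothetical name, and HDF5-UDF's real conversion may behave differently in other cases.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sanitizer inferred from the example above. Writes at most
 * outlen-1 characters plus a terminating NUL into out. */
static void to_c_identifier(const char *in, char *out, size_t outlen)
{
    size_t n = 0;
    int depth = 0;               /* nesting level inside "(...)" */
    int pending_underscore = 0;  /* a separator was seen since the last char */

    if (outlen == 0)
        return;
    for (const char *p = in; *p; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '(') { depth++; continue; }
        if (c == ')') { if (depth > 0) depth--; continue; }
        if (depth > 0)
            continue;            /* skip text such as units in parentheses */
        if (isalnum(c)) {
            if (pending_underscore && n > 0 && n + 1 < outlen)
                out[n++] = '_';
            pending_underscore = 0;
            if (n == 0 && isdigit(c) && n + 1 < outlen)
                out[n++] = '_';  /* C identifiers cannot start with a digit */
            if (n + 1 < outlen)
                out[n++] = (char)tolower(c);
        } else {
            pending_underscore = 1;  /* spaces, punctuation, etc. */
        }
    }
    out[n] = '\0';
}
```

Applied to the member names above, it produces `serial_number`, `temperature` and `pressure`.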
Moreover, if the compound memory layout differs from the storage layout, then `hdf5-udf` produces a named structure that contains proper padding so that the UDF can still access the structure members with no need to resort to memory offsetting tweaks.
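One way to build such a structure is to walk the members in offset order and fill every gap with a `char` array. The C sketch below does that for a hypothetical list of (type, name, offset, size) entries. The `struct field`, `struct strbuf` and `emit_struct()` names are assumptions. The code also relies on each offset already being aligned for its C type, so that the compiler adds no padding of its own.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One compound member: C type, identifier, byte offset and size. */
struct field { const char *ctype; const char *name; size_t offset; size_t size; };

/* Fixed-capacity output buffer that records overflow instead of overrunning. */
struct strbuf { char *data; size_t cap; size_t len; bool overflow; };

static void sb_printf(struct strbuf *sb, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sb->cap - sb->len) { sb->overflow = true; return; }
    sb->len += (size_t)n;
}

/* Emits a struct whose members sit at the given offsets, filling gaps and
 * any tail up to `total` bytes with char arrays. Fields must be sorted by
 * offset. Returns false on overlapping members or buffer overflow. */
static bool emit_struct(struct strbuf *sb, const char *tag,
                        const struct field *f, size_t nfields, size_t total)
{
    size_t at = 0;   /* first byte not yet covered by a member or pad */
    int pad = 0;
    sb_printf(sb, "struct %s {\n", tag);
    for (size_t i = 0; i < nfields; i++) {
        if (f[i].offset < at)
            return false;
        if (f[i].offset > at)
            sb_printf(sb, "    char _pad%d[%zu];\n", pad++, f[i].offset - at);
        sb_printf(sb, "    %s %s;\n", f[i].ctype, f[i].name);
        at = f[i].offset + f[i].size;
    }
    if (total > at)
        sb_printf(sb, "    char _pad%d[%zu];\n", pad++, total - at);
    sb_printf(sb, "};\n");
    return !sb->overflow;
}
```

Consider a layout with an `int64_t` at offset 0, a `double` at offset 16 and a total size of 32 bytes. For it, the sketch emits `_pad0[8]` between the two members and `_pad1[8]` at the end.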
A similar arrangement is used to store string-based datasets. For instance, a dataset named `Foo` declared as a string with a fixed size of 32 bytes would be converted into the following structure for easier iteration using simple pointer arithmetic (again, padding is properly introduced into the structure definition if needed):
```c
struct foo_t {
    char value[32];
};
```
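With this structure, iterating over the dataset amounts to stepping a `struct foo_t` pointer one record at a time. The sketch below counts the elements equal to a given string. `count_matches()` is a hypothetical example, not an HDF5-UDF API, and it assumes the UDF already holds a pointer to the dataset's elements. An HDF5 fixed-length string may fill the whole field without a terminating NUL, so the comparison never reads past the 32-byte field.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the generated structure for a fixed 32-byte string dataset. */
struct foo_t {
    char value[32];
};

/* Counts elements whose text equals `needle`, walking the buffer one
 * fixed-size record at a time with pointer arithmetic. */
static size_t count_matches(const struct foo_t *foo, size_t n, const char *needle)
{
    size_t len = strlen(needle), hits = 0;
    if (len > sizeof foo->value)
        return 0;
    for (const struct foo_t *p = foo; p < foo + n; p++) {
        /* bounded search for the terminator; a full field has none */
        const char *nul = memchr(p->value, '\0', sizeof p->value);
        size_t elen = nul ? (size_t)(nul - p->value) : sizeof p->value;
        if (elen == len && memcmp(p->value, needle, len) == 0)
            hits++;
    }
    return hits;
}
```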
Variable-sized strings are represented by a simple pointer to the actual memory region where the string is allocated. Once again, padding is automatically inserted into the generated structure if the memory layout differs from the disk layout:
```c
struct foo_t {
    char *value;
};
```
For convenience, a new high-level API, `lib.string()`, can be used so that UDFs don't have to explicitly access the `value` member of that structure.
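In the variable-sized layout, each element holds a pointer that has to be dereferenced. The sketch below wraps that access in a hypothetical `foo_string()` helper. It illustrates the kind of boilerplate `lib.string()` spares UDF authors, but it does not reproduce the actual `lib.string()` signature. Treating a NULL pointer as an empty string is also an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the generated structure for a variable-length string dataset. */
struct foo_t {
    char *value;
};

/* Hypothetical accessor in the spirit of lib.string(): hides the `value`
 * member and maps a NULL pointer to an empty string. */
static const char *foo_string(const struct foo_t *e)
{
    return e->value ? e->value : "";
}

/* Sums the lengths of all strings, following each element's pointer. */
static size_t total_length(const struct foo_t *foo, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(foo_string(&foo[i]));
    return total;
}
```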
Examples
Please visit the examples and test directories for examples on how to write UDFs that take input from compounds and strings.
HDF5-UDF 1.0
This is the first release of HDF5-UDF! It features support for user-defined functions written in Python, C/C++ and Lua, and the ability to run UDFs in a sandboxed environment.
Please refer to the installation guide for instructions on how to install this package from source code or binary form.
First alpha snapshot
This is the first alpha snapshot of HDF5-UDF. It is already feature complete, but I'd like to work on a few minor issues before calling it final.