
A tutorial for new and intermediate HDF5 users (of all ages)


HDFGroup/hdf5-tutorial


Welcome to the HDF5 Tutorial


Open in GitHub Codespaces

👉 Check out this intro on YouTube.

HDF5 is a framework for sharing data. Its purpose is to frame data so that it can be shared and understood on its own: the data speaks for itself because it is self-describing. Sharing comes in different forms, for example, via files formatted in a standard way, or by providing access to data through service endpoints. It is important that these "appearances" are coherent and uniform. To that end, the framework includes a data model that is common to all implementations, including the HDF5 library and file format and the Highly Scalable Data Service (HSDS). What makes it all work are fantastic community contributions, such as h5py, HDFql, rhdf5, or H5Web.
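To make "self-describing" concrete, here is a minimal sketch using h5py, assuming h5py and NumPy are installed. The file name, the dataset path `measurements/temperature`, and the attribute names are hypothetical and chosen only for illustration. The file lives in memory, so nothing is written to disk.

```python
import h5py
import numpy as np

summary = []  # (path, shape, dtype, attributes) for every dataset found

# In-memory HDF5 file; "example.h5" is only a placeholder name.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical dataset and metadata, for illustration only.
    dset = f.create_dataset(
        "measurements/temperature", data=np.array([20.5, 21.0, 19.8])
    )
    dset.attrs["units"] = "degC"
    dset.attrs["description"] = "Hourly temperature readings"

    # A reader can discover the layout without prior knowledge of the file:
    # walk the hierarchy and record what each dataset says about itself.
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            summary.append((name, obj.shape, str(obj.dtype), dict(obj.attrs)))

    f.visititems(describe)

for entry in summary:
    print(entry)
```

The shape, element type, and descriptive attributes all travel with the data, so a consumer does not need a separate schema document to interpret it.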

This is a tutorial for new and intermediate HDF5 users. The tutorial is organized as a set of Jupyter notebooks that are available for download in this GitHub repository. The notebooks are intended to be read in numbered order, but each is self-contained, so they can be visited in any order.

The notebooks in this tutorial are:

  1. C/C++ 101 - In case you've never written a C/C++ program
  2. Theme - A model problem that we will use throughout the tutorial
  3. Variation 1 - A slightly more complex variant of the problem
  4. Variation 2 - MPI-parallel HDF5
  5. Python Bliss - Where most HDF5 users will spend their time
  6. HSDS - How to tackle the model problem with HDF5-as-a-Service
  7. REST VOL - Talking to HSDS from the HDF5 library
  8. S3 & Cloud - Reading data directly from the Cloud
  9. PureHDF - How to tackle the model problem in C#
  10. HDFql - The easy way to manage HDF5 data

This tutorial is intended to be executed in a Web browser without installing any software. To this end, this repository comes with a prebuilt GitHub Codespaces configuration. Launch a Codespaces environment by clicking the banner "Open in GitHub Codespaces" and start evaluating the Jupyter notebooks (by placing the cursor into a code cell and pressing Ctrl+Enter or Shift+Enter). When prompted for a Python kernel, select:

hdf5-tutorial (Python 3.12.1) /opt/conda/envs/hdf5-tutorial/python

You are welcome to clone this repository and set up your local environment. Note that a few settings are specific to the Codespaces configuration and will need adjusting in a custom environment. We recommend you review environment.yml for Python package dependencies, and the files in .devcontainer for additional dependencies.

Enjoy and let us know what you think about the tutorial in the discussion! Please help us improve it by reporting issues or submitting pull requests!
