diff --git a/cloud-optimized-netcdf4-hdf5/index.qmd b/cloud-optimized-netcdf4-hdf5/index.qmd
index be17021..215e158 100644
--- a/cloud-optimized-netcdf4-hdf5/index.qmd
+++ b/cloud-optimized-netcdf4-hdf5/index.qmd
@@ -57,7 +57,7 @@ HDF5 file organization—data, metadata, and free space—depends on the file sp
 Here are a few additional considerations for understanding and implementing the `H5F_FSPACE_STRATEGY_PAGE` strategy:
-* **Chunks vs. Pages:** In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also [Chunking in HDF5](https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html)). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. A chunk does not have to fit within a single page, however misalignment leads to chunks spanning multiple pages, which increases read latency. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.
+* **Chunks vs. Pages:** In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also [Chunking in HDF5](https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html)). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.
 * **Page Size Considerations:** The page size applies to both metadata and raw data.
 Therefore, the chosen page size should strike a balance: it must consolidate metadata efficiently while minimizing unused space in raw data chunks. Excess unused space can significantly increase file size. File size is typically not a concern for I/O performance when accessing parts of a file. However, increased file size can become a concern for storage costs.
 :::
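
The chunk/page trade-off described in this hunk can be illustrated with a little arithmetic. The following is a minimal sketch, not HDF5's actual allocator: it assumes a simplified model in which a stored chunk occupies a contiguous byte range and whole pages are read. The function names `pages_spanned` and `padding_if_page_aligned` are hypothetical helpers introduced only for this illustration, and `chunk_bytes` is assumed to be the size of the chunk as stored (after compression, if a filter is used).

```python
MIB = 1024 * 1024


def pages_spanned(chunk_bytes: int, page_bytes: int, offset_in_page: int = 0) -> int:
    """Pages touched when reading a chunk that starts `offset_in_page` bytes into a page.

    Simplified model (assumption): the chunk is one contiguous byte range
    and every page it overlaps must be read in full.
    """
    if chunk_bytes <= 0 or page_bytes <= 0:
        raise ValueError("chunk_bytes and page_bytes must be positive")
    if not 0 <= offset_in_page < page_bytes:
        raise ValueError("offset_in_page must lie within one page")
    # Integer ceiling division avoids floating-point rounding.
    return -(-(offset_in_page + chunk_bytes) // page_bytes)


def padding_if_page_aligned(chunk_bytes: int, page_bytes: int) -> int:
    """Unused bytes if the chunk starts on a page boundary and the rest of
    its last page is left empty (assumption of this simplified model)."""
    if chunk_bytes <= 0 or page_bytes <= 0:
        raise ValueError("chunk_bytes and page_bytes must be positive")
    return -chunk_bytes % page_bytes


if __name__ == "__main__":
    page = 4 * MIB
    for chunk, offset in [(4 * MIB, 0), (4 * MIB, 1024), (3 * MIB, 0), (8 * MIB, 0)]:
        print(
            f"chunk={chunk / MIB:g} MiB offset={offset} B: "
            f"pages={pages_spanned(chunk, page, offset)}, "
            f"padding if aligned={padding_if_page_aligned(chunk, page) / MIB:g} MiB"
        )
```

Under this model, a chunk whose stored size equals the page size touches one page when aligned and two when it starts even slightly off a page boundary, while a chunk smaller than the page leaves unused space if nothing else shares the page. Real files may behave differently, since HDF5 can place several small objects in one page.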