Commit

Remove sentence about chunk sizes bigger than page sizes.
abarciauskas-bgse committed Dec 20, 2024
1 parent 80a61a0 commit a879451
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion cloud-optimized-netcdf4-hdf5/index.qmd
@@ -57,7 +57,7 @@ HDF5 file organization—data, metadata, and free space—depends on the file sp

Here are a few additional considerations for understanding and implementing the `H5F_FSPACE_STRATEGY_PAGE` strategy:

-* **Chunks vs. Pages:** In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also [Chunking in HDF5](https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html)). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. A chunk does not have to fit within a single page, however misalignment leads to chunks spanning multiple pages, which increases read latency. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.
+* **Chunks vs. Pages:** In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also [Chunking in HDF5](https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html)). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.
* **Page Size Considerations:** The page size applies to both metadata and raw data. Therefore, the chosen page size should strike a balance: it must consolidate metadata efficiently while minimizing unused space in raw data chunks. Excess unused space can significantly increase file size. File size is typically not a concern for I/O performance when accessing parts of a file. However, increased file size can become a concern for storage costs.
:::
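
The guidance in the hunk above (chunk sizes that match the page size or a multiple of it, and little unused space in raw data) can be checked with some quick arithmetic. The following is a minimal sketch rather than anything taken from the repository: the helper names `chunk_nbytes` and `page_fit`, the 4 MiB page size, and the chunk shapes are all hypothetical. It also assumes a simplified model in which each uncompressed chunk occupies whole pages, which ignores compression and the way HDF5 may pack several small objects into one page.

```python
import math


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size of one chunk in bytes (elements * bytes per element)."""
    return math.prod(chunk_shape) * itemsize


def page_fit(chunk_bytes, page_size):
    """Return (pages_needed, unused_bytes) under the simplifying assumption
    that a chunk occupies a whole number of pages."""
    pages = math.ceil(chunk_bytes / page_size)
    return pages, pages * page_size - chunk_bytes


# Hypothetical values chosen for illustration only.
page_size = 4 * 1024 * 1024               # 4 MiB page
aligned = chunk_nbytes((1024, 1024), 4)   # float32 chunk, exactly 4 MiB
odd = chunk_nbytes((1000, 1000), 4)       # float32 chunk, 4,000,000 bytes

print(page_fit(aligned, page_size))  # chunk size equals the page size
print(page_fit(odd, page_size))      # chunk leaves part of the page unused
```

Under this model the first chunk fills its page exactly, while the second leaves part of the page empty. Repeated across many chunks, that slack is the kind of unused space the "Page Size Considerations" bullet warns can inflate file size.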


0 comments on commit a879451
