Abstract I/O & storage beyond HDF5 for flexibility, performance, & cloud #59
Comments
This is a great demo of performance profiling and optimization approaches, including I/O. Here's a great blog post by the same author that discusses benchmarking in more detail: https://tomaugspurger.github.io/maintaing-performance.html
On Nov. 18, @PaulDudaRESPEC committed 0ed2302, which replaced multiple reads of data from storage with a single read into memory, with subsequent data access served from those in-memory objects. He shared this comment via email:

The foundation of this work was completed and tested, so we'll close this issue. We'll expand on the I/O abstraction capabilities, including implementing additional storage formats, in new issues.
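The optimization described in that commit (one upfront read from storage, then memory-only access) can be sketched roughly as follows. The names here (`TimeSeriesCache`, `load_all`, `get`) are illustrative only, not the actual HSP2 API:

```python
# Hypothetical sketch of the pattern in commit 0ed2302: replace repeated
# storage reads with one upfront read, then serve later accesses from memory.

class TimeSeriesCache:
    """Read every needed dataset once, then answer lookups from memory."""

    def __init__(self, reader):
        self._reader = reader      # callable: name -> data (hits storage)
        self._cache = {}

    def load_all(self, names):
        # Single pass over storage; subsequent access is memory-only.
        for name in names:
            self._cache[name] = self._reader(name)

    def get(self, name):
        # No storage I/O during the simulation loop.
        return self._cache[name]


# Usage with a stand-in reader (a dict lookup in place of an HDF5 read):
store = {"PERLND/P001": [1.0, 2.0], "RCHRES/R001": [3.0]}
cache = TimeSeriesCache(store.__getitem__)
cache.load_all(store)
assert cache.get("PERLND/P001") == [1.0, 2.0]
```

The key design point is that the storage-bound `reader` is called exactly once per dataset, before the timestep loop begins.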
This high-level issue pulls together several past, current, and near-future efforts (and more granular issues).
The tight coupling of model input/output (I/O) with the Hierarchical Data Format v5 (HDF5) during the HSP2 runtime limits both performance (see #36) and interoperability with other data storage formats, such as the cloud-optimized Parquet and Zarr formats (see Pangeo's Data in the Cloud article), which integrate tightly with high-performance data structures from the foundational PyData libraries pandas, Dask, and xarray.
Abstracting I/O using a class-based approach would also unlock capabilities for within-timestep coupling of HSP2 with other models. Specifically, HSP2 could provide upstream, time-varying boundary conditions for higher-resolution models of reaches, reservoirs, and the groundwater-surface water interface.
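A minimal sketch of what such a class-based abstraction might look like, assuming a simple read/write interface keyed by dataset path. All names here (`IOBackend`, `read_ts`, `write_ts`, the backend classes) are illustrative assumptions, not the actual HSP2 design:

```python
# Sketch: the runtime codes against an abstract interface, so HDF5,
# Parquet, Zarr, or in-memory backends can be swapped without touching
# simulation code. Names are hypothetical.
from abc import ABC, abstractmethod


class IOBackend(ABC):
    """Storage-agnostic interface the HSP2 runtime would depend on."""

    @abstractmethod
    def read_ts(self, path):
        """Return the dataset stored under `path`."""

    @abstractmethod
    def write_ts(self, path, data):
        """Persist `data` under `path`."""


class MemoryBackend(IOBackend):
    """In-memory backend: useful for tests and within-timestep coupling,
    where another model supplies boundary conditions directly."""

    def __init__(self):
        self._store = {}

    def read_ts(self, path):
        return self._store[path]

    def write_ts(self, path, data):
        self._store[path] = data


class ParquetBackend(IOBackend):
    """Cloud-friendly columnar storage via pandas (requires pyarrow)."""

    def __init__(self, root):
        self.root = root

    def read_ts(self, path):
        import pandas as pd  # imported lazily; optional dependency
        return pd.read_parquet(f"{self.root}/{path}.parquet")

    def write_ts(self, path, data):
        data.to_parquet(f"{self.root}/{path}.parquet")


# Usage: simulation code never names a concrete storage format.
backend = MemoryBackend()
backend.write_ts("TIMESERIES/TS001", [0.1, 0.2, 0.3])
assert backend.read_ts("TIMESERIES/TS001") == [0.1, 0.2, 0.3]
```

An HDF5 backend wrapping the current behavior (e.g. via `pandas.read_hdf`) would implement the same interface, preserving backward compatibility while new formats are added.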
Our overall plan was first outlined and discussed in LimnoTech#27 (Refactor I/O to rely on DataFrames & provide storage options). In brief, we would refactor to:
cc: @PaulDudaRESPEC, @ptomasula