Regridder Partitioning #427

stephenworsley · 2024-11-04T16:27:26Z

📰 Custom Issue

There are currently problems with handling data that is too large for memory (#310, #246). One way this could be worked around is, instead of building a single regridder to handle the source and target grid/mesh, to build many smaller regridders, each responsible for some section of the source and target grid/mesh. A Partition object (name to be decided) could handle the building, saving and application of such regridders.
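
As a rough illustration of the idea, the sketch below models a regridder as a plain NumPy weights matrix (an assumption made purely for illustration, not the library's API) and checks that applying it one block at a time gives the same answer as applying it whole. The names `weights` and `partition` are hypothetical.

```python
import numpy as np

# Toy weights: each of 4 target cells averages 2 neighbouring source cells.
n_src, n_tgt = 8, 4
weights = np.zeros((n_tgt, n_src))
for i in range(n_tgt):
    weights[i, 2 * i : 2 * i + 2] = 0.5

# Hypothetical partition: pairs of (source indices, target indices).
# Each source subset covers every source cell its target subset needs.
partition = [
    (np.arange(0, 4), np.arange(0, 2)),
    (np.arange(4, 8), np.arange(2, 4)),
]

data = np.arange(n_src, dtype=float)

# Reference result from the single, full-size "regridder".
full_result = weights @ data

# Partitioned result: only one small block of weights is used at a time.
partitioned_result = np.empty(n_tgt)
for src_idx, tgt_idx in partition:
    block = weights[np.ix_(tgt_idx, src_idx)]
    partitioned_result[tgt_idx] = block @ data[src_idx]

assert np.allclose(partitioned_result, full_result)
```

In a real case the full matrix would never be built; each block would be computed and saved separately.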

The Partition object would have the following functionality:

- Can be initialised by passing a source grid/mesh, a target grid/mesh and some collection of indices which describes the source and target subsets.
  - It may be possible in future to automatically determine an appropriate collection of subsets, however this shouldn't be necessary for a minimum viable solution.
  - It may also be necessary to pass in explicit information about the dask chunking of the input object (and desired chunking for the output object).
- There should be some level of error checking to ensure that the partition makes sense, i.e. the entire source/target is covered and, for each pair of source/target indices, the source points cover the target points.
  - It may also be worth checking that the calculation would be manageable for the given chunking strategy (and at least raising a warning if it would not be).
- There should be a method for generating regridders (a regridder from the 1st source indices to the 1st target indices, same for the 2nd etc.) and saving these regridders to a user-supplied series of paths. Doing so will mean that only one regridder will need to be realised in memory at a time. The Partition object should be able to keep track of the paths to the appropriate regridders.
  - It should also be possible to give a Partition object access to the paths of previously saved regridders, via initialisation or some other method.
- When this object has access to a full set of saved regridders, it is able to apply them in order to lazily regrid data from the source grid/mesh as if it were a regular regridder. When a chunk of data is realised, it will then load the appropriate regridders, perform regridding on that chunk and delete the regridder. In this way, only a limited number of regridders will need to be loaded into memory at any one time.
  - One problem we may have to solve is figuring out a way to realise multiple chunks in the same vertical stack which will use the same regridder in such a way as to not have to load that regridder multiple times.
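
Two of the points above could be sketched roughly as follows: a coverage check for the index subsets, and a small cache so that consecutive chunks needing the same regridder do not reload it. This is only a minimal sketch under stated assumptions. `check_coverage` and `load_regridder_stub` are hypothetical names, the file paths are made up, and the stub stands in for whatever loading routine would actually be used. `functools.lru_cache` with `maxsize=1` is one possible way to keep a single regridder in memory; it only helps if chunks sharing a regridder are realised one after another.

```python
from functools import lru_cache

import numpy as np


def check_coverage(n_points, index_subsets):
    """Raise ValueError unless the subsets together cover all n_points indices."""
    covered = np.zeros(n_points, dtype=bool)
    for indices in index_subsets:
        covered[np.asarray(indices, dtype=int)] = True
    missing = np.flatnonzero(~covered)
    if missing.size:
        raise ValueError(
            f"{missing.size} point(s) are not covered, e.g. index {missing[0]}"
        )


# Records each real "load" so the effect of the cache can be observed.
load_calls = []


@lru_cache(maxsize=1)
def load_regridder_stub(path):
    """Hypothetical stand-in for loading a saved regridder from `path`."""
    load_calls.append(path)
    return {"path": path}


# The two subsets cover indices 0..3, so this passes silently.
check_coverage(4, [[0, 1], [2, 3]])

# Two chunks in the same vertical stack need the same regridder,
# followed by a chunk that needs a different one.
for path in ["rg_0.nc", "rg_0.nc", "rg_1.nc"]:
    load_regridder_stub(path)

print(load_calls)  # the first regridder is loaded only once
```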

stephenworsley added the "New: Issue" label (Highlight a new community raised "generic" issue) on Nov 4, 2024