Is there a way to control the chunk sizes during an alignment or reindex operation? #9594
Replies: 2 comments 2 replies
-
Not really. This is a bit hard given the modularity of the stack. Can you tell us what kind of behaviour you would like? If you could control the chunk size, what would you set it to? cc @phofl
-
We do have an option on the Dask side that lets you set the increase in chunk size you are willing to tolerate. I've recently been thinking a lot about whether we should just aim for something like auto, i.e. always 128MiB or something similar. But Dask still has a few places where chunk sizes blow up accidentally (dask/dask#11426), so I took the path of preserving the input chunks. This is indeed a pattern I hadn't really considered; I'll think about it a bit more.
-
Since the Dask 2024.08.02 update, the vindex method keeps the chunk sizes consistent with the input. I think this can have a negative impact on operations like reindex, reindex_like, and align on the Xarray side. For example, if we align a DataArray that is a small subset of another one, the number of chunks can increase drastically (see the example for a clearer overview). So my question is whether there is a way to control this behavior, to avoid generating extremely large graphs that hurt performance.
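To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch in plain Python (no Dask or Xarray involved). It assumes, as described above, that the output of the reindex keeps the chunk size of the small input along the reindexed dimension. The function name `chunks_after_reindex` and all the sizes are made up for illustration.

```python
def chunks_after_reindex(target_len: int, chunk_size: int) -> int:
    """Chunks along one dimension if the output uses `chunk_size` throughout.

    Hypothetical helper: a ceiling division, nothing Dask-specific.
    """
    if target_len < 0 or chunk_size <= 0:
        raise ValueError("target_len must be >= 0 and chunk_size must be > 0")
    return -(-target_len // chunk_size)  # integer ceiling division


# Illustrative sizes (assumptions, not taken from a real dataset):
large_len = 1_000_000    # length of the dimension of the big array
large_chunk = 100_000    # chunk size of the big array
small_chunk = 10         # the small subset lives in a single 10-element chunk

# Output chunked like the big array: 10 chunks.
print(chunks_after_reindex(large_len, large_chunk))
# Output that preserves the small array's chunk size: 100000 chunks.
print(chunks_after_reindex(large_len, small_chunk))
```

Under these assumptions the graph ends up with four orders of magnitude more chunks along that dimension, and the factor multiplies across dimensions.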
My proposal is that Xarray could handle the alignment of the chunks in a more "sophisticated" way. (I am not sure whether this would be better on the Dask side, but Dask does not handle indexes the way Xarray does.) This could be a heuristic that decides the "ideal chunks" of the output, for example by using the biggest chunk among all the arrays. The small arrays would then be padded with artificial data before reindexing (the pad method would probably be ideal) so that each one is at least as large as the "ideal chunk". This would guarantee that the operation generates fewer chunks, which should improve performance considerably.
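A minimal sketch of the heuristic I have in mind, for a single dimension and in plain Python. This is not existing Xarray or Dask behavior; the names `ideal_chunk`, `padding_needed`, and `plan_alignment` are hypothetical, and the chunk tuples mimic what `.chunks` reports for one dimension.

```python
def ideal_chunk(chunks_per_array):
    """Pick the largest chunk found in any array along the dimension."""
    return max(max(chunks) for chunks in chunks_per_array)


def padding_needed(length, target_chunk):
    """Elements to pad so the array is at least `target_chunk` long."""
    return max(0, target_chunk - length)


def plan_alignment(chunks_per_array):
    """Return the target chunk size and the padding for each array."""
    target = ideal_chunk(chunks_per_array)
    pads = [padding_needed(sum(chunks), target) for chunks in chunks_per_array]
    return target, pads


# Illustrative inputs (assumptions): a big array with ten 100_000-element
# chunks and a small subset stored in a single 10-element chunk.
big = (100_000,) * 10
small = (10,)

target, pads = plan_alignment([big, small])
print(target)  # 100000
print(pads)    # [0, 99990]
```

The small array would be padded by 99990 elements before the reindex, so the result could use 100000-element chunks instead of inheriting the 10-element one. A real implementation would also have to decide which side of the index to pad on and what fill value to use.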