Larger-than-memory datasets with iris-esmf-regrid and dask #310
@SciTools-incubator/esmf-regrid-devs This issue is stale due to a lack of activity in the last 180 days. Remove the stale label or comment, otherwise this issue will close automatically in 14 days' time.
@SciTools-incubator/esmf-regrid-devs This stale issue has been automatically closed due to no community activity.
This is still relevant, so I would like this issue to be reopened. Thanks.
@stephenworsley @pp-mo @HGWright any thoughts?
I would have expected that if it's possible to do regridding file by file, dask should be able to handle the same data as a single dask array (with each file being a chunk), so I'm not entirely sure what could be causing problems. I wonder if dask is trying to run too many tasks in parallel: does memory still blow up if you use a single-threaded dask scheduler? For what it's worth, I've recently fixed a bug in the old way we were invoking dask (#338). Though I suspect this may be a separate issue, it may be worth checking whether the latest (unreleased) version of iris-esmf-regrid works any better. Otherwise, I'd be interested to see a sample of code which blows memory, compared with a sample of code for which regridding file by file does not.
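One way to try the single-threaded scheduler is sketched below. It is a minimal sketch using plain dask and NumPy: the function fake_regrid is hypothetical and only stands in for the real iris-esmf-regrid call (it just averages 2x2 cells), so the example shows the scheduler setting dask.config.set(scheduler="synchronous") and a one-chunk-per-file layout, not the regridder itself.

```python
import dask
import dask.array as da
import numpy as np


def fake_regrid(block: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a real regridder (NOT iris-esmf-regrid):
    coarsens each 2D field by averaging non-overlapping 2x2 cells."""
    t, ny, nx = block.shape
    return block.reshape(t, ny // 2, 2, nx // 2, 2).mean(axis=(2, 4))


# 100 time steps split into 10 chunks of 10, mimicking "one chunk per file".
source = np.arange(100 * 64 * 128, dtype=np.float64).reshape(100, 64, 128)
data = da.from_array(source, chunks=(10, 64, 128))

# Lazily apply the stand-in regrid to each chunk independently.
regridded = data.map_blocks(
    fake_regrid,
    chunks=((10,) * 10, (32,), (64,)),  # explicit output chunk sizes
    dtype=np.float64,
)

# The "synchronous" scheduler runs one task at a time in the calling thread,
# so only one chunk is being regridded at any moment.
with dask.config.set(scheduler="synchronous"):
    result = regridded.compute()

print(result.shape)
```

With Iris cubes, the same dask.config.set context should be usable around whichever step triggers the computation (for example, saving the regridded cube). Note that the final compute() above needs the whole output to fit in memory, so for a genuinely larger-than-memory result, writing the lazy result straight to disk is usually the better end point.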
@SciTools/esmf-regrid-devs This issue is stale due to a lack of activity in the last 180 days. Remove the stale label or comment, otherwise this issue will close automatically in 14 days' time.
📰 Custom Issue
I was wondering if you have any recommendations on what dask settings I should use if I want to regrid a larger-than-memory dataset using iris-esmf-regrid.
The dataset is from an LFRic C24 run, containing about ten 2D or 3D variables with 1000 time slices loaded from 100 files (i.e. 10 time slices per file). The data are chunked accordingly: 100 chunks for every variable. The total size amounts to about 16 GB on disk.
I can obviously process this in a file-by-file loop, but I would like to load the whole dataset and apply the regridding in one go. Currently, my script halts because the regridding step consumes all available RAM. I understand this might be asking for too much specialised help, but any advice would be highly appreciated!
Machine specs