Larger-than-memory datasets with iris-esmf-regrid and dask #310

dennissergeev · 2023-10-09T11:15:36Z

📰 Custom Issue

I was wondering if you have any recommendations on what dask settings I should use if I want to regrid a larger-than-memory dataset using iris-esmf-regrid.

The dataset is from an LFRic C24 run, containing about ten 2D or 3D variables with 1000 time slices loaded from 100 files (i.e. 10 time slices per file). The data are chunked accordingly: 100 chunks for every variable. The total size amounts to about 16G on disk.

I can obviously process this in a file-by-file loop but I hope to load the whole dataset and apply regridding in one go. Currently, my script halts because the regridding step consumes all available RAM. I understand this might be asking for too much specialised help but any advice would be highly appreciated!

Machine specs

                 OS : Linux
             CPU(s) : 8
            Machine : x86_64
       Architecture : 64bit
                RAM : 31.2 GiB
        Environment : Jupyter
        File system : ext4
         GPU Vendor : Intel
       GPU Renderer : Mesa Intel(R) UHD Graphics 620 (KBL GT2)
        GPU Version : 4.6 (Core Profile) Mesa 23.0.4-0ubuntu1~22.04.1

  Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]

github-actions · 2024-04-07T06:18:04Z

@SciTools-incubator/esmf-regrid-devs This issue is stale due to a lack of activity in the last 180 days. Remove stale label or comment, otherwise this issue will close automatically in 14 days time.

github-actions · 2024-04-21T06:20:28Z

@SciTools-incubator/esmf-regrid-devs This stale issue has been automatically closed due to no community activity

dennissergeev · 2024-04-22T09:33:26Z

This is still relevant, so I would like this issue to be reopened. Thanks.

trexfeathers · 2024-04-23T08:44:53Z

@stephenworsley @pp-mo @HGWright any thoughts?

stephenworsley · 2024-04-23T13:00:02Z

I would have expected that if it's possible to regrid file by file, dask should be able to handle the same data as a single dask array (with each file being a chunk), so I'm not entirely sure what could be causing problems. I wonder if dask is trying to run too many tasks in parallel: does this still blow memory if you use a single-threaded dask scheduler? For what it's worth, I've recently fixed a bug in the old way we were invoking dask (#338). I suspect this is a separate issue, but it may be worth checking whether the latest (unreleased) version of iris-esmf-regrid works any better. Otherwise, I'd be interested to see a sample of code that blows memory alongside a sample where regridding file by file does not.
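
For reference, a minimal sketch of the single-threaded check suggested above. It uses a plain dask array rather than real LFRic cubes, and `coarsen_block` is a hypothetical placeholder for the per-chunk regridding step, not anything iris-esmf-regrid provides. The array shape is an assumption chosen to mimic the layout described in the issue (1000 time slices in 100 chunks, with 6 × 24 × 24 cells for a C24 mesh). `dask.config.set(scheduler="synchronous")` is the standard dask way to run all tasks one at a time in the calling thread.

```python
import dask
import dask.array as da
import numpy as np

# Stand-in for one variable's lazy data: 100 chunks of 10 time slices each,
# mirroring the "one chunk per file" layout (sizes are illustrative).
n_cells = 6 * 24 * 24
lazy_data = da.zeros((1000, n_cells), chunks=(10, n_cells), dtype=np.float32)


def coarsen_block(block):
    """Hypothetical placeholder for the per-chunk regridding step."""
    # Averages neighbouring pairs of cells; NOT what iris-esmf-regrid does.
    return 0.5 * (block[:, 0::2] + block[:, 1::2])


# Build the lazy result; nothing is computed yet.
lazy_result = lazy_data.map_blocks(
    coarsen_block, chunks=(10, n_cells // 2), dtype=np.float32
)

# Force single-threaded execution so chunks are processed one at a time.
with dask.config.set(scheduler="synchronous"):
    result = lazy_result.compute()
```

If peak memory stays bounded under the synchronous scheduler but not under the default threaded one, that would point towards too many chunks being processed at once rather than a problem in the regridding itself.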

github-actions · 2024-10-21T06:07:54Z

@SciTools/esmf-regrid-devs This issue is stale due to a lack of activity in the last 180 days. Remove stale label or comment, otherwise this issue will close automatically in 14 days time.

dennissergeev added the New: Issue Highlight a new community raised "generic" issue label Oct 9, 2023

scitools-ci bot added this to 🚴 Peloton Oct 11, 2023

scitools-ci bot removed this from 🚴 Peloton Dec 15, 2023

scitools-ci bot added this to 🚴 Peloton Dec 15, 2023

github-actions bot added the Stale: Closure warning This stale issue or pull-request has been marked for closure label Apr 7, 2024

github-actions bot added the Stale: Closed This stale issue or pull-request has been closed due to no activity label Apr 21, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 21, 2024

github-project-automation bot moved this to Done in 🚴 Peloton Apr 21, 2024

trexfeathers reopened this Apr 23, 2024

trexfeathers removed Stale: Closure warning This stale issue or pull-request has been marked for closure Stale: Closed This stale issue or pull-request has been closed due to no activity labels Apr 23, 2024

stephenworsley added this to iris-esmf-regrid Performance Sep 3, 2024

stephenworsley moved this to Todo in iris-esmf-regrid Performance Sep 3, 2024

stephenworsley added this to iris-esmf-regrid Performance Oct 3, 2024

stephenworsley moved this to Todo in iris-esmf-regrid Performance Oct 3, 2024

github-actions bot added the Stale: Closure warning This stale issue or pull-request has been marked for closure label Oct 21, 2024

ESadek-MO removed the Stale: Closure warning This stale issue or pull-request has been marked for closure label Oct 23, 2024

stephenworsley mentioned this issue Nov 4, 2024

Regridder Partitioning #427

Open
