
bucket resampler get_max, get_min are not optimally parallelised #432

Open

gerritholl opened this issue May 9, 2022 · 1 comment

Labels: enhancement, performance (improves speed or decreases memory consumption, but does not otherwise change functionality)

gerritholl (Collaborator) commented May 9, 2022

Despite impressive improvements in #368, there remains room to improve the parallelisation of the BucketResampler methods get_max and get_min. Most of the runtime is spent in the @dask.delayed-decorated function _get_statistics, which runs as a single task even though it could in principle be parallelised.

Perhaps the resample_blocks function (to be) introduced in #341 could be of help.
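
To illustrate the bottleneck, here is a minimal sketch of the pattern, assuming hypothetical names and a direct ufunc-based reduction rather than pyresample's actual argsort-based internals: a single @dask.delayed call receives the fully realised data and index arrays, so the reduction becomes one task in the graph no matter how the inputs are chunked.

import dask
import dask.array as da
import numpy as np

@dask.delayed
def bucket_max(data, idxs, out_size):
    # By the time this runs, data and idxs are plain NumPy arrays:
    # the whole reduction happens in a single task on a single thread.
    out = np.full(out_size, np.nan)
    np.fmax.at(out, idxs, data)  # per-bucket maximum; fmax ignores the NaN fill
    return out

data = da.random.random(1_000_000, chunks=100_000)
idxs = da.random.randint(0, 1000, size=1_000_000, chunks=100_000)
result = da.from_delayed(bucket_max(data, idxs, 1000), shape=(1000,), dtype=float)
result.compute()  # ten input chunks, but only one reduction task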

Code Sample, a minimal, complete, and verifiable piece of code

"""Test bucket resampler performance."""

from pyresample import create_area_def
from pyresample.bucket import BucketResampler
import dask
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, visualize
import numpy as np
from math import prod
import xarray as xr

src_area = create_area_def(
        "src", 4087, area_extent=[-3_200_000, -3_200_000, 3_200_000, 3_200_000], resolution=1000)

dest_area = create_area_def(
        "dest", 4087, area_extent=[-3_000_000, -3_000_000, 3_000_000, 3_000_000], resolution=5000)

data = xr.DataArray(
       da.arange(prod(src_area.shape)).reshape(src_area.shape)-10000,
       dims=("y", "x"),
       attrs={"area": src_area})

resampler = BucketResampler(dest_area, *src_area.get_lonlats(chunks=1000))

m = resampler.get_max(data)

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    dask.compute(m)
visualize([prof, rprof], show=False, save=True, filename="br-max-dask.html")

Problem description

The bucket resampler methods get_max and get_min spend most of their wall-clock time in unparallelised code, so on a multi-core machine the computation takes considerably longer than it needs to.

Expected Output

I expect a dask profile visualisation showing that 800% CPU (all eight cores) is used 100% of the time.

Actual Result, Traceback if applicable

In reality, 800% CPU is used less than 40% of the time. More than 60% of the time is spent in the task _get_statistics, which is exactly the @dask.delayed-decorated function used to calculate the maximum:

[Screenshot: dask task/resource profile showing the long single-threaded _get_statistics task]

Versions of Python, package at hand and relevant dependencies

pyresample main: v1.23.0-46-g0cb8914

@gerritholl added the performance label May 9, 2022
djhoese (Member) commented May 9, 2022

The key difficulty in the algorithm, as far as I understand it at least, is that it needs to do an argsort, and it does this over the entire input data array. The reason the resample_blocks function might help is that it runs your function on a single sub-array of the input data that covers the output chunk/block. This is really what the algorithm "wants" in its current implementation, but coding up that logic is complicated in a basic map_blocks call. We can let resample_blocks do that for us in the future.
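
To make the per-block idea concrete, here is a hedged sketch under two simplifying assumptions: it swaps the argsort-based statistic for a direct np.fmax.at reduction, and it skips the part resample_blocks would actually handle, namely selecting only the source pixels that cover each output chunk. Each input chunk is reduced to a row of partial per-bucket maxima, and the rows are then folded together, so both stages parallelise across chunks; the helper names are hypothetical.

import dask.array as da
import numpy as np

def _block_max(data, idxs, n_bins):
    # Per-bucket maximum of one block only; NaN marks empty buckets.
    # Assumes every index in idxs is a valid bin (0 <= i < n_bins).
    out = np.full((1, n_bins), np.nan)
    np.fmax.at(out[0], idxs, data)
    return out

def parallel_bucket_max(data, idxs, n_bins):
    # One partial-maximum row per input chunk, combined with nanmax;
    # both steps run as independent tasks per chunk.
    data = data.ravel()
    idxs = idxs.ravel().rechunk(data.chunks)
    partial = da.map_blocks(
        _block_max, data, idxs, n_bins,
        new_axis=1, chunks=(1, n_bins), dtype=float)
    return da.nanmax(partial, axis=0)

The cost of this naive decomposition is that every chunk materialises a dense n_bins-sized partial result, which is exactly what the per-output-chunk slicing in resample_blocks would avoid.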
