Despite impressive improvements in #368, there remains room to improve the parallelisation of the BucketResampler methods get_max and get_min. Most of the runtime is spent in the @dask.delayed-decorated function _get_statistics, which is not parallelised, even though it could be in principle.
Perhaps the resample_blocks function (to be) introduced in #341 could be of help.
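A minimal sketch of the pattern (using a simplified stand-in for _get_statistics, not pyresample's actual code) shows why a single @dask.delayed call caps parallelism, and how a chunk-wise reduction avoids that:

```python
import dask
import dask.array as da
import numpy as np


# Simplified stand-in for _get_statistics: the whole input array is funneled
# into one delayed call, so dask schedules the reduction as a single task,
# no matter how many chunks the input has.
@dask.delayed
def _get_max_serial(data):
    return np.max(data)


data = da.arange(1_000_000, chunks=100_000)

# One serial task: all 10 chunks are gathered first, then reduced in one go.
serial_result = _get_max_serial(data).compute()

# Chunk-wise alternative: da.max builds a tree reduction, so the per-chunk
# maxima can run in parallel across workers.
parallel_result = da.max(data).compute()
```

Visualising the two uncomputed objects with .visualize() would show the delayed variant collapsing into a single reduction node, while the tree reduction fans out over the chunks.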
Code Sample, a minimal, complete, and verifiable piece of code
Problem description
The bucket resampler methods get_max and get_min spend most of the wall-clock time in unparallelised code. This means the run takes longer than it needs to.
Expected Output
I expect a dask visualisation that illustrates that 800% CPU is used 100% of the time.
Actual Result, Traceback if applicable
In reality, 800% CPU is used less than 40% of the time. More than 60% of the time is spent in the task _get_statistics, which is exactly the @dask.delayed-decorated function used to calculate the maximum:
Versions of Python, package at hand and relevant dependencies
pyresample main: v1.23.0-46-g0cb8914
The key difficulty in the algorithm, as far as I understand it at least, is that it needs to do an argsort, and it does this over the entire input data array. The reason the resample_blocks function might help is that it runs your function on a single sub-array of the input data that covers the output chunk/block. This is really what the algorithm "wants" in its current implementation, but coding up that logic is complicated in a basic map_blocks call. We can let resample_blocks do that for us in the future.
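To illustrate the per-block idea with a rough numpy-only sketch (not the actual BucketResampler algorithm; the helper name and the -inf fill value are assumptions for illustration): each block only needs its own values and target-bucket indices, and the independent partial results combine with an elementwise maximum, so no global argsort is required.

```python
import numpy as np


def _chunk_bucket_max(values, bucket_idx, n_buckets):
    """Bucket-wise maximum for one chunk of input data.

    Hypothetical helper, not pyresample's _get_statistics: because each
    chunk only touches its own values and target-bucket indices, these
    calls are independent and could run in parallel (e.g. one per chunk).
    """
    out = np.full(n_buckets, -np.inf)
    np.maximum.at(out, bucket_idx, values)  # in-place bucket-wise max
    return out


# Two input chunks mapping into a 4-bucket output grid.
chunk_a = (np.array([1.0, 5.0, 2.0]), np.array([0, 1, 0]))
chunk_b = (np.array([7.0, 3.0]), np.array([1, 3]))

# Combine the independent per-chunk results elementwise.
partials = [_chunk_bucket_max(v, i, 4) for v, i in (chunk_a, chunk_b)]
bucket_max = np.maximum.reduce(partials)
# bucket 2 received no data; buckets: [2.0, 7.0, -inf, 3.0]
```

The same shape maps onto resample_blocks: the per-chunk helper becomes the function run per output block, and the elementwise combine disappears because each output block already sees all the input it needs.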