Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

isaachcamp · 2023-09-26T23:53:19Z

Generic Issue

xclim version: 0.45.0
xarray version: 2023.8.0
Python version: 3.11.5
Operating System: CentOS Linux release 7.3.1611

Description

I'm attempting to bias correct precipitation data using the EmpiricalQuantileMapping class from the sdba module. The code runs with no problems, however, I noticed there was some strange results. When plotting the long-term monthly accumulated mean precipitation of adjusted data over a few grid cells, the corrected series looks considerably different to the reference data still -- in some cases worse than the raw data.

The transfer function was applied directly back on to the same years that were used in the training stage, so I would expect the adjusted data to look almost identical to the reference data (with some a smoothing from the monthly grouper and 3-month window).

I believe the cause stems from loading the QM.ds from a ZARR file and leaving the chunks along the quantiles and month dimensions unchanged. If I remove all chunks, or leave chunks along only latitude and longitude dimensions, I get the expected behaviour.

I've attached an image of the long-term monthly accumulated mean precipitation for a few grid cells. The yellow and purple lines (chunks along lat-lon and no chunks respectively) perfectly match, and they closely follow the reference values (VCSN, red line). The blue line shows the adjusted values with chunks along quantiles and month dimensions, and the green line shows the raw values.

I can recreate this issue by taking two time series as ref and hist, with the following lines:

group = sdba.Grouper("time.month", window=3)
QM = sdba.EmpiricalQuantileMapping.train(
            ref,
            hist,
            nquantiles=50,
            group=group,
            kind="*"
        )

adj_nochunks = QM.adjust(hist])

QM.ds = QM.ds.chunk({"month": 3, "quantiles": 10})
adj_chunks = QM.adjust(hist)

ref.groupby('time.month').mean().plot(label="ref")
adj_nochunks.groupby('time.month').mean().plot(label="nochunks")
adj_chunks.groupby('time.month').mean().compute().plot(label="chunks")
import matplotlib.pyplot as plt; plt.legend()

I get this plot:

Code of Conduct

I agree to follow this project's Code of Conduct

aulemahal · 2023-09-27T13:50:10Z

Hi @isaachcamp !

Thank you for raising this issue, this slipped through the testing... Sadly, the way sdba is implemented, chunking along month and quantiles is not supported. The bug here is that it didn't raise an error. The adjust method will interpolate along those dimensions, so they need to be complete in the call. As we use xarray's map_blocks to parallelize the method on chunks, those chunks must not split the "reduced" or "interpolated" dimensions.

map_blocks is used in order to minimize the complexity of the dask graph by uniting all the steps in a single task. While this might affect performance negatively for small datasets, the gain for large ones is significant. (At least it was 2 years ago). However, it does limit the possibilities for chunking. The time coordinate must be complete for train and so do the distribution dimensions (quantiles and the grouping dim ('month', 'dayofyear', etc)).

I'll make PR that adds an error with a meaningful message.

isaachcamp · 2023-09-27T20:08:19Z

That's great, thanks @aulemahal!

### Pull Request Checklist: - [x] This PR addresses an already opened issue (for bug fixes / features) - This PR fixes #1481 - [x] Tests for the changes have been added (for bug fixes / features) - [ ] (If applicable) Documentation has been added / updated (for bug fixes / features) - [x] CHANGES.rst has been updated (with summary of main changes) - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added ### What kind of change does this PR introduce? If a `map_blocks`-wrapped function receives input chunked along the dimensions marked as "reduced", it raises an error. This means one cannot chunk the training dataset (`QM.ds`, for example) along the distribution ('quantiles') or group ('month', 'dayofyear') dimensions. Previously, this didn't raise an error, but the adjustment would be done separately for each chunk, yielding incorrect results, silently. ### Does this PR introduce a breaking change? No. ### Other information:

aulemahal mentioned this issue Sep 27, 2023

Raise when reduced dims are chunked in map_blocks #1482

Merged

5 tasks

aulemahal closed this as completed in #1482 Sep 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

isaachcamp commented Sep 26, 2023 •

edited by aulemahal

Loading

aulemahal commented Sep 27, 2023

isaachcamp commented Sep 27, 2023

Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

Comments

isaachcamp commented Sep 26, 2023 • edited by aulemahal Loading

Generic Issue

Description

Code of Conduct

aulemahal commented Sep 27, 2023

isaachcamp commented Sep 27, 2023

isaachcamp commented Sep 26, 2023 •

edited by aulemahal

Loading