Adapt freq in training #1407

coxipi · 2023-06-29T14:33:02Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes #xyz
Tests for the changes have been added (for bug fixes / features)
- (If applicable) Documentation has been added / updated (for bug fixes / features)
CHANGES.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

Frequency adaptation should sometimes be performed in the same step as training so that the adapted grouped blocks are preserved for the training step. This PR allows this by invoking frequency adaptation directly in the training functions.
nbstripout in pre-commit now removes metadata.kernelspec completely; these entries cause annoying diffs when notebooks are edited with different software (I think). This might be a problem if we want to mix cells in different languages, e.g. Python and R, in a single notebook. If our use case one day evolves towards this, we can revisit this aspect.

Does this PR introduce a breaking change?

Yes. The adaptation "in-training" is simply a new feature. But the adaptation "out-of-training" (or "modular" approach), akin to the old procedure, will also yield different results. This is because a multi-dimensional rank is now used in "_processing.py".

Other information:

I explain the motivation behind this PR in a comment below.

Using a multi-dimensional rank changes the results of the "out-of-training" adaptation, but the new results are "more" in line with the expected scientific computation. I say "more" in line because, if you really want the expected results, the adaptation should be done "in-training" so that each map block remains correctly adapted during training. Otherwise, the "out-of-training" adaptation we used to do amounts to: expand your data in blocks; frequency-adapt them; de-expand to the original data size; re-expand in blocks; train on them. After the re-expansion, you could still have blocks that are not well adapted.

xclim/sdba/_processing.py

coxipi · 2023-06-29T15:20:35Z

Just to share some motivations for this change:

adapt_freq modifies hist so that its dry-day frequency equals that of ref when hist has too many dry days (dP0 > 0); if its dry-day frequency is already smaller or equal (dP0 <= 0), nothing is done.

adapt_freq: ref, hist -> ref, hist'

Once this is done, we could re-apply the function and expect nothing to happen, since the frequency is already fixed.

adapt_freq: ref, hist' -> ref, hist'
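
The expected idempotence on a single block can be illustrated with a small pure-Python sketch. This is not xclim's implementation: `adapt_freq_toy` and `dry_fraction` are hypothetical helpers, the relative change is assumed to be dP0 = (P0_hist - P0_ref) / P0_hist, and the new wet values are evenly spaced just above the threshold instead of being drawn at random.

```python
def dry_fraction(values, thresh):
    """Fraction of values strictly below the wet-day threshold."""
    return sum(v < thresh for v in values) / len(values)


def adapt_freq_toy(ref, hist, thresh):
    """Toy frequency adaptation on one block (hypothetical helper).

    Returns the adapted copy of hist and dP0, the share of hist dry days
    that had to be turned into wet days.
    """
    p0_ref = dry_fraction(ref, thresh)
    p0_hist = dry_fraction(hist, thresh)
    if p0_hist <= p0_ref:
        # hist already has no more dry days than ref: nothing to do.
        return list(hist), 0.0

    dp0 = (p0_hist - p0_ref) / p0_hist
    n_fix = round((p0_hist - p0_ref) * len(hist))
    # Dry days, largest values first, so the "least dry" days are converted first.
    dry_idx = sorted(
        (i for i, v in enumerate(hist) if v < thresh),
        key=lambda i: hist[i],
        reverse=True,
    )
    out = list(hist)
    for k, i in enumerate(dry_idx[:n_fix]):
        # Deterministic stand-in for a random draw: small values above thresh.
        out[i] = thresh * (1 + (k + 1) / (n_fix + 1))
    return out, dp0


ref = [0.0, 0.0, 3.0, 5.0, 2.0, 0.0, 4.0, 1.0]
hist = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 6.0, 0.5]

hist_first, dp0_first = adapt_freq_toy(ref, hist, thresh=1.0)
hist_second, dp0_second = adapt_freq_toy(ref, hist_first, thresh=1.0)
print(dp0_first, dp0_second)  # 0.5 0.0
```

On a single block, the second call finds nothing left to fix, which is the behaviour one would like to keep when a window is involved.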

This is not what happens when there is a window dimension:

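from xclim import sdba

# ref and hist are the reference and historical DataArrays (defined elsewhere)
group = sdba.Grouper.from_kwargs(**{"group": "time.dayofyear", "window": 31})["group"]
adapt_freq = {"thresh": "1e-6 kg m-2"}
adapt_freq.setdefault("group", group)
# First pass: adapt hist to the dry-day frequency of ref
hist, pth, dP01 = sdba.processing.adapt_freq(ref, hist, **adapt_freq)
# Second pass on the already adapted hist: dP02 would ideally be <= 0 everywhere
hist, pth, dP02 = sdba.processing.adapt_freq(ref, hist, **adapt_freq)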

The problem is that we still have values where dP02>0, hence adapt_freq still needs to do some fixing in this second iteration.

This happens because we don't keep the window dimension explicitly. The comparison between ref and hist is done with expanded datasets (window, time), but the correction is only applied to the reduced dataset with (time). As a result, we can apply some corrections on a given doy, but on the second use of the function, as we re-expand with (window, time), there can still be the same problems as before.

One possible fix is to simply do frequency adaptation with the window argument. This ensures that each doy is well adapted, so each block will be too. This results in potentially more modifications of the datasets on smaller samples.

The other fix (proposed here) is to keep the (window, time) structure explicit, so that each block is adapted. As a result, frequency adaptation is less modular, since we want to call it within a map_blocks of the training functions. There is also perhaps a conceptual flaw: given that adaptation can work differently on each block, the blocks are not guaranteed to be coherent with one another. For example, with window=31, the doy=3, year=1990 pr value will be used in 31 blocks. The corrected value could differ from block to block. Granted, this is a mathematical trick, and the corrected values will be small, between 0 and the minimum non-zero value in pr. But I think this is still important to highlight.
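
To make the "31 blocks" remark concrete, here is a minimal sketch that builds the day-of-year blocks for a single toy year and counts how many of them contain a given day. `window_blocks` is a hypothetical helper, and wrapping around the end of the year is an assumption made here so that every block has the same size.

```python
def window_blocks(n_days, window):
    """For each centre day, list the day indices that fall in its block.

    Indices wrap around the end of the toy year (an assumption of this sketch).
    """
    half = window // 2
    return [
        [(centre + offset) % n_days for offset in range(-half, half + 1)]
        for centre in range(n_days)
    ]


blocks = window_blocks(n_days=365, window=31)

target = 2  # zero-based index of doy=3
n_blocks_with_target = sum(target in block for block in blocks)
print(n_blocks_with_target)  # 31
```

Since the same value sits in 31 different blocks, a block-wise correction can in principle assign it 31 different corrected values, which is the coherence caveat raised above.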

xclim/sdba/_adjustment.py

coxipi · 2023-06-29T21:22:11Z

xclim/sdba/_processing.py

+        if len(dim) > 1:
+            temp_dim = get_temp_dimname(sim.dims, "temp")
+            rank = (
+                sim.stack(**{temp_dim: dim})
+                .rank(temp_dim, pct=True)
+                .unstack(temp_dim)
+                .transpose(*sim.dims)
+                .drop_vars([dim_ for dim_ in dim if dim_ not in sim.coords])
+            )
+        else:
+            rank = sim.rank(dim[0], pct=True)


Should I incorporate multi-dim ranking in the sdba.rank function?

I'm thinking:

def rank(..., dim: str | list[str] = "time", ...):
    dim = dim if isinstance(dim, list) else [dim]
    # replace `rnk = da.rank(dim, pct=pct)` with the code above (using pct=pct)
    # rest of the function
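
As a rough illustration of what ranking over several dimensions at once changes, the following pure-Python sketch ranks a 2D list jointly (flatten, rank, reshape) and compares it with ranking each row separately. It is only a stand-in for the xarray stack / rank / unstack chain in the diff: `pct_rank` and `pct_rank_2d` are hypothetical helpers, ties are assumed to receive their average rank, and rows are assumed to have equal length.

```python
def pct_rank(values):
    """Percentile rank: average rank for ties, divided by the sample size."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg / n
        i = j + 1
    return ranks


def pct_rank_2d(rows):
    """Rank all entries of a 2D list together, then restore the 2D shape."""
    flat = [v for row in rows for v in row]  # "stack"
    ranked = pct_rank(flat)                  # rank along the stacked axis
    width = len(rows[0])
    return [ranked[r * width:(r + 1) * width] for r in range(len(rows))]  # "unstack"


rows = [[0.0, 2.0], [1.0, 3.0]]
joint = pct_rank_2d(rows)                   # [[0.25, 0.75], [0.5, 1.0]]
per_row = [pct_rank(row) for row in rows]   # [[0.5, 1.0], [0.5, 1.0]]
```

The per-row ranks are identical for both rows, while the joint ranks place every value within the whole sample.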


coxipi · 2023-06-30T17:21:53Z

I should also note: Given the new multi-dimensional rank structure, even if we choose the same individual window value at the end, the processing on sim_ad is different because the ranks are different on this specific window value. Apparently, even without using the internal call to adapt_freq in the training function, the more modular call to adapt_freq now does a better job at creating a new sim_ad whose adaptation persists if we re-perform the adapt_freq routine:

group = sdba.Grouper("time.dayofyear", window=31)
sim_ad, pth, dP0        = sdba.processing.adapt_freq(prref, prsim, thresh="1 mm d-1", group=group)
sim_ad, pth, dP0_repeat = sdba.processing.adapt_freq(prref, sim_ad, thresh="1 mm d-1", group=group)

I can "sort of" understand why this is the case: take the adaptation of doy=15, with window=31. As you perform the adaptation on the block centered on doy=15, you will really change the values for doy=15, but the multi-dimensional aspect of the new rank structure makes this correction aware of the neighbouring doys involved in the same block. But, conversely, doy=15 should be involved in neighbouring blocks as well. I'm confused about why this works so well; I would really expect that you generally can't have a modified dataset with dimension (time) that, once re-expanded in (time, window), maintains dP0<=0 in every block. But, as I said, having the rank working on (time, window) surely helps.
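
The condition "dP0<=0 in every block" can be checked on a toy series with a sketch like the one below. It is a simplified diagnostic, not what xclim computes: `dp0_per_block` is a hypothetical helper, blocks are centred windows over consecutive time steps that wrap around the ends, and dP0 is assumed to be (dry days in hist - dry days in ref) / dry days in hist within each block.

```python
def dp0_per_block(ref, hist, window, thresh):
    """Relative excess of dry days in hist versus ref, for each centred block."""
    n = len(hist)
    half = window // 2
    out = []
    for centre in range(n):
        idx = [(centre + offset) % n for offset in range(-half, half + 1)]
        n_dry_ref = sum(ref[i] < thresh for i in idx)
        n_dry_hist = sum(hist[i] < thresh for i in idx)
        if n_dry_hist == 0:
            # No dry day in hist for this block: reported as 0.0 in this sketch.
            out.append(0.0)
        else:
            out.append((n_dry_hist - n_dry_ref) / n_dry_hist)
    return out


ref = [0, 1, 0, 1, 0, 1]
hist = [0, 0, 1, 0, 0, 1]

dp0 = dp0_per_block(ref, hist, window=3, thresh=0.5)
needs_fix = [d > 0 for d in dp0]
print(dp0)  # [0.5, 0.0, 0.5, 0.0, 0.5, 0.0]
```

Here every other block still has too many dry days, even though neighbouring blocks share most of their values. This shows why a correction made on the (time) series does not automatically hold in each (time, window) block.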

At any rate, it is not exactly the same. I compared the "# 2nd try with adapt_freq" cell in "sdba.ipynb" using the "in-training approach" vs. the "modular approach" with time.dayofyear and window=31, and there are some small, infrequent differences between the resulting scen_ad.

This is a case where using no adapt_freq outright fails, with singular peaks. In the time.month example shown in the current notebooks, I don't see such differences.

review-notebook-app · 2023-08-02T17:01:12Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

aulemahal

Nice, thanks a lot!

And sorry for not reviewing earlier, it slipped under my radar and through the cracks.

xclim/sdba/_adjustment.py

aulemahal · 2023-10-16T18:05:31Z

xclim/sdba/_processing.py

I think this is a good idea.

coxipi added 8 commits April 27, 2023 10:56

Adapt_freq called in training (fix)

52361ae

LOCI has adapt_freq_thresh argument

e096c8f

adapt_freq_thresh option in Scaling

58d5b66

Merge branch 'master' of https://github.com/Ouranosinc/xclim into adapt_freq_in_training

d1d96fe

Merge branch 'master' of https://github.com/Ouranosinc/xclim into adapt_freq_in_training

0471767

CHANGES update

1be0d1d

Better description _adapt_freq_s

b304326

more comments

3162c32

github-actions bot added the sdba Issues concerning the sdba submodule. label Jun 29, 2023

coxipi added 3 commits June 29, 2023 10:33

pull number

55ec622

Remove adapt_freq_thresh, useless in LOCI and Scaling

f6ce69b

remove unused import

c95e3c9

coxipi commented Jun 29, 2023

xclim/sdba/_processing.py Outdated

coxipi commented Jun 29, 2023

xclim/sdba/_processing.py Outdated

incorporate changes from _adapt_freq_s to _adapt_freq

79bdc32

coxipi commented Jun 29, 2023

xclim/sdba/_adjustment.py Outdated

coxipi added 2 commits June 29, 2023 17:12

Revert useless formatting changes

e9a64ae

Only used stacked ranking if len(dim)>1

36ef886

coxipi commented Jun 29, 2023

Merge branch 'master' of https://github.com/Ouranosinc/xclim into adapt_freq_in_training

79f7ca7

coxipi added 2 commits August 2, 2023 11:33

Example explaining use of adapt_freq_thresh

2d118cd

move rolling_window example the advanced notebook

341e53d

github-actions bot added the docs Improvements to documentation label Aug 2, 2023

Merge branch 'master' into adapt_freq_in_training

40f07ca

coxipi marked this pull request as ready for review August 2, 2023 21:08

coxipi added the approved Approved for additional tests label Aug 10, 2023

Merge branch 'master' into adapt_freq_in_training

09b3fb8

Zeitsperre added 3 commits August 25, 2023 10:25

Merge branch 'master' into adapt_freq_in_training

e594905

Merge branch 'master' into adapt_freq_in_training

ad43e39

Merge branch 'master' into adapt_freq_in_training

22f4eb8

Zeitsperre requested a review from aulemahal September 5, 2023 15:45

aulemahal approved these changes Oct 16, 2023

coxipi added 5 commits October 16, 2023 14:16

Merge branch 'master' of github.com:Ouranosinc/xclim into adapt_freq_in_training

d3af806

Docstrings for adapt_freq_thresh

75eee24

rank now supports multi-dimensional ranking

ff97686

add the modifications of rank (duh)

94cffa2

Merge branch 'adapt_freq_in_training' of github.com:Ouranosinc/xclim into adapt_freq_in_training

3e2e8a3

coxipi commented Oct 16, 2023

CHANGES.rst Outdated

coxipi and others added 6 commits October 16, 2023 16:19

Update CHANGES.rst

5aa31da

remove merge artefacts

Merge branch 'master' into adapt_freq_in_training

67f7c95

rank variable assignment forgotten

1c84d4a

Merge branch 'master' of github.com:Ouranosinc/xclim into adapt_freq_in_training

7923083

revert changes in test

944b628

clean notebook, strip 'metadata.kernelspec.display_name' in pre-commit

650d118

github-actions bot added the CI Automation and Continuous Integration label Oct 17, 2023

coxipi added 3 commits October 17, 2023 16:24

really revert change in test

f932a9c

try removing all kernelspec

13e2702

remove kernelspec in nb metadata

2b7debd

coxipi merged commit 316566a into master Oct 18, 2023
12 checks passed

coxipi deleted the adapt_freq_in_training branch October 18, 2023 15:38
