Adapt freq in training #1407
Conversation
Just to share some motivations for this change: `adapt_freq` modifies `hist` so that the frequency of values under the threshold matches that of `ref`.
Once this is done, we could re-apply the function and expect nothing to happen, since the frequency is already fixed.
This is not what happens when there is a window dimension:

```python
group = sdba.Grouper.from_kwargs(**{"group": "time.dayofyear", "window": 31})["group"]
adapt_freq = {"thresh": "1e-6 kg m-2"}
adapt_freq.setdefault("group", group)
hist, pth, dP01 = sdba.processing.adapt_freq(ref, hist, **adapt_freq)
hist, pth, dP02 = sdba.processing.adapt_freq(ref, hist, **adapt_freq)
```

The problem is that we still have values where `dP02 != 0`. This happens because we don't keep the window dimension: after the adaptation, the data is collapsed back to its original size, and when it is re-expanded into windowed blocks at training time, those blocks are not necessarily well adapted anymore.

One possible fix is to simply do the frequency adaptation with the window argument. This ensures that each doy is well adapted, so each block will be too. This results in potentially more modifications of the dataset, on smaller samples.

The other fix (proposed here) is to keep the frequency adaptation inside the training step ("in-training"), so that it is applied directly to the windowed blocks used by the training.
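The expectation above can be checked directly. A minimal sketch, assuming the snippet above has been run and `dP02` is the correction fraction returned by the second call:

```python
# If adapt_freq were a no-op on already-adapted data, the second call would
# leave dP02 at (or very near) zero everywhere.
print(float(abs(dP02).max()))  # with a windowed day-of-year grouper, this stays > 0
```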
xclim/sdba/_processing.py
Outdated
```python
if len(dim) > 1:
    temp_dim = get_temp_dimname(sim.dims, "temp")
    rank = (
        sim.stack(**{temp_dim: dim})
        .rank(temp_dim, pct=True)
        .unstack(temp_dim)
        .transpose(*sim.dims)
        .drop_vars([dim_ for dim_ in dim if dim_ not in sim.coords])
    )
else:
    rank = sim.rank(dim[0], pct=True)
```
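To see why the stacking is needed, here is a small, self-contained illustration (not xclim code; the array and dimension names are made up). Ranking over the stacked dimension pools the samples from both dimensions, whereas ranking over `time` alone treats each window column as an independent sample:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.default_rng(0).random((4, 3)),
    dims=("time", "window"),
)

# Percentile rank over the pooled time x window sample:
stacked = da.stack(z=("time", "window"))
rank_pooled = stacked.rank("z", pct=True).unstack("z").transpose(*da.dims)

# Percentile rank over "time" only: each window column is ranked on its own.
rank_per_window = da.rank("time", pct=True)
```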
Should I incorporate multi-dim ranking in the `sdba.rank` function?
I'm thinking:

```python
def rank(
    ...,
    dim: str | list[str] = "time",
    ...,
):
    dim = dim if isinstance(dim, list) else [dim]
    # replace `rnk = da.rank(dim, pct=pct)` with the code above (using pct=pct)
    # rest of the function
```
I think this is a good idea.
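For reference, a minimal sketch of what such a generalized helper could look like, reusing the multi-dim branch from the diff above; the temporary dimension name and the in-line stand-in for xclim's `get_temp_dimname` are illustrative, not the final implementation:

```python
from __future__ import annotations

import xarray as xr


def rank(da: xr.DataArray, dim: str | list[str] = "time", pct: bool = False) -> xr.DataArray:
    """Rank data over one or several dimensions, pooling them into one sample."""
    dims = dim if isinstance(dim, list) else [dim]
    if len(dims) == 1:
        return da.rank(dims[0], pct=pct)

    # Stand-in for xclim's get_temp_dimname: pick a name not already used.
    temp_dim = "_stacked"
    while temp_dim in da.dims:
        temp_dim += "_"

    return (
        da.stack({temp_dim: dims})
        .rank(temp_dim, pct=pct)
        .unstack(temp_dim)
        .transpose(*da.dims)
        # Stacking creates index coordinates for dimensions that had none;
        # drop them so the output matches the input's coordinates.
        .drop_vars([d for d in dims if d not in da.coords])
    )
```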
Nice, thanks a lot!
And sorry for not reviewing earlier, it slipped under my radar and through the cracks.
Pull Request Checklist:

- Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added.

What kind of change does this PR introduce?
Does this PR introduce a breaking change?
Yes. The adaptation "in-training" is simply a new feature. But the adaptation "out-of-training" (the "modular" approach, akin to the old procedure) will also yield different results, because a multi-dimensional rank is now used in `_processing.py`.
Other information:
I explain the motivation behind this PR here (it is also explained in the conversation above).

Using the multi-dimensional rank changes the results of the "out-of-training" adaptation, but these results are "more" in line with the expected scientific computation. I say "more" in line because, if you really want the expected results, the adaptation should be done "in-training", so that each map block remains correctly adapted during the training. Otherwise, with the "out-of-training" adaptation we used to do, the workflow is: expand the data into blocks; frequency-adapt them; collapse back to the original size; re-expand into blocks; train. After the re-expansion, some blocks may still not be well adapted.
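To make the distinction concrete, here is a hedged sketch of the two workflows, assuming `ref` and `hist` are precipitation DataArrays and that the in-training adaptation is exposed through an `adapt_freq_thresh` keyword on `train` (the keyword name is my assumption; check the final API):

```python
from xclim import sdba

group = sdba.Grouper("time.dayofyear", window=31)

# Old, "out-of-training" (modular) approach: adapt first, then train.
hist_ad, pth, dP0 = sdba.processing.adapt_freq(
    ref, hist, thresh="1e-6 kg m-2", group=group
)
eqm = sdba.EmpiricalQuantileMapping.train(ref, hist_ad, group=group, kind="*")

# New, "in-training" approach: the frequency adaptation is applied to each
# windowed block inside the training step itself.
eqm = sdba.EmpiricalQuantileMapping.train(
    ref, hist, group=group, kind="*", adapt_freq_thresh="1e-6 kg m-2"
)
```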