Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditions to match ref.time & hist.time, and also ref.time.size & sim.time.size #1995

Merged
merged 32 commits into from
Dec 9, 2024

Conversation

coxipi
Copy link
Contributor

@coxipi coxipi commented Nov 13, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGELOG.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • Add a method to bias adjustment classes to raise an error if ref and hist times do not match
  • Add a method to bias adjustment classes to raise an error if ref and sim times do not have the same size
  • To make this last change in dOTC (which was needed), I changed how the nans are handled

Does this PR introduce a breaking change?

I'm not sure, I should check if the previous method was handling nans correctly. I have a suspicion that it was not. But this way is simpler, more in-line with other xclim methods, and technically what we want

Other information:

@github-actions github-actions bot added the sdba Issues concerning the sdba submodule. label Nov 13, 2024
xclim/sdba/adjustment.py Outdated Show resolved Hide resolved
@coxipi
Copy link
Contributor Author

coxipi commented Nov 18, 2024

it keeps saying that some checks were not successful, but I don't see any red x, showing the failures.

@aulemahal
Copy link
Collaborator

Weird, the previous CI shows some runs "cancelled". I don't see why they were cancelled...

@Zeitsperre
Copy link
Collaborator

It's a configuration issue. I'll see what I can do.

@coxipi
Copy link
Contributor Author

coxipi commented Nov 25, 2024

So all this discussion about conflict values of time with map_blocks if times are not maching:

I guess that's why NpdfTransform had:

                "ref": ref.rename(time="time_hist"),
                "hist": hist.rename(time="time_hist"),
                "sim": sim,

I'll try and see if the class Adjust could handle this on its own to also avoid the problem in other classes. I recently had this problem when trying to use dOTC.

For MBCn, I didn't see this with the groupies-like implementation since it doesn't use map_groups. I'm not sure how it worked for my map_groups implementation though , I feel it should have failed for this specific reason above.

@coxipi
Copy link
Contributor Author

coxipi commented Nov 26, 2024

Ready for xclim next version

@Zeitsperre Zeitsperre requested a review from aulemahal November 26, 2024 21:29
Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think the replace_sim logic needs a bit more doc, the current docstring does not list "same size same calendar" as a requirement for sim.

xclim/sdba/adjustment.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added the approved Approved for additional tests label Nov 27, 2024
@coxipi
Copy link
Contributor Author

coxipi commented Nov 27, 2024

Looks good to me. I think the replace_sim logic needs a bit more doc, the current docstring does not list "same size same calendar" as a requirement for sim.

Hmm, should I just leave this a case-by-case for each specific method? I don't think there is much documentation on _allow_diff_calendars either? Maybe this should be improved too

I guess _allow_diff_calendars and _allow_diff_training_times are explained a little bit more than _replace_sim_time since there is a check in TrainAdjust. But it's just in-line documentation inside the function, so I don't think it's documented a lot more? Maybe I'm missing something

e.g.

        if not cls._allow_diff_calendars and len(calendars) > 1:
            raise ValueError(
                "Inputs are defined on different calendars,"
                f" this is not supported for {cls.__name__} adjustment."
            )

vs.

        # This below implies that ref.time and sim.time have the same size
        # Since `ref,hist, sim` are in the same `map_groups` call, they must have
        # the same time
        if cls._replace_sim_time:
            sim_time = sim.time
            sim["time"] = ref["time"]

maybe I should simply have : _allow_ref_and_sim_time_different_size, and if it's False, there is the check, documented in the same way, and the conversion. This would be clearer?

EDIT: This would already be better, we can anticipate a wrong input, and raise an error if need-be

@aulemahal
Copy link
Collaborator

Héhé I guess you're right about the other having almost no doc. The difference though is that _allow_diff_calendars doesn't actually do something, but the error it raises tells what's wrong. In Adjust, if sim.time is different than ref.time, the error will come from xarray and it might not make sense for the user ?

@coxipi
Copy link
Contributor Author

coxipi commented Nov 27, 2024

Héhé I guess you're right about the other having almost no doc. The difference though is that _allow_diff_calendars doesn't actually do something, but the error it raises tells what's wrong. In Adjust, if sim.time is different than ref.time, the error will come from xarray and it might not make sense for the user ?

Yes good point, that's what I was slowly converging to when thinking about your comment.

Is _allow_ref_and_sim_time_different_size too long? I think it's the way to go, and replacing the times can be done too without any specific attribute

@Zeitsperre
Copy link
Collaborator

FYI, the issues with conda are fixed here: 9aa4e4a

@coxipi
Copy link
Contributor Author

coxipi commented Nov 27, 2024

I can just add _allow_diff_times_size, and for adjust, the check will be for ref,hist,sim which, in any case, this is what we want.

You know, there is an implicit assumption in other function that ref and hist will have the same calendar AND the same time, right? I can add this check in other TrainAdjust class for the train part. But this would be more strict, something like:

_allow_diff_times, i.e. times must be the exact same

@coxipi
Copy link
Contributor Author

coxipi commented Nov 27, 2024

_allow_diff_calendars is a less strict condition than _allow_diff_training_times, and likewise it also applies only on ref and hist currently ... except for Adjust class where it is applied on ref,hist,sim.

Just to say: _allow_diff_calendars = False is mostly useless in TrainAdjust classes if we also check training times, _allow_diff_training_times = False

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added the docs Improvements to documenation label Nov 28, 2024
coxipi and others added 7 commits November 28, 2024 12:15
<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [ ] Tests for the changes have been added (for bug fixes / features)
- [ ] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [ ] CHANGELOG.rst has been updated (with summary of main changes)
- [ ] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

*nans are now removed and put back in place at a lower level. This
simplifies the handling of times.

### Does this PR introduce a breaking change?

I'm not sure, I should check if the previous method was handling nans
correctly. I have a suspicion that it was not. But this way is simpler,
more in-line with other xclim methods, and technically what we want

### Other information:

The motivation behind this is that we want to be able to use different
time but same size for `ref` and `sim` so they can be passed to a
`map_groups` function. Apparently, the handling of nans was messing with
the relative sizes of `ref` ,`sim`. Before the computation, they were
the same (by construction) and at some intermediary step, they ended up
with different dimensions. Maybe the dimensions were ultimately
restored, but I was nervous about this, and I think this new
implementation is more simple.
@coxipi coxipi changed the title ref & hist times in bias adjustment methods must match Conditions to match ref.time & hist.time, and also ref.time.size & sim.time.size Nov 28, 2024
@coxipi
Copy link
Contributor Author

coxipi commented Nov 28, 2024

@aulemahal I accidentally merged the changes from a child PR in this one 🤡

Thoses changes were prompted because I wanted to make the _allow_diff_time_sizes work with dOTC, so it's related, but the PR is a bit bigger now. Sorry

@aulemahal
Copy link
Collaborator

Ah! I now understand the reaction haha. Were you going to ask me a review on that other PR, is it ready ?

@coxipi
Copy link
Contributor Author

coxipi commented Nov 28, 2024

I had tagged Sarah for this one. I was planning to take a bit more time on this one, but really, it's not all that complicated. There were some weird handling of NaNs previously in dOTC. I have simplified the process (IMO). I only drop nans at the lowest level when it's needed, and put the Nans back in place again at the lower level when it's over

It's really only one file, I think it's reasonable to ask Sarah to focus on this specific part

@coxipi coxipi requested a review from SarahG-579462 November 28, 2024 19:17
@coxipi
Copy link
Contributor Author

coxipi commented Nov 28, 2024

@SarahG-579462 Can you check the changes in _adjustment.py? I meant to have a separate PR for this but I messed up.

The changes concern the handling of nans in dOTC. I was getting some weird errors, I'm not sure the time arrays were appropriately preserved before. I have simplified the handling of Nans and I don't have problems anymore.

Copy link
Contributor

@SarahG-579462 SarahG-579462 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though is it possible to override the default for _allow_diff_training_times and _allow_diff_time_sizes ?

I could imagine situations where you might want to use EmpiricalQuantileMapping with _allow_diff_time_sizes = False but _allow_diff_training_times = True, although it might not actually work without a lot of finicking.

(e.g. correcting your past based on the current conditions, which is a principle applied (manually) in our Portail Ingénieur tool)

@coxipi
Copy link
Contributor Author

coxipi commented Dec 2, 2024

_allow_diff_training_times = True

Currently it would not work because of the map_groups. For the simulation, this is not a problem because you usually it doesn't interact with the times of ref, hist, at the point of adjustment we already have adjustment factors. dOTC is an example where this is not the case, so ref,hist,sim all go in the same map_groups call. So I need to set the times of sim to be equal to those of ref,hist. It seems to me that this is what you are suggesting by wanting _allow_diff_training_times = True ? We would have the same size of time arrays, but different periods? The time array could then be made to match temporarily, like I did for dOTC. I guess this is what you currently do on the Portail Ingénieurs?

I guess something like this could work:

from xclim.sdba import EmpiricalQuantileMapping
EmpiricalQuantileMapping._allow_diff_training_times = True
EmpiricalQuantileMapping._allow_diff_time_sizes = False

and we would need to change the TrainAdjust class

        if cls._allow_diff_training_times and not cls._allow_diff_time_sizes:
            hist_time = hist.time
            hist["time"] = ref["time"]
       # compute compute compute ...
        if cls._allow_diff_training_times and not cls._allow_diff_time_sizes:
            scen["time"] = hist_time

@aulemahal Any big objections on this one? This would be a hidden option, you would need to change default attrs to access this functionnality

@Zeitsperre Zeitsperre added this to the v0.54.0 milestone Dec 3, 2024
@Zeitsperre Zeitsperre added the priority Immediate priority label Dec 3, 2024
@aulemahal
Copy link
Collaborator

No objection here! It is fair that an "experimental" module like sdba allows these things, but does not publicize them much.

@Zeitsperre
Copy link
Collaborator

@coxipi If you want to implement the changes, go for it. We can reschedule the release to next week (will coincide with our regular planning meeting), so no rush!

xclim/sdba/adjustment.py Outdated Show resolved Hide resolved
xclim/sdba/adjustment.py Outdated Show resolved Hide resolved
xclim/sdba/adjustment.py Outdated Show resolved Hide resolved
@Zeitsperre Zeitsperre merged commit 9f30c72 into main Dec 9, 2024
23 checks passed
@Zeitsperre Zeitsperre deleted the check_matching_times branch December 9, 2024 19:44
@coxipi coxipi mentioned this pull request Dec 10, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests CI Automation and Contiunous Integration docs Improvements to documenation priority Immediate priority sdba Issues concerning the sdba submodule.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants