Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better degree_days_exceedance_date and ensemble stats #1647

Merged
merged 9 commits into from
Feb 16, 2024
Merged

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented Feb 14, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

Implements two solutions for #1459.

  • min_members argument to ensemble stats functions. Simply mask elements when there are fewer valid (non-nan) members than this threshold. Similar to min_periods of xarray's rolling, but with a different default value.
  • never_reached argument to degree_days_exceedance_date. A value (int, doy) to assign when the sum_thresh is not reached at the end of the period, to differentiate from missing values.

Does this PR introduce a breaking change?

No. Both default options preserve the previous behaviour (which means min_members doesn't have the same default as it's inspiration min_periods).

Other information:

Both solutions are distinct. For climatedata.ca, I think only min_members would be useful ?

Another option for never_reached would be to assign the last doy of the period, would that be interesting ? @juliettelavoie
To keep the annotation simple, we could do that if -1 is passed ?

@github-actions github-actions bot added the indicators Climate indices and indicators label Feb 14, 2024
@aulemahal aulemahal changed the title Never reached option - min_members in ens stats - upd chngs Better degree_days_exceedance_date and ensemble stats Feb 14, 2024
@Zeitsperre
Copy link
Collaborator

Another option for never_reached would be to assign the last doy of the period, would that be interesting ? @juliettelavoie To keep the annotation simple, we could do that if -1 is passed ?

Would it be better to mark the value of never_reached as -1? I feel like the alternative would mean that users would need to mask different values depending on their inputs. Having both a NaN value for statistically untrustworthy and a negative value for never reached seems less confusing to me.

@aulemahal
Copy link
Collaborator Author

aulemahal commented Feb 14, 2024

Would it be better to mark the value of never_reached as -1?

Indeed, that could be an option. Then if we want to implement "put the last doy" we need to find how to specify that with never_reached... I was hoping to keep the type of that argument to "int, optional", but the obvious solution to me would be never_reached='last'.

@juliettelavoie
Copy link
Contributor

I am not sure I understand what you mean by "last doy of the period" ? Is it the last day that it was reached on other years ? the day before after_date ?

@aulemahal
Copy link
Collaborator Author

Oh I meant the last day of the period. Like with freq='YS', that would be 365 or 366 according to the calendar. So when the sum is never reached, the indicator would return the largest "doy" possible. I'm not sure if this makes sense at all though.

@Zeitsperre
Copy link
Collaborator

As a GIS person, I'm not a fan of a having a dynamic value acting as a NaN. The fact that (if I understand correctly) depending on the frequency your never_reached value could be 360, 365, 366, 89, 90, 91, 92, 29, 30, 31... Feels nightmarish to me.

@juliettelavoie
Copy link
Contributor

I see. I guess it could make sense in some cases, but may not make a lot a sense depending on the after_date or the context. My case study was freezing road with AS-JUL after oct 1st and most dates being in february/march, including june dates in an ensemble mean would not really make sense.

Also, it is not that hard for the user to put "12-31" (in the case of YS). I guess the advantage of "last" is that is works even if "12-31" doesn't exist like for a 360_day calendar ?

@juliettelavoie
Copy link
Contributor

For context, the goal of having a "never_reached" value was not only to be able to filter it like a different NaN, but to include it in the ensemble mean in a way that made sense for the context (in climatedata.ca, only the ensemble mean is given). In my case study, where I was looking at the beginning of the ice road season, the never_reached was the date of the end of the ice roads season.

@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Feb 14, 2024

For context, the goal of having a "never_reached" value was not only to be able to filter it like a different NaN, but to include it in the ensemble mean in a way that made sense for the context (in climatedata.ca, only the ensemble mean is given). In my case study, where I was looking at the beginning of the ice road season, the never_reached was the date of the end of the ice roads season.

Ah, that helps clarifies things. It also leaves me feeling like this problem is even more complicated. What values could we accept in never_reached to cover our bases?

  • last (end of period)
  • int (doy; for easier filtering)
  • MM-DD (for an arbitrary date; What happens if this date is beyond the date of a given period?)

If we offered these options, would we want to provide a default value? I feel like we wouldn't.

@aulemahal
Copy link
Collaborator Author

ah! Well in that case, indeed making it possible to pass a "DayOfYearStr" makes sense. I'll abandon the idea of "last".

So the options are now:

  • None (default) : assigns NaN (previous behaviour)
  • int : Assigns the int.
  • string "MM-DD" : finds the corresponding doy and assigns that.

(in reality, any non-none and non-string input is directly assigned, so floats are also possible)

@aulemahal
Copy link
Collaborator Author

aulemahal commented Feb 14, 2024

I guess the advantage of "last" is that is works even if "12-31" doesn't exist like for a 360_day calendar ?

Yes, but would also covers cases where freq is not annual.

@aulemahal aulemahal requested a review from tlogan2000 February 14, 2024 18:24
@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Feb 14, 2024

(in reality, any non-none and non-string input is directly assigned, so floats are also possible)

Can we cast/demand an int actually? Wouldn't this promote the data variable to float32? Feels wasteful. At the very least, I'd emit a warning that this is happening (does xarray do this already?).

@Zeitsperre Zeitsperre added the priority Immediate priority label Feb 14, 2024
@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Feb 14, 2024

Side note: We're waiting on comments from the PCIC and Climatedata.ca collaborators. This PR should not be merged before Friday (February 16th) to give them a chance to weigh in.

@aulemahal
Copy link
Collaborator Author

We already return day-of-year as float because of that. We have the check_missing mechanism that will potentially insert NaN afterwards, so we can't really promise an int dtype.

I put int as the annotation because it is a doy. But as always type checking is pretty loose.

@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Feb 14, 2024

We already return day-of-year as float because of that. We have the check_missing mechanism that will potentially insert NaN afterwards, so we can't really promise an int dtype.

I'm guessing that Pandas' nullable int is still far from being supported. Makes sense.

@cpomer10
Copy link

No additional comments from ECCC - thanks!

@github-actions github-actions bot added the approved Approved for additional tests label Feb 16, 2024
@coveralls
Copy link

coveralls commented Feb 16, 2024

Coverage Status

coverage: 90.189% (+0.02%) from 90.171%
when pulling a5e1d7e on dded-opts
into 628cf9a on master.

@Zeitsperre Zeitsperre merged commit d1741bd into master Feb 16, 2024
19 checks passed
@Zeitsperre Zeitsperre deleted the dded-opts branch February 16, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests indicators Climate indices and indicators priority Immediate priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

degree_days_exceedance_date nans cause issue on climatedata.ca
6 participants