Skip to content

Commit

Permalink
Remove lmoments3 (#1644)
Browse files Browse the repository at this point in the history
### What kind of change does this PR introduce?

* Remove `lmoments3` from xclim's dependencies.
* Accept `rv_continuous` instances for the `dist` arguments of the
statistical indices

### Does this PR introduce a breaking change?

Yes. Passing a mere string and `method='PWM'` is now broken. It will
raise a ValueError with a message asking to pass an instance of
`lmoments3` instead.

This also makes some functions awkward to use. Before, we relied on the
`scipy_dist` attribute of `params` to retrieve the distribution when
computing statistics. As one can now pass an object that has nothing to
do with scipy, those functions (`parametric_quantile` for example) must
now also accept the dist as an argument.

### Other information:

See Ouranosinc/lmoments3#12.

I kinda cheated and made it so the `dist` argument would show up as a
"String" parameter to the indicator. This only changes the metadata,
distributions objects can still be passed to indicators.
  • Loading branch information
Zeitsperre authored Feb 16, 2024
2 parents f42c986 + ade1493 commit 1385fca
Show file tree
Hide file tree
Showing 9 changed files with 142 additions and 108 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ jobs:
python-version: "3.9"
markers: -m 'not slow'
os: ubuntu-latest
- tox-env: py310-coverage # No markers -- includes slow tests
- tox-env: py310-coverage-lmoments # No markers -- includes slow tests
python-version: "3.10"
os: ubuntu-latest
- tox-env: py311-coverage-sbck
Expand Down
5 changes: 4 additions & 1 deletion CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,10 @@ New features and enhancements
* New ``xclim.core.calendar.stack_periods`` and ``unstack_periods`` for performing ``rolling(time=...).construct(..., stride=...)`` but with non-uniform temporal periods like years or months. They replace ``xclim.sdba.processing.construct_moving_yearly_window`` and ``unpack_moving_yearly_window`` which are deprecated and will be removed in a future release.
* New ``as_dataset`` options for ``xclim.set_options``. When True, indicators will output Datasets instead of DataArrays. (:issue:`1257`, :pull:`1625`).
* Added new option for UTCI calculation to cap low wind velocities to a minimum of 0.5 m/s following Bröde (2012) guidelines. (:issue:`1634`, :pull:`1635`).

* Added option ``never_reached`` to ``degree_days_exceedance_date`` to assign a custom value when the sum threshold is never reached. (:issue:`1459`, :pull:`1647`).
* Added option ``min_members`` to ensemble statistics to mask elements when the number of valid members is under a threshold. (:issue:`1459`, :pull:`1647`).
* Distribution instances can now be passed to the ``dist`` argument of most statistical indices. (:pull:`1644`).

Breaking changes
^^^^^^^^^^^^^^^^
Expand All @@ -42,14 +44,15 @@ Breaking changes
* The indice and indicator for ``winter_storm`` has been removed (deprecated since `xclim` v0.46.0 in favour of ``snd_storm_days``). (:pull:`1565`).
* `xclim` has dropped support for `scipy` version below v1.9.0 and `numpy` versions below v1.20.0. (:pull:`1565`).
* For generic function ``select_resample_op`` and ``core.units.to_agg_units``, operation "sum" will now return the same units as the input, and not implicitly be translated to an "integral". (:issue:`1645`, :pull:`1649`).
* `lmoments3` was removed as a dependency of `xclim` due to incompatible licensing (GPLv3 vs `xclim`'s Apache 2.0). Depending on the outcome of efforts to modify the licensing of `lmoments3`, this change may eventually be reverted. See `Ouranosinc/lmoments3#12 <https://github.com/Ouranosinc/lmoments3/issues/12>`_. See also the "frequency analysis" notebook for an example on how to continue using the probability weighted moments method for fitting distributions. (:issue:`1620`, :pull:`1644`).

Bug fixes
^^^^^^^^^
* Fixed passing ``missing=0`` to ``xclim.core.calendar.convert_calendar``. (:issue:`1562`, :pull:`1563`).
* Fix wrong `window` attributes in ``xclim.indices.standardized_precipitation_index``, ``xclim.indices.standardized_precipitation_evapotranspiration_index``. (:issue:`1552` :pull:`1554`).
* Fix the daily case `freq='D'` of ``xclim.stats.preprocess_standardized_index`` (:issue:`1602` :pull:`1607`).
* Several spelling mistakes have been corrected within the documentation and codebase. (:pull:`1576`).
* Added missing ``xclim.ensembles.robustness_fractions`` and ``xclim.ensembles.robistness_categoris`` in api doc section. (:pull:`1630`).
* Added missing ``xclim.ensembles.robustness_fractions`` and ``xclim.ensembles.robustness_categories`` in api doc section. (:pull:`1630`).
* Fixed an issue that can occur when fetching the testing data and running tests on Windows systems. Adapted a few existing tests for Windows support. (:pull:`1648`).

Internal changes
Expand Down
12 changes: 8 additions & 4 deletions docs/notebooks/frequency_analysis.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is to fit the statistical distribution on these maxima. This is done by the `.fit` method, which takes as argument the sample series, the distribution's name and the parameter estimation `method`. The fit is done by default using the Maximum Likelihood algorithm (`method=\"ML\"`). For some extreme value distributions, however, the maximum likelihood is not always robust, and `xclim` offers the possibility to use Probability Weighted Moments (`method=\"PWM\"`) to estimate parameters. Note that the `lmoments3` package which is used by `xclim` to compute the PWM only supports `expon`, `gamma`, `genextreme`, `genpareto`, `gumbel_r`, `pearson3` and `weibull_min`. Parameters can also be estimated using the method of moments (`method=\"MM\"`)."
"The next step is to fit the statistical distribution on these maxima. This is done by the `.fit` method, which takes as argument the sample series, the distribution's name and the parameter estimation `method`. The fit is done by default using the Maximum Likelihood algorithm (`method=\"ML\"`). Parameters can also be estimated using the method of moments (`method=\"MM\"`).\n",
"\n",
"`xclim` can also accept a distribution instance instead of name (i.e. a subclass of `scipy.stats.rv_continuous`). For example, for some extreme value distributions, the maximum likelihood is not always robust. Using the \"Probability Weighted Moments\" (`method=\"PWM\"`) method can help in that case. This is possible by passing a distribution object from the `lmoments3` package together with `method=\"PWM\"`. That package currently only supports `expon`, `gamma`, `genextreme`, `genpareto`, `gumbel_r`, `pearson3`, and `weibull_min` (with other names, see [the documentation](https://lmoments3.readthedocs.io/en/stable/distributions.html)). In the following example, we fit using the \"Generalized extreme value\" distribution from `lmoments3`."
]
},
{
Expand All @@ -112,8 +114,10 @@
"metadata": {},
"outputs": [],
"source": [
"from lmoments3.distr import gev\n",
"\n",
"# The fitting dimension is hard-coded as `time`.\n",
"params = fit(sub, dist=\"genextreme\")\n",
"params = fit(sub, dist=gev, method=\"PWM\")\n",
"params"
]
},
Expand Down Expand Up @@ -186,9 +190,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}
1 change: 0 additions & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ dependencies:
- Click >=8.1
- dask >=2.6.0
- jsonpickle
- lmoments3
- numba
- numpy >=1.20.0
- pandas >=2.2.0
Expand Down
1 change: 0 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@ dependencies = [
"Click>=8.1",
"dask[array]>=2.6",
"jsonpickle",
"lmoments3>=1.0.5",
"numba",
"numpy>=1.20.0",
"pandas>=2.2",
Expand Down
50 changes: 40 additions & 10 deletions tests/test_stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,10 +166,11 @@ def test_fa(fitda):
np.testing.assert_array_equal(q[0, 0, 0], q0)


def test_fa_gamma(fitda):
def test_fa_gamma_lmom(fitda):
lmom = pytest.importorskip("lmoments3.distr")
T = 10
q = stats.fa(fitda, T, "lognorm", method="MM")
q1 = stats.fa(fitda, T, "gamma", method="PWM")
q1 = stats.fa(fitda, T, lmom.gam, method="PWM")
np.testing.assert_allclose(q1, q, rtol=0.2)


Expand All @@ -192,6 +193,21 @@ def test_dims_order(fitda):
assert p.dims[-1] == "dparams"


lm3_dist_map = {
"expon": "exp",
"gamma": "gam",
"genextreme": "gev",
# "genlogistic": "glo",
# "gennorm": "gno",
"genpareto": "gpa",
"gumbel_r": "gum",
# "kappa4": "kap",
"norm": "nor",
"pearson3": "pe3",
"weibull_min": "wei",
}


class TestPWMFit:
params = {
"expon": {"loc": 0.9527273, "scale": 2.2836364},
Expand All @@ -213,20 +229,23 @@ class TestPWMFit:
}
inputs_pdf = [4, 5, 6, 7]

@pytest.mark.parametrize("dist", stats._lm3_dist_map.keys())
@pytest.mark.parametrize("dist", lm3_dist_map.keys())
def test_get_lm3_dist(self, dist):
"""Check that parameterization for lmoments3 and scipy is identical."""
lmom = pytest.importorskip("lmoments3.distr")
lm3dc = getattr(lmom, lm3_dist_map[dist])
dc = stats.get_dist(dist)
lm3dc = stats.get_lm3_dist(dist)
par = self.params[dist]
expected = dc(**par).pdf(self.inputs_pdf)
values = lm3dc(**par).pdf(self.inputs_pdf)
np.testing.assert_array_almost_equal(values, expected)

@pytest.mark.parametrize("dist", stats._lm3_dist_map.keys())
@pytest.mark.parametrize("dist", lm3_dist_map.keys())
@pytest.mark.parametrize("use_dask", [True, False])
def test_pwm_fit(self, dist, use_dask, random):
"""Test that the fitted parameters match parameters used to generate a random sample."""
lmom = pytest.importorskip("lmoments3.distr")
lm3dc = getattr(lmom, lm3_dist_map[dist])
n = 500
dc = stats.get_dist(dist)
par = self.params[dist]
Expand All @@ -237,11 +256,10 @@ def test_pwm_fit(self, dist, use_dask, random):
)
if use_dask:
da = da.chunk()
out = stats.fit(da, dist=dist, method="PWM").compute()
out = stats.fit(da, dist=lm3dc, method="PWM").compute()

# Check that values are identical to lmoments3's output dict
l3dc = stats.get_lm3_dist(dist)
expected = l3dc.lmom_fit(da.values)
expected = lm3dc.lmom_fit(da.values)
for key, val in expected.items():
np.testing.assert_array_equal(out.sel(dparams=key), val, 1)

Expand Down Expand Up @@ -270,9 +288,21 @@ def test_frequency_analysis(ndq_series, use_dask):
q.transpose(), mode="max", t=2, dist="genextreme", window=6, freq="YS"
)

# Test with PWM fitting method

@pytest.mark.parametrize("use_dask", [True, False])
@pytest.mark.filterwarnings("ignore::RuntimeWarning")
def test_frequency_analysis_lmoments(ndq_series, use_dask):
lmom = pytest.importorskip("lmoments3.distr")
q = ndq_series.copy()
q[:, 0, 0] = np.nan
if use_dask:
q = q.chunk()

out = stats.frequency_analysis(
q, mode="max", t=2, dist="genextreme", window=6, freq="YS"
)
out1 = stats.frequency_analysis(
q, mode="max", t=2, dist="genextreme", window=6, freq="YS", method="PWM"
q, mode="max", t=2, dist=lmom.gev, window=6, freq="YS", method="PWM"
)
np.testing.assert_allclose(
out1,
Expand Down
4 changes: 3 additions & 1 deletion tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ env_list =
offline-prefetch
py39-upstream-doctest
py310
py311
py311-lmoments
py312-numba
labels =
test = py39, py310-upstream-doctest, py311, notebooks_doctests, offline-prefetch
Expand Down Expand Up @@ -105,6 +105,8 @@ deps =
coverage: coveralls
upstream: -rrequirements_upstream.txt
sbck: pybind11
lmoments: lmoments3
notebooks_doctests: lmoments3
install_command = python -m pip install --no-user {opts} {packages}
download = True
commands_pre =
Expand Down
8 changes: 4 additions & 4 deletions xclim/core/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -680,6 +680,9 @@ def infer_kind_from_parameter(param) -> InputKind:
if param.name == "freq":
return InputKind.FREQ_STR

if param.kind == param.VAR_KEYWORD:
return InputKind.KWARGS

if annot == {"Quantified"}:
return InputKind.QUANTIFIED

Expand All @@ -692,7 +695,7 @@ def infer_kind_from_parameter(param) -> InputKind:
if annot.issubset({"int", "float", "Sequence[int]", "Sequence[float]"}):
return InputKind.NUMBER_SEQUENCE

if annot == {"str"}:
if annot.issuperset({"str"}):
return InputKind.STRING

if annot == {"DateStr"}:
Expand All @@ -704,9 +707,6 @@ def infer_kind_from_parameter(param) -> InputKind:
if annot == {"Dataset"}:
return InputKind.DATASET

if param.kind == param.VAR_KEYWORD:
return InputKind.KWARGS

return InputKind.OTHER_PARAMETER


Expand Down
Loading

0 comments on commit 1385fca

Please sign in to comment.