Remove lmoments3 (#1644)

### What kind of change does this PR introduce? * Remove `lmoments3` from xclim's dependencies. * Accept `rv_continuous` instances for the `dist` arguments of the statistical indices ### Does this PR introduce a breaking change? Yes. Passing a mere string and `method='PWM'` is now broken. It will raise a ValueError with a message asking to pass an instance of `lmoments3` instead. This also makes some functions awkward to use. Before, we relied on the `scipy_dist` attribute of `params` to retrieve the distribution when computing statistics. As one can now pass an object that has nothing to do with scipy, those functions (`parametric_quantile` for example) must now also accept the dist as an argument. ### Other information: See Ouranosinc/lmoments3#12. I kinda cheated and made it so the `dist` argument would show up as a "String" parameter to the indicator. This only changes the metadata, distributions objects can still be passed to indicators.
Ouranosinc · Feb 16, 2024 · 1385fca · 1385fca
2 parents f42c986 + ade1493
commit 1385fca
Show file tree

Hide file tree

Showing 9 changed files with 142 additions and 108 deletions.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -132,7 +132,7 @@ jobs:
             python-version: "3.9"
             markers: -m 'not slow'
             os: ubuntu-latest
-          - tox-env: py310-coverage  # No markers -- includes slow tests
+          - tox-env: py310-coverage-lmoments  # No markers -- includes slow tests
             python-version: "3.10"
             os: ubuntu-latest
           - tox-env: py311-coverage-sbck

diff --git a/CHANGES.rst b/CHANGES.rst
@@ -22,8 +22,10 @@ New features and enhancements
 * New ``xclim.core.calendar.stack_periods`` and ``unstack_periods`` for performing ``rolling(time=...).construct(..., stride=...)`` but with non-uniform temporal periods like years or months. They replace ``xclim.sdba.processing.construct_moving_yearly_window`` and ``unpack_moving_yearly_window`` which are deprecated and will be removed in a future release.
 * New ``as_dataset`` options for ``xclim.set_options``. When True, indicators will output Datasets instead of DataArrays. (:issue:`1257`, :pull:`1625`).
 * Added new option for UTCI calculation to cap low wind velocities to a minimum of 0.5 m/s following Bröde (2012) guidelines. (:issue:`1634`, :pull:`1635`).
+
 * Added option ``never_reached`` to ``degree_days_exceedance_date`` to assign a custom value when the sum threshold is never reached. (:issue:`1459`, :pull:`1647`).
 * Added option ``min_members`` to ensemble statistics to mask elements when the number of valid members is under a threshold. (:issue:`1459`, :pull:`1647`).
+* Distribution instances can now be passed to the ``dist`` argument of most statistical indices. (:pull:`1644`).
 
 Breaking changes
 ^^^^^^^^^^^^^^^^
@@ -42,14 +44,15 @@ Breaking changes
 * The indice and indicator for ``winter_storm`` has been removed (deprecated since `xclim` v0.46.0 in favour of ``snd_storm_days``). (:pull:`1565`).
 * `xclim` has dropped support for `scipy` version below v1.9.0 and `numpy` versions below v1.20.0. (:pull:`1565`).
 * For generic function ``select_resample_op`` and ``core.units.to_agg_units``, operation "sum" will now return the same units as the input, and not implicitly be translated to an "integral". (:issue:`1645`, :pull:`1649`).
+* `lmoments3` was removed as a dependency of `xclim` due to incompatible licensing (GPLv3 vs `xclim`'s Apache 2.0). Depending on the outcome of efforts to modify the licensing of `lmoments3`, this change may eventually be reverted. See `Ouranosinc/lmoments3#12 <https://github.com/Ouranosinc/lmoments3/issues/12>`_. See also the "frequency analysis" notebook for an example on how to continue using the probability weighted moments method for fitting distributions. (:issue:`1620`, :pull:`1644`).
 
 Bug fixes
 ^^^^^^^^^
 * Fixed passing ``missing=0`` to ``xclim.core.calendar.convert_calendar``. (:issue:`1562`, :pull:`1563`).
 * Fix wrong `window` attributes in ``xclim.indices.standardized_precipitation_index``, ``xclim.indices.standardized_precipitation_evapotranspiration_index``. (:issue:`1552`  :pull:`1554`).
 * Fix the daily case `freq='D'` of ``xclim.stats.preprocess_standardized_index`` (:issue:`1602`  :pull:`1607`).
 * Several spelling mistakes have been corrected within the documentation and codebase. (:pull:`1576`).
-* Added missing ``xclim.ensembles.robustness_fractions`` and ``xclim.ensembles.robistness_categoris`` in api doc section. (:pull:`1630`).
+* Added missing ``xclim.ensembles.robustness_fractions`` and ``xclim.ensembles.robustness_categories`` in api doc section. (:pull:`1630`).
 * Fixed an issue that can occur when fetching the testing data and running tests on Windows systems. Adapted a few existing tests for Windows support. (:pull:`1648`).
 
 Internal changes

diff --git a/docs/notebooks/frequency_analysis.ipynb b/docs/notebooks/frequency_analysis.ipynb
@@ -103,7 +103,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The next step is to fit the statistical distribution on these maxima. This is done by the `.fit` method, which takes as argument the sample series, the distribution's name and the parameter estimation `method`. The fit is done by default using the Maximum Likelihood algorithm (`method=\"ML\"`). For some extreme value distributions, however, the maximum likelihood is not always robust, and `xclim` offers the possibility to use Probability Weighted Moments (`method=\"PWM\"`) to estimate parameters. Note that the `lmoments3` package which is used by `xclim` to compute the PWM only supports `expon`, `gamma`, `genextreme`, `genpareto`, `gumbel_r`, `pearson3` and `weibull_min`. Parameters can also be estimated using the method of moments (`method=\"MM\"`)."
+    "The next step is to fit the statistical distribution on these maxima. This is done by the `.fit` method, which takes as argument the sample series, the distribution's name and the parameter estimation `method`. The fit is done by default using the Maximum Likelihood algorithm (`method=\"ML\"`). Parameters can also be estimated using the method of moments (`method=\"MM\"`).\n",
+    "\n",
+    "`xclim` can also accept a distribution instance instead of name (i.e. a subclass of `scipy.stats.rv_continuous`). For example, for some extreme value distributions, the maximum likelihood is not always robust. Using the \"Probability Weighted Moments\" (`method=\"PWM\"`) method can help in that case. This is possible by passing a distribution object from the `lmoments3` package together with `method=\"PWM\"`. That package currently only supports `expon`, `gamma`, `genextreme`, `genpareto`, `gumbel_r`, `pearson3`, and `weibull_min` (with other names, see [the documentation](https://lmoments3.readthedocs.io/en/stable/distributions.html)). In the following example, we fit using the \"Generalized extreme value\" distribution from `lmoments3`."
    ]
   },
   {
@@ -112,8 +114,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "from lmoments3.distr import gev\n",
+    "\n",
     "# The fitting dimension is hard-coded as `time`.\n",
-    "params = fit(sub, dist=\"genextreme\")\n",
+    "params = fit(sub, dist=gev, method=\"PWM\")\n",
     "params"
    ]
   },
@@ -186,9 +190,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.9"
+   "version": "3.11.7"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 1
+ "nbformat_minor": 4
 }
diff --git a/environment.yml b/environment.yml
@@ -13,7 +13,6 @@ dependencies:
   - Click >=8.1
   - dask >=2.6.0
   - jsonpickle
-  - lmoments3
   - numba
   - numpy >=1.20.0
   - pandas >=2.2.0

diff --git a/pyproject.toml b/pyproject.toml
@@ -39,7 +39,6 @@ dependencies = [
   "Click>=8.1",
   "dask[array]>=2.6",
   "jsonpickle",
-  "lmoments3>=1.0.5",
   "numba",
   "numpy>=1.20.0",
   "pandas>=2.2",

diff --git a/tests/test_stats.py b/tests/test_stats.py
@@ -166,10 +166,11 @@ def test_fa(fitda):
     np.testing.assert_array_equal(q[0, 0, 0], q0)
 
 
-def test_fa_gamma(fitda):
+def test_fa_gamma_lmom(fitda):
+    lmom = pytest.importorskip("lmoments3.distr")
     T = 10
     q = stats.fa(fitda, T, "lognorm", method="MM")
-    q1 = stats.fa(fitda, T, "gamma", method="PWM")
+    q1 = stats.fa(fitda, T, lmom.gam, method="PWM")
     np.testing.assert_allclose(q1, q, rtol=0.2)
 
 
@@ -192,6 +193,21 @@ def test_dims_order(fitda):
     assert p.dims[-1] == "dparams"
 
 
+lm3_dist_map = {
+    "expon": "exp",
+    "gamma": "gam",
+    "genextreme": "gev",
+    # "genlogistic": "glo",
+    # "gennorm": "gno",
+    "genpareto": "gpa",
+    "gumbel_r": "gum",
+    # "kappa4": "kap",
+    "norm": "nor",
+    "pearson3": "pe3",
+    "weibull_min": "wei",
+}
+
+
 class TestPWMFit:
     params = {
         "expon": {"loc": 0.9527273, "scale": 2.2836364},
@@ -213,20 +229,23 @@ class TestPWMFit:
     }
     inputs_pdf = [4, 5, 6, 7]
 
-    @pytest.mark.parametrize("dist", stats._lm3_dist_map.keys())
+    @pytest.mark.parametrize("dist", lm3_dist_map.keys())
     def test_get_lm3_dist(self, dist):
         """Check that parameterization for lmoments3 and scipy is identical."""
+        lmom = pytest.importorskip("lmoments3.distr")
+        lm3dc = getattr(lmom, lm3_dist_map[dist])
         dc = stats.get_dist(dist)
-        lm3dc = stats.get_lm3_dist(dist)
         par = self.params[dist]
         expected = dc(**par).pdf(self.inputs_pdf)
         values = lm3dc(**par).pdf(self.inputs_pdf)
         np.testing.assert_array_almost_equal(values, expected)
 
-    @pytest.mark.parametrize("dist", stats._lm3_dist_map.keys())
+    @pytest.mark.parametrize("dist", lm3_dist_map.keys())
     @pytest.mark.parametrize("use_dask", [True, False])
     def test_pwm_fit(self, dist, use_dask, random):
         """Test that the fitted parameters match parameters used to generate a random sample."""
+        lmom = pytest.importorskip("lmoments3.distr")
+        lm3dc = getattr(lmom, lm3_dist_map[dist])
         n = 500
         dc = stats.get_dist(dist)
         par = self.params[dist]
@@ -237,11 +256,10 @@ def test_pwm_fit(self, dist, use_dask, random):
         )
         if use_dask:
             da = da.chunk()
-        out = stats.fit(da, dist=dist, method="PWM").compute()
+        out = stats.fit(da, dist=lm3dc, method="PWM").compute()
 
         # Check that values are identical to lmoments3's output dict
-        l3dc = stats.get_lm3_dist(dist)
-        expected = l3dc.lmom_fit(da.values)
+        expected = lm3dc.lmom_fit(da.values)
         for key, val in expected.items():
             np.testing.assert_array_equal(out.sel(dparams=key), val, 1)
 
@@ -270,9 +288,21 @@ def test_frequency_analysis(ndq_series, use_dask):
         q.transpose(), mode="max", t=2, dist="genextreme", window=6, freq="YS"
     )
 
-    # Test with PWM fitting method
+
+@pytest.mark.parametrize("use_dask", [True, False])
+@pytest.mark.filterwarnings("ignore::RuntimeWarning")
+def test_frequency_analysis_lmoments(ndq_series, use_dask):
+    lmom = pytest.importorskip("lmoments3.distr")
+    q = ndq_series.copy()
+    q[:, 0, 0] = np.nan
+    if use_dask:
+        q = q.chunk()
+
+    out = stats.frequency_analysis(
+        q, mode="max", t=2, dist="genextreme", window=6, freq="YS"
+    )
     out1 = stats.frequency_analysis(
-        q, mode="max", t=2, dist="genextreme", window=6, freq="YS", method="PWM"
+        q, mode="max", t=2, dist=lmom.gev, window=6, freq="YS", method="PWM"
     )
     np.testing.assert_allclose(
         out1,

diff --git a/tox.ini b/tox.ini
@@ -7,7 +7,7 @@ env_list =
     offline-prefetch
     py39-upstream-doctest
     py310
-    py311
+    py311-lmoments
     py312-numba
 labels =
     test = py39, py310-upstream-doctest, py311, notebooks_doctests, offline-prefetch
@@ -105,6 +105,8 @@ deps =
     coverage: coveralls
     upstream: -rrequirements_upstream.txt
     sbck: pybind11
+    lmoments: lmoments3
+    notebooks_doctests: lmoments3
 install_command = python -m pip install --no-user {opts} {packages}
 download = True
 commands_pre =

diff --git a/xclim/core/utils.py b/xclim/core/utils.py
@@ -680,6 +680,9 @@ def infer_kind_from_parameter(param) -> InputKind:
     if param.name == "freq":
         return InputKind.FREQ_STR
 
+    if param.kind == param.VAR_KEYWORD:
+        return InputKind.KWARGS
+
     if annot == {"Quantified"}:
         return InputKind.QUANTIFIED
 
@@ -692,7 +695,7 @@ def infer_kind_from_parameter(param) -> InputKind:
     if annot.issubset({"int", "float", "Sequence[int]", "Sequence[float]"}):
         return InputKind.NUMBER_SEQUENCE
 
-    if annot == {"str"}:
+    if annot.issuperset({"str"}):
         return InputKind.STRING
 
     if annot == {"DateStr"}:
@@ -704,9 +707,6 @@ def infer_kind_from_parameter(param) -> InputKind:
     if annot == {"Dataset"}:
         return InputKind.DATASET
 
-    if param.kind == param.VAR_KEYWORD:
-        return InputKind.KWARGS
-
     return InputKind.OTHER_PARAMETER