Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardized_index_fit_params with method="ML" raises TypeError in scipy #1617

Closed
1 of 2 tasks
saschahofmann opened this issue Jan 24, 2024 · 14 comments · Fixed by #1565
Closed
1 of 2 tasks

standardized_index_fit_params with method="ML" raises TypeError in scipy #1617

saschahofmann opened this issue Jan 24, 2024 · 14 comments · Fixed by #1565
Assignees
Labels
bug Something isn't working dependencies Pull requests that update a dependency file priority Immediate priority
Milestone

Comments

@saschahofmann
Copy link
Contributor

saschahofmann commented Jan 24, 2024

Setup Information

  • Xclim version: 0.47.0
  • Python version: 3.9.15
  • Operating System: Ubuntu 18.04.6 LTS

Description

When I try to compute the params to compute SPI using standardized_index_fit_params from xclim.indices.stats with method="ML", the computation fails with

TypeError: Unknown arguments: {'method': 'mle'}.

in scipy/_distn_infrastructure.py:2395.

Changing the method to APP fixes the failure.

###Note
Make sure to call .compute() if you are working with DaskArrays.

Steps To Reproduce

This is the code I have been running

from xclim.indices.stats import standardized_index_fit_params
params = standardized_index_fit_params(
    precipation,
    freq="MS",
    window=12,
    dist="gamma",
    method="ML",
)

At least from my perspective I don't see what could cause this based on the input
image

Additional context

Full stack trace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[110], line 2
      1 from xclim.indices.stats import standardized_index_fit_params
----> 2 params = standardized_index_fit_params(
      3     precipation,
      4     freq="MS",
      5     window=12,
      6     dist="gamma",
      7     method="ML",
      8 )

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xclim/indices/stats.py:720, in standardized_index_fit_params(da, freq, window, dist, method, offset, **indexer)
    717         da = da + convert_units_to(offset, da, context="hydro")
    719 da, group = preprocess_standardized_index(da, freq, window, **indexer)
--> 720 params = da.groupby(group).map(fit, dist=dist, method=method)
    721 cal_range = (
    722     da.time.min().dt.strftime("%Y-%m-%d").item(),
    723     da.time.max().dt.strftime("%Y-%m-%d").item(),
    724 )
    725 params.attrs = {
    726     "calibration_period": cal_range,
    727     "freq": freq or "",
   (...)
    733     "offset": offset or "",
    734 }

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/groupby.py:1399, in DataArrayGroupByBase.map(self, func, args, shortcut, **kwargs)
   1397 grouped = self._iter_grouped_shortcut() if shortcut else self._iter_grouped()
   1398 applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
-> 1399 return self._combine(applied, shortcut=shortcut)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/groupby.py:1418, in DataArrayGroupByBase._combine(self, applied, shortcut)
   1416 def _combine(self, applied, shortcut=False):
   1417     """Recombine the applied objects like the original."""
-> 1418     applied_example, applied = peek_at(applied)
   1419     coord, dim, positions = self._infer_concat_args(applied_example)
   1420     if shortcut:

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/utils.py:188, in peek_at(iterable)
    184 """Returns the first value from iterable, as well as a new iterator with
    185 the same content as the original iterable
    186 """
    187 gen = iter(iterable)
--> 188 peek = next(gen)
    189 return peek, itertools.chain([peek], gen)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/groupby.py:1398, in <genexpr>(.0)
   1356 """Apply a function to each array in the group and concatenate them
   1357 together into a new array.
   1358 
   (...)
   1395     The result of splitting, applying and combining this array.
   1396 """
   1397 grouped = self._iter_grouped_shortcut() if shortcut else self._iter_grouped()
-> 1398 applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
   1399 return self._combine(applied, shortcut=shortcut)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xclim/indices/stats.py:144, in fit(da, dist, method, dim, **fitkwargs)
    141 shape_params = [] if dc.shapes is None else dc.shapes.split(",")
    142 dist_params = shape_params + ["loc", "scale"]
--> 144 data = xr.apply_ufunc(
    145     _fitfunc_1d,
    146     da,
    147     input_core_dims=[[dim]],
    148     output_core_dims=[["dparams"]],
    149     vectorize=True,
    150     dask="parallelized",
    151     output_dtypes=[float],
    152     keep_attrs=True,
    153     kwargs=dict(
    154         # Don't know how APP should be included, this works for now
    155         dist=dc if method in ["ML", "MLE", "MM", "APP"] else lm3dc,
    156         nparams=len(dist_params),
    157         method=method,
    158         **fitkwargs,
    159     ),
    160     dask_gufunc_kwargs={"output_sizes": {"dparams": len(dist_params)}},
    161 )
    163 # Add coordinates for the distribution parameters and transpose to original shape (with dim -> dparams)
    164 dims = [d if d != dim else "dparams" for d in da.dims]

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/computation.py:1262, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, on_missing_core_dim, *args)
   1260 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1261 elif any(isinstance(a, DataArray) for a in args):
-> 1262     return apply_dataarray_vfunc(
   1263         variables_vfunc,
   1264         *args,
   1265         signature=signature,
   1266         join=join,
   1267         exclude_dims=exclude_dims,
   1268         keep_attrs=keep_attrs,
   1269     )
   1270 # feed Variables directly through apply_variable_ufunc
   1271 elif any(isinstance(a, Variable) for a in args):

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/computation.py:314, in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    309 result_coords, result_indexes = build_output_coords_and_indexes(
    310     args, signature, exclude_dims, combine_attrs=keep_attrs
    311 )
    313 data_vars = [getattr(a, "variable", a) for a in args]
--> 314 result_var = func(*data_vars)
    316 out: tuple[DataArray, ...] | DataArray
    317 if signature.num_outputs > 1:

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xarray/core/computation.py:821, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    816     if vectorize:
    817         func = _vectorize(
    818             func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    819         )
--> 821 result_data = func(*input_data)
    823 if signature.num_outputs == 1:
    824     result_data = (result_data,)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/numpy/lib/function_base.py:2328, in vectorize.__call__(self, *args, **kwargs)
   2325     vargs = [args[_i] for _i in inds]
   2326     vargs.extend([kwargs[_n] for _n in names])
-> 2328 return self._vectorize_call(func=func, args=vargs)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/numpy/lib/function_base.py:2402, in vectorize._vectorize_call(self, func, args)
   2400 """Vectorized call to `func` over positional `args`."""
   2401 if self.signature is not None:
-> 2402     res = self._vectorize_call_with_signature(func, args)
   2403 elif not args:
   2404     res = func()

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/numpy/lib/function_base.py:2442, in vectorize._vectorize_call_with_signature(self, func, args)
   2439 nout = len(output_core_dims)
   2441 for index in np.ndindex(*broadcast_shape):
-> 2442     results = func(*(arg[index] for arg in args))
   2444     n_results = len(results) if isinstance(results, tuple) else 1
   2446     if nout != n_results:

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/xclim/indices/stats.py:64, in _fitfunc_1d(arr, dist, nparams, method, **fitkwargs)
     62 if method in ["ML", "MLE"]:
     63     args, kwargs = _fit_start(x, dist.name, **fitkwargs)
---> 64     params = dist.fit(x, *args, method="mle", **kwargs, **fitkwargs)
     65 elif method == "MM":
     66     params = dist.fit(x, method="mm", **fitkwargs)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2638, in gamma_gen.fit(self, data, *args, **kwds)
   2634 floc = kwds.get('floc', None)
   2636 if floc is None:
   2637     # loc is not fixed.  Use the default fit method.
-> 2638     return super(gamma_gen, self).fit(data, *args, **kwds)
   2640 # We already have this value, so just pop it from kwds.
   2641 kwds.pop('floc', None)

File ~/Projects/climate-analysis/.direnv/python-3.9.15/lib/python3.9/site-packages/scipy/stats/_distn_infrastructure.py:2395, in rv_continuous.fit(self, data, *args, **kwds)
   2393 # by now kwds must be empty, since everybody took what they needed
   2394 if kwds:
-> 2395     raise TypeError("Unknown arguments: %s." % kwds)
   2397 vals = optimizer(func, x0, args=(ravel(data),), disp=0)
   2398 if restore is not None:

TypeError: Unknown arguments: {'method': 'mle'}.

Contribution

  • I would be willing/able to open a Pull Request to address this bug.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@saschahofmann saschahofmann added the bug Something isn't working label Jan 24, 2024
@saschahofmann
Copy link
Contributor Author

It seems that the only way the function can be run with "ML" is when in _fitfunc_1d. in xclim/indices/stats.py:L58 the if statement is true and it returns Nans.

@coxipi
Copy link
Contributor

coxipi commented Jan 24, 2024

Hi! Thanks for the report.

This is weird, because this case is already tested in our test suite. I used xclim version 0.47 here and it worked fine. Can you confirm that this example is what you were trying to do but with different data, I didn't miss anything (plus the actual spi computation at the end)?

from xclim.testing import open_dataset
from xclim.indices.stats import standardized_index_fit_params

ds = open_dataset("sdba/CanESM2_1950-2100.nc").chunk({"location":1})
precipitation = ds.pr.sel(time=slice("1950", "1980"))
params = standardized_index_fit_params(
      precipitation,
      freq="MS",
      window=12,
      dist="gamma",
      method="ML",
     )
params.compute()

At least from my perspective I don't see what could cause this based on the input

I agree. I can't tell from your screenshot if your data is daily or already monthly, but I tested both cases and they work.

Could you:

  • Try to run my example code above
  • Share your data (just a few problematic points would be great)

@saschahofmann
Copy link
Contributor Author

Thanks @coxipi for looking into this. I just ran your example and it failed with the same error message. I am attaching my pdm.lock file in case you wanna see all of my dependencies, I had to change it to a txt file for Github to allow me to attach it)

A summary:
I am using Python 3.9.15
xclim 0.47.0
dask 2024.1.0
xarray 2023.10.1
numpy 1.23.5

I am going to downgrade dask and xarray and see what happens

pdm.txt

@saschahofmann
Copy link
Contributor Author

Oh and scipy is 1.6.1

@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Jan 24, 2024

@saschahofmann Think you could run the following instead?:

$ xclim show_version_info

@saschahofmann
Copy link
Contributor Author

Of course. Here you go

$ xclim show_version_info
INSTALLED VERSIONS
------------------
python: 3.9.15
boltons: installed
bottleneck: 1.3.7
cf_xarray: 0.8.0
cftime: 1.6.2
click: 8.1.3
clisops: None
dask: 2024.1.0
flox: None
jsonpickle: 3.0.1
lmoments3: 1.0.6
numba: 0.56.4
pandas: 1.5.3
pint: 0.16.1
scipy: 1.6.1
sklearn: 1.2.2
statsmodels: 0.13.5
xarray: 2023.10.1
xclim: 0.47.0
Anaconda-based environment: no

@saschahofmann
Copy link
Contributor Author

I downgraded dask to an early 2023 version but it still failed now playing around with Scipy versions.

@saschahofmann
Copy link
Contributor Author

I tried to understand the traceback but its not clear to me where could be

@coxipi
Copy link
Contributor

coxipi commented Jan 24, 2024

Using scipy==1.6.1 on a fresh install,

from scipy import stats
stats.gamma.fit([1,2,3], method="mle")

I get the error you pointed out:

TypeError: Unknown arguments: {'method': 'mle'}.

So we need to update our dependencies to enforce a minimal working version of scipy with xclim. Sorry about that!

For now, you could update scipy directly? When your environment is already activated:

conda install scipy==1.X.Y 
# or
pip install scipy==1.X.Y

I will check now which version (1.X.Y) is needed.

EDIT: I didn't find info on this particular point in the release notes, but I just checked and scipy==1.7 works with the code above

@coxipi
Copy link
Contributor

coxipi commented Jan 24, 2024

@Zeitsperre I'm not familiar with changing dependencies. scipy==1.6.1 is incompatible with our current _fitfunc_1d because method = 'mle' is not implemented in scipy.stats.gamma.fit. It is in scipy==1.7. Currently we require scipy==1.2. Is there anything else I should check before changing 1.2 -> 1.7 ?

@Zeitsperre
Copy link
Collaborator

This is actually great timing: We'll be raising the version pins quite a bit in #1565! scipy>=1.9 will be required going forward.

@Zeitsperre Zeitsperre added priority Immediate priority dependencies Pull requests that update a dependency file labels Jan 24, 2024
@Zeitsperre Zeitsperre added this to the v0.48.0 milestone Jan 24, 2024
@coxipi
Copy link
Contributor

coxipi commented Jan 24, 2024

@saschahofmann If you could tell us if updating to scipy>=1.9 solves all your issues, it would be great!

If you're using conda and don't want to mess with your environment, you could create a clone and update scipy in the clone instead:

conda create -n new_env --clone your_current_env
conda activate new_env
conda install scipy==1.9

But eventually, to use scipy.stats.gamma.fit with maximum likelihood estimation (mle), you will need the updated scipy in any case.

Cheers!

@saschahofmann
Copy link
Contributor Author

🎉 Just upgraded to 1.12.0 and both my case and the example you've sent me succeeded!

Thanks for being so fast to help me, would love to give you 5 stars 😄 .

Do you want me to close this or shall I leave it open until the version is pinned?

@coxipi
Copy link
Contributor

coxipi commented Jan 25, 2024

Haha, thanks for the good words, it's appreciated. I'm glad you're good to go now!

You can leave the issue open, we'll close it once the pull request solving it has been merged.

Thanks again for the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dependencies Pull requests that update a dependency file priority Immediate priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants