Adapt freq in training #1407

Merged · 36 commits · Oct 18, 2023
Changes from 27 commits
52361ae
Adapt_freq called in training (fix)
coxipi Apr 27, 2023
e096c8f
LOCI has adapt_freq_thresh argument
coxipi May 8, 2023
58d5b66
adapt_freq_thresh option in Scaling
coxipi May 29, 2023
d1d96fe
Merge branch 'master' of https://github.com/Ouranosinc/xclim into ada…
coxipi May 29, 2023
0471767
Merge branch 'master' of https://github.com/Ouranosinc/xclim into ada…
coxipi Jun 29, 2023
1be0d1d
CHANGES update
coxipi Jun 29, 2023
b304326
Better description _adapt_freq_s
coxipi Jun 29, 2023
3162c32
more comments
coxipi Jun 29, 2023
55ec622
pull number
coxipi Jun 29, 2023
f6ce69b
Remove adapt_freq_thresh, useless in LOCI and Scaling
coxipi Jun 29, 2023
c95e3c9
remove unused import
coxipi Jun 29, 2023
79bdc32
incorporate changes from _adapt_freq_s to _adapt_freq
coxipi Jun 29, 2023
e9a64ae
Revert useless formatting changes
coxipi Jun 29, 2023
36ef886
Only used stacked ranking if len(dim)>1
coxipi Jun 29, 2023
79f7ca7
Merge branch 'master' of https://github.com/Ouranosinc/xclim into ada…
coxipi Jun 30, 2023
2d118cd
Example explaining use of adapt_freq_thresh
coxipi Aug 2, 2023
341e53d
move rolling_window example the advanced notebook
coxipi Aug 2, 2023
40f07ca
Merge branch 'master' into adapt_freq_in_training
Zeitsperre Aug 2, 2023
09b3fb8
Merge branch 'master' into adapt_freq_in_training
coxipi Aug 17, 2023
e594905
Merge branch 'master' into adapt_freq_in_training
Zeitsperre Aug 25, 2023
ad43e39
Merge branch 'master' into adapt_freq_in_training
Zeitsperre Aug 28, 2023
22f4eb8
Merge branch 'master' into adapt_freq_in_training
Zeitsperre Sep 5, 2023
d3af806
Merge branch 'master' of github.com:Ouranosinc/xclim into adapt_freq_…
coxipi Oct 16, 2023
75eee24
Docstrings for `adapt_freq_thresh`
coxipi Oct 16, 2023
ff97686
rank now supports multi-dimensional ranking
coxipi Oct 16, 2023
94cffa2
add the modifications of rank (duh)
coxipi Oct 16, 2023
3e2e8a3
Merge branch 'adapt_freq_in_training' of github.com:Ouranosinc/xclim …
coxipi Oct 16, 2023
5aa31da
Update CHANGES.rst
coxipi Oct 16, 2023
67f7c95
Merge branch 'master' into adapt_freq_in_training
Zeitsperre Oct 16, 2023
1c84d4a
rank variable assignment forgotten
coxipi Oct 17, 2023
7923083
Merge branch 'master' of github.com:Ouranosinc/xclim into adapt_freq_…
coxipi Oct 17, 2023
944b628
revert changes in test
coxipi Oct 17, 2023
650d118
clean notebook, strip 'metadata.kernelspec.display_name' in pre-commit
coxipi Oct 17, 2023
f932a9c
really revert change in test
coxipi Oct 17, 2023
13e2702
try removing all kernelspec
coxipi Oct 17, 2023
2b7debd
remove kernelspec in nb metadata
coxipi Oct 18, 2023
4 changes: 3 additions & 1 deletion CHANGES.rst
@@ -4,7 +4,7 @@ Changelog

v0.46.0 (unreleased)
--------------------
Contributors to this version: Éric Dupuis (:user:`coxipi`), Trevor James Smith (:user:`Zeitsperre`), David Huard (:user:`huard`) and Pascal Bourgault (:user:`aulemahal`).
Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`).

New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -13,6 +13,7 @@ New features and enhancements
* The testing suite now offers a means of running tests in "offline" mode (using `pytest-socket <https://github.com/miketheman/pytest-socket>`_ to block external connections). This requires a local copy of `xclim-testdata` to be present in the user's home cache directory and for certain `pytest` options and markers to be set when invoked. For more information, see the contributing documentation section for `Running Tests in Offline Mode`. (:issue:`1468`, :pull:`1473`).
* The `SKIP_NOTEBOOKS` flag to speed up docs builds is now documented. See the contributing documentation section `Get Started!` for details. (:issue:`1470`, :pull:`1476`).
* Refactored the indicators page with the addition of a search bar.
* `adapt_freq_thresh` argument added to `sdba` training functions, allowing frequency adaptation to be performed appropriately in each map block. (:pull:`1407`).

Bug fixes
^^^^^^^^^
@@ -46,6 +47,7 @@ Internal changes

v0.45.0 (2023-09-05)
--------------------
Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`), Gabriel Rondeau-Genesse (:user:`RondeauG`), Marco Braun (:user:`vindelico`), Éric Dupuis (:user:`coxipi`).

Announcements
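The `adapt_freq_thresh` entry in the changelog refers to frequency adaptation (Themeßl et al.): a simulation with too many values below a "dry" threshold, compared to the reference, has its excess dry values replaced by small wet values. A minimal pure-Python sketch of the idea — `ecdf` and `adapt_freq_sketch` are hypothetical helpers for illustration, not xclim's implementation:

```python
def ecdf(values, x):
    """Empirical probability that a value is <= x."""
    return sum(v <= x for v in values) / len(values)


def adapt_freq_sketch(ref, sim, thresh):
    """Correct sim so its frequency of values under `thresh` matches ref.

    Returns the adapted series and dP0, the relative excess of "dry"
    values in sim compared to ref.
    """
    p0_ref = ecdf(ref, thresh)  # fraction of ref under the threshold
    p0_sim = ecdf(sim, thresh)  # fraction of sim under the threshold
    if p0_sim <= p0_ref:
        return list(sim), 0.0  # sim is not drier than ref: nothing to do
    dP0 = (p0_sim - p0_ref) / p0_sim
    # pth: reference value at the probability where sim stops being "too dry"
    sref = sorted(ref)
    pth = sref[min(int(p0_sim * len(sref)), len(sref) - 1)]
    # Replace the driest excess fraction of sim by values in (thresh, pth)
    order = sorted(range(len(sim)), key=lambda i: sim[i])
    n_fix = round(dP0 * p0_sim * len(sim))  # = (p0_sim - p0_ref) * len(sim)
    adapted = list(sim)
    for k, i in enumerate(order[:n_fix]):
        # spread the replacement values uniformly between thresh and pth
        adapted[i] = thresh + (pth - thresh) * (k + 1) / (n_fix + 1)
    return adapted, dP0
```

With a reference containing 2 dry days out of 10 and a simulation containing 4, the sketch replaces the two excess dry values, bringing the dry-day frequency of the simulation down to that of the reference.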
121 changes: 119 additions & 2 deletions docs/notebooks/sdba-advanced.ipynb
@@ -692,6 +692,118 @@
"plt.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Frequency adaptation with a rolling window"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous example, we performed bias adjustment with a rolling window. Here we show how to include frequency adaptation (see `sdba.ipynb` for the simple case `group=\"time\"`). We first generate the same precipitation dataset used in `sdba.ipynb`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"t = xr.cftime_range(\"2000-01-01\", \"2030-12-31\", freq=\"D\", calendar=\"noleap\")\n",
"\n",
"vals = np.random.randint(0, 1000, size=(t.size,)) / 100\n",
"vals_ref = (4 ** np.where(vals < 9, vals / 100, vals)) / 3e6\n",
"vals_sim = (\n",
" (1 + 0.1 * np.random.random_sample((t.size,)))\n",
" * (4 ** np.where(vals < 9.5, vals / 100, vals))\n",
" / 3e6\n",
")\n",
"\n",
"pr_ref = xr.DataArray(\n",
" vals_ref, coords={\"time\": t}, dims=(\"time\",), attrs={\"units\": \"mm/day\"}\n",
")\n",
"pr_ref = pr_ref.sel(time=slice(\"2000\", \"2015\"))\n",
"pr_sim = xr.DataArray(\n",
" vals_sim, coords={\"time\": t}, dims=(\"time\",), attrs={\"units\": \"mm/day\"}\n",
")\n",
"pr_hist = pr_sim.sel(time=slice(\"2000\", \"2015\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bias adjustment on a rolling window can be performed in the same way as shown in `sdba.ipynb`, but instead of a single string specifying the time grouping (e.g. `time.month`), the `group` argument is built with the `sdba.Grouper` constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# adapt_freq with a sdba.Grouper\n",
"from xclim import sdba\n",
"\n",
"group = sdba.Grouper(\"time.dayofyear\", window=31)\n",
"sim_ad, pth, dP0 = sdba.processing.adapt_freq(\n",
" pr_ref, pr_sim, thresh=\"0.05 mm d-1\", group=group\n",
")\n",
"QM_ad = sdba.EmpiricalQuantileMapping.train(\n",
" pr_ref, sim_ad, nquantiles=15, kind=\"*\", group=group\n",
")\n",
"scen_ad = QM_ad.adjust(pr_sim)\n",
"\n",
"pr_ref.sel(time=\"2010\").plot(alpha=0.9, label=\"Reference\")\n",
"pr_sim.sel(time=\"2010\").plot(alpha=0.7, label=\"Model - biased\")\n",
"scen_ad.sel(time=\"2010\").plot(alpha=0.6, label=\"Model - adjusted\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the figure above, `scen` occasionally has small peaks where `sim` is 0, indicating that there are more \"dry days\" (days with almost no precipitation) in `hist` than in `ref`. The frequency adaptation of [Themeßl et al. (2010)](https://doi.org/10.1007/s10584-011-0224-4) performed in the step above has only partially corrected this. \n",
"\n",
"The reason is the following. The first step above groups the precipitation series into 365 overlapping blocks of 31 days * Y years, one block per day of the year. Each block is adapted, and the 16th day-of-year slice (at the center of the block) is assigned to the corresponding day of year in the adapted dataset `sim_ad`. The training step then re-forms those 31 days * Y years blocks, but this does not undo the previous step: there can still be more zeroes in the simulation than in the reference. \n",
"\n",
"To alleviate this issue, frequency adaptation can instead be performed directly on the blocks, and the same blocks can then be reused in the training step, as we show below.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# adapt_freq directly in the training step\n",
"group = sdba.Grouper(\"time.dayofyear\", window=31)\n",
"\n",
"QM_ad = sdba.EmpiricalQuantileMapping.train(\n",
" pr_ref,\n",
" pr_sim,\n",
" nquantiles=15,\n",
" kind=\"*\",\n",
" group=group,\n",
" adapt_freq_thresh=\"0.05 mm d-1\",\n",
")\n",
"scen_ad = QM_ad.adjust(pr_sim)\n",
"\n",
"pr_ref.sel(time=\"2010\").plot(alpha=0.9, label=\"Reference\")\n",
"pr_sim.sel(time=\"2010\").plot(alpha=0.7, label=\"Model - biased\")\n",
"scen_ad.sel(time=\"2010\").plot(alpha=0.6, label=\"Model - adjusted\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -835,7 +947,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -849,7 +961,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.4"
},
"toc": {
"base_numbering": 1,
@@ -863,6 +975,11 @@
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"vscode": {
"interpreter": {
"hash": "7bfb2ff12c8d2ac85e75a42433bff767a429e1b9def8e10da354335359f03dc7"
}
}
},
"nbformat": 4,
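The notebook changes above rely on grouping a multi-year daily series into one overlapping block per day of year, each block holding the `window` days centred on that day across all years. A toy sketch of that blocking — `dayofyear_blocks` is a hypothetical helper for illustration (no-leap calendar, odd window, wrap-around at year edges), not the actual `sdba.Grouper` machinery:

```python
def dayofyear_blocks(values, days_per_year, window):
    """Group a multi-year daily series into one overlapping block per
    day of year: block d holds the `window` days centred on d, for
    every year. Assumes a no-leap calendar and an odd window."""
    n_years = len(values) // days_per_year
    half = window // 2
    blocks = []
    for d in range(days_per_year):
        block = []
        for y in range(n_years):
            for off in range(-half, half + 1):
                # wrap around the year edges, as a circular day-of-year axis
                day = (d + off) % days_per_year
                block.append(values[y * days_per_year + day])
        blocks.append(block)
    return blocks
```

For `group = sdba.Grouper("time.dayofyear", window=31)`, this corresponds to 365 blocks of 31 days * Y years; adapting each block and then training on the same blocks is what keeps the two steps consistent.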
9 changes: 7 additions & 2 deletions docs/notebooks/sdba.ipynb
@@ -677,7 +677,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -691,7 +691,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.4"
},
"toc": {
"base_numbering": 1,
@@ -705,6 +705,11 @@
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"vscode": {
"interpreter": {
"hash": "7bfb2ff12c8d2ac85e75a42433bff767a429e1b9def8e10da354335359f03dc7"
}
}
},
"nbformat": 4,
33 changes: 28 additions & 5 deletions xclim/sdba/_adjustment.py
@@ -9,38 +9,56 @@
import numpy as np
import xarray as xr

from xclim.core.units import convert_units_to, infer_context, units
from xclim.indices.stats import _fitfunc_1d # noqa

from . import nbutils as nbu
from . import utils as u
from ._processing import _adapt_freq
from .base import Grouper, map_blocks, map_groups
from .detrending import PolyDetrend
from .processing import escore


def _adapt_freq_hist(ds, adapt_freq_thresh):
"""Adapt frequency of null values of `hist` in order to match `ref`."""
with units.context(infer_context(ds.ref.attrs.get("standard_name"))):
thresh = convert_units_to(adapt_freq_thresh, ds.ref)
dim = ["time"] + ["window"] * ("window" in ds.hist.dims)
return _adapt_freq.func(
xr.Dataset(dict(sim=ds.hist, ref=ds.ref)), thresh=thresh, dim=dim
).sim_ad


@map_groups(
af=[Grouper.PROP, "quantiles"],
hist_q=[Grouper.PROP, "quantiles"],
scaling=[Grouper.PROP],
)
def dqm_train(ds, *, dim, kind, quantiles) -> xr.Dataset:
def dqm_train(ds, *, dim, kind, quantiles, adapt_freq_thresh) -> xr.Dataset:
"""Train step on one group.

Notes
-----
Dataset must contain the following variables:
ref : training target
hist : training data

adapt_freq_thresh : str | None
Threshold for frequency adaptation. See :py:class:`xclim.sdba.processing.adapt_freq` for details.
Default is None, meaning that frequency adaptation is not performed.
"""
hist = _adapt_freq_hist(ds, adapt_freq_thresh) if adapt_freq_thresh else ds.hist

refn = u.apply_correction(ds.ref, u.invert(ds.ref.mean(dim), kind), kind)
histn = u.apply_correction(ds.hist, u.invert(ds.hist.mean(dim), kind), kind)
histn = u.apply_correction(hist, u.invert(hist.mean(dim), kind), kind)

ref_q = nbu.quantile(refn, quantiles, dim)
hist_q = nbu.quantile(histn, quantiles, dim)

af = u.get_correction(hist_q, ref_q, kind)
mu_ref = ds.ref.mean(dim)
mu_hist = ds.hist.mean(dim)
mu_hist = hist.mean(dim)
scaling = u.get_correction(mu_hist, mu_ref, kind=kind)

return xr.Dataset(data_vars=dict(af=af, hist_q=hist_q, scaling=scaling))
@@ -50,15 +68,20 @@ def dqm_train(ds, *, dim, kind, quantiles) -> xr.Dataset:
af=[Grouper.PROP, "quantiles"],
hist_q=[Grouper.PROP, "quantiles"],
)
def eqm_train(ds, *, dim, kind, quantiles) -> xr.Dataset:
def eqm_train(ds, *, dim, kind, quantiles, adapt_freq_thresh) -> xr.Dataset:
"""EQM: Train step on one group.

Dataset variables:
ref : training target
hist : training data

adapt_freq_thresh : str | None
Threshold for frequency adaptation. See :py:class:`xclim.sdba.processing.adapt_freq` for details.
Default is None, meaning that frequency adaptation is not performed.
"""
hist = _adapt_freq_hist(ds, adapt_freq_thresh) if adapt_freq_thresh else ds.hist
ref_q = nbu.quantile(ds.ref, quantiles, dim)
hist_q = nbu.quantile(ds.hist, quantiles, dim)
hist_q = nbu.quantile(hist, quantiles, dim)

af = u.get_correction(hist_q, ref_q, kind)

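The `eqm_train` step above reduces to: compute quantiles of `ref` and `hist` within the group, then take the correction between them. A minimal pure-Python sketch of that train step — `quantile` and `eqm_train_sketch` are hypothetical stand-ins for `nbu.quantile` and the real grouped function, shown for a single 1-D group:

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def eqm_train_sketch(ref, hist, quantiles, kind="+"):
    """Return (af, hist_q): adjustment factors and hist quantiles."""
    ref_q = [quantile(ref, q) for q in quantiles]
    hist_q = [quantile(hist, q) for q in quantiles]
    if kind == "+":
        # additive correction, e.g. for temperature
        af = [r - h for r, h in zip(ref_q, hist_q)]
    else:
        # multiplicative correction, as used for precipitation
        af = [r / h for r, h in zip(ref_q, hist_q)]
    return af, hist_q
```

With `adapt_freq_thresh` set, the real function first passes `hist` through frequency adaptation (the `_adapt_freq_hist` helper above) before computing `hist_q`.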
14 changes: 4 additions & 10 deletions xclim/sdba/_processing.py
@@ -11,10 +11,11 @@

import numpy as np
import xarray as xr
from xarray.core.utils import get_temp_dimname

from . import nbutils as nbu
from .base import Grouper, map_groups
from .utils import ADDITIVE, apply_correction, ecdf, invert
from .utils import ADDITIVE, apply_correction, ecdf, invert, rank


@map_groups(
@@ -74,16 +75,9 @@ def _adapt_freq(
pth = nbu.vecquantiles(ds.ref, P0_sim, dim).where(dP0 > 0)

# Probabilities and quantiles computed within all dims, but correction along the first one only.
if "window" in dim:
# P0_sim was computed using the window, but only the original time series is corrected.
# Grouper.apply does this step, but if done here it makes the code faster.
sim = ds.sim.isel(window=(ds.sim.window.size - 1) // 2)
else:
sim = ds.sim
dim = dim[0]

sim = ds.sim
# Get the percentile rank of each value in sim.
rank = sim.rank(dim, pct=True)
rank = rank(sim, dim=dim, pct=True)

# Frequency-adapted sim
sim_ad = sim.where(
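The diff above swaps `DataArray.rank` for a `rank` utility that supports multi-dimensional ranking with `pct=True`, i.e. percentile ranks. A pure-Python sketch of a percentile rank with average tie-handling — `pct_rank` is a hypothetical 1-D stand-in for illustration, not the xclim utility:

```python
def pct_rank(values):
    """Percentile rank of each value, ties averaged (the convention of
    `rank(pct=True)`): average 1-based rank divided by the count."""
    n = len(values)
    # collect the 1-based sorted positions of every distinct value
    positions = {}
    for pos, v in enumerate(sorted(values)):
        positions.setdefault(v, []).append(pos + 1)
    return [sum(positions[v]) / len(positions[v]) / n for v in values]
```

In `_adapt_freq`, these percentile ranks are compared against the dry-day probabilities `P0_ref` and `P0_sim` to decide which values of `sim` get replaced.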
10 changes: 10 additions & 0 deletions xclim/sdba/adjustment.py
@@ -328,6 +328,9 @@ class EmpiricalQuantileMapping(TrainAdjust):
group : Union[str, Grouper]
The grouping information. See :py:class:`xclim.sdba.base.Grouper` for details.
Default is "time", meaning an single adjustment group along dimension "time".
adapt_freq_thresh : str | None
Threshold for frequency adaptation. See :py:class:`xclim.sdba.processing.adapt_freq` for details.
Default is None, meaning that frequency adaptation is not performed.

Adjust step:

@@ -352,6 +355,7 @@ def _train(
nquantiles: int | np.ndarray = 20,
kind: str = ADDITIVE,
group: str | Grouper = "time",
adapt_freq_thresh: str | None = None,
):
if np.isscalar(nquantiles):
quantiles = equally_spaced_nodes(nquantiles).astype(ref.dtype)
@@ -363,6 +367,7 @@
group=group,
kind=kind,
quantiles=quantiles,
adapt_freq_thresh=adapt_freq_thresh,
)

ds.af.attrs.update(
@@ -417,6 +422,9 @@ class DetrendedQuantileMapping(TrainAdjust):
group : Union[str, Grouper]
The grouping information. See :py:class:`xclim.sdba.base.Grouper` for details.
Default is "time", meaning a single adjustment group along dimension "time".
adapt_freq_thresh : str | None
Threshold for frequency adaptation. See :py:class:`xclim.sdba.processing.adapt_freq` for details.
Default is None, meaning that frequency adaptation is not performed.

Adjust step:

@@ -445,6 +453,7 @@ def _train(
nquantiles: int | np.ndarray = 20,
kind: str = ADDITIVE,
group: str | Grouper = "time",
adapt_freq_thresh: str | None = None,
):
if group.prop not in ["group", "dayofyear"]:
warn(
@@ -461,6 +470,7 @@
group=group,
quantiles=quantiles,
kind=kind,
adapt_freq_thresh=adapt_freq_thresh,
)

ds.af.attrs.update(
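The `TrainAdjust` classes above pair training with an adjust step that looks up each simulated value among the trained `hist_q` quantiles and applies the interpolated adjustment factor. A minimal additive sketch — `eqm_adjust_sketch` is a hypothetical 1-D illustration; the real adjust step also handles grouping, kinds, and extrapolation options:

```python
def eqm_adjust_sketch(sim, hist_q, af):
    """Adjust sim: bracket each value between trained hist quantiles
    and add the linearly interpolated correction. Additive kind."""
    out = []
    for x in sim:
        # clamp values outside the trained range to the edge correction
        if x <= hist_q[0]:
            out.append(x + af[0])
            continue
        if x >= hist_q[-1]:
            out.append(x + af[-1])
            continue
        # locate the bracketing quantiles and interpolate the factor
        for i in range(len(hist_q) - 1):
            if hist_q[i] <= x <= hist_q[i + 1]:
                w = (x - hist_q[i]) / (hist_q[i + 1] - hist_q[i])
                out.append(x + af[i] + w * (af[i + 1] - af[i]))
                break
    return out
```

Because `adapt_freq_thresh` only changes what the factors are trained on, the adjust step itself is unchanged by this PR.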