Spellchecking with codespell (#1576)
### What kind of change does this PR introduce?

* Adds `codespell` for checking word spelling across the codebase and documentation
* Configures `codespell` to ignore several French words and all current translation configurations
* Adds a `pre-commit` hook for performing these checks on commit
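
As a usage sketch (not part of this diff), the new check can be run by hand with the same configuration; the exact invocation below is an assumption pieced together from the Makefile and `pre-commit` changes in this PR:

```shell
# Sketch: spell-check the same paths as `make lint`, reading the
# [tool.codespell] settings from pyproject.toml (the pre-commit hook adds
# tomli so the TOML config can be read on older Python versions).
codespell --toml=pyproject.toml xclim tests docs
```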

### Does this PR introduce a breaking change?

Yes, `codespell` is now a development dependency.

### Other information:

https://github.com/codespell-project/codespell
Zeitsperre authored Jan 9, 2024
2 parents 58dc43b + 9888d95 commit e335ff4
Showing 31 changed files with 97 additions and 79 deletions.
12 changes: 10 additions & 2 deletions .pre-commit-config.yaml
@@ -40,7 +40,7 @@ repos:
hooks:
- id: isort
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.9
rev: v0.1.11
hooks:
- id: ruff
- repo: https://github.com/pycqa/flake8
@@ -53,22 +53,30 @@ repos:
rev: 1.7.1
hooks:
- id: nbqa-pyupgrade
additional_dependencies: [ 'pyupgrade==3.15.0' ]
args: [ '--py38-plus' ]
- id: nbqa-black
additional_dependencies: [ 'black==23.12.1' ]
- id: nbqa-isort
additional_dependencies: [ 'isort==5.13.2' ]
- repo: https://github.com/kynan/nbstripout
rev: 0.6.1
hooks:
- id: nbstripout
files: '.ipynb'
args: [ '--extra-keys', 'metadata.kernelspec' ]
args: [ '--extra-keys=metadata.kernelspec' ]
- repo: https://github.com/keewis/blackdoc
rev: v0.3.9
hooks:
- id: blackdoc
additional_dependencies: [ 'black==23.12.1' ]
exclude: '(xclim/indices/__init__.py|docs/installation.rst)'
- repo: https://github.com/codespell-project/codespell
rev: v2.2.6
hooks:
- id: codespell
additional_dependencies: [ 'tomli' ]
args: [ '--toml=pyproject.toml' ]
- repo: https://github.com/python-jsonschema/check-jsonschema
rev: 0.27.3
hooks:
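With the codespell hook added above, the spell check runs on every commit once the git hooks are installed. A minimal local sketch, assuming `pre-commit` itself is already installed:

```shell
# One-time setup: install the git hooks defined in .pre-commit-config.yaml.
pre-commit install
# Run only the new codespell hook against every file in the repository.
pre-commit run codespell --all-files
```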
14 changes: 8 additions & 6 deletions CHANGES.rst
@@ -25,13 +25,15 @@ Bug fixes
^^^^^^^^^
* Fixed passing ``missing=0`` to ``xclim.core.calendar.convert_calendar``. (:issue:`1562`, :pull:`1563`).
* Fix wrong `window` attributes in ``xclim.indices.standardized_precipitation_index``, ``xclim.indices.standardized_precipitation_evapotranspiration_index``. (:issue:`1552` :pull:`1554`).
* Several spelling mistakes have been corrected within the documentation and codebase. (:pull:`1576`).

Internal changes
^^^^^^^^^^^^^^^^
* The `flake8` configuration has been migrated from `setup.cfg` to `.flake8`; `setup.cfg` has been removed. (:pull:`1569`)
* The `bump-version.yml` workflow has been adjusted to bump the `patch` version when the last version is determined to have been a `release` version; otherwise, the `build` version is bumped. (:issue:`1557`, :pull:`1569`).
* The GitHub Workflows now use the `step-security/harden-runner` action to monitor source code, actions, and dependency safety. All workflows now employ more constrained permissions rule sets to prevent security issues. (:pull:`1577`).
* Updated the CONTRIBUTING.rst directions to showcase the new versioning system. (:issue:`1557`, :pull:`1573`).
* The `codespell` library is now a development dependency for the `dev` installation recipe with configurations found within `pyproject.toml`. This is also now a linting step and integrated as a `pre-commit` hook. For more information, see the `codespell documentation <https://github.com/codespell-project/codespell>`_ (:pull:`1576`).


v0.47.0 (2023-12-01)
@@ -317,7 +319,7 @@ New features and enhancements
* ``xclim.core.calendar.yearly_interpolated_doy``
* ``xclim.core.calendar.yearly_random_doy``
* `scipy` is no longer pinned below v1.9 and `lmoments3>=1.0.5` is now a core dependency and installed by default with `pip`. (:issue:`1142`, :pull:`1171`).
* Fix bug on number of bins in ``xclim.sdba.propeties.spatial_correlogram``. (:pull:`1336`)
* Fix bug on number of bins in ``xclim.sdba.properties.spatial_correlogram``. (:pull:`1336`)
* Add `resample_before_rl` argument to control when resampling happens in `maximum_consecutive_{frost|frost_free|dry|tx}_days` and in heat indices (in `_threshold`) (:issue:`1329`, :pull:`1331`)
* Add ``xclim.ensembles.make_criteria`` to help create inputs for the ensemble-reduction methods. (:issue:`1338`, :pull:`1341`).

@@ -1071,7 +1073,7 @@ Bug fixes
* Dimensions in a grouper's ``add_dims`` are now taken into consideration in function wrapped with ``map_blocks/groups``. This feature is still not fully tested throughout ``sdba`` though, so use with caution.
* Better dtype preservation throughout ``sdba``.
* "constant" extrapolation in the quantile mappings' adjustment is now padding values just above and under the target's max and min, instead of ``±np.inf``.
* Fixes in ``sdba.LOCI`` for the case where a grouping with additionnal dimensions is used.
* Fixes in ``sdba.LOCI`` for the case where a grouping with additional dimensions is used.

Internal Changes
^^^^^^^^^^^^^^^^
@@ -1139,7 +1141,7 @@ New indicators
Internal Changes
^^^^^^^^^^^^^^^^
* ``aggregate_between_dates`` (introduced in v0.27.0) now accepts ``DayOfYear``-like strings for supplying start and end dates (e.g. ``start="02-01", end="10-31"``).
* The indicator call sequence now considers "variable" the inputs annoted so. Dropped the ``nvar`` attribute.
* The indicator call sequence now considers "variable" the inputs annotated so. Dropped the ``nvar`` attribute.
* Default cfcheck is now to check metadata according to the variable name, using CMIP6 names in xclim/data/variable.yml.
* ``Indicator.missing`` defaults to "skip" if ``freq`` is absent from the list of parameters.
* Minor modifications to the GitHub Pull Requests template.
@@ -1186,7 +1188,7 @@ New indicators
Internal Changes
^^^^^^^^^^^^^^^^
* `run_length.rle_statistics` now accepts a `window` argument.
* Common arguments to the `op` parameter now have better adjective and noun formattings.
* Common arguments to the `op` parameter now have better adjective and noun formatting.
* Added and adjusted typing in call signatures and docstrings, with grammar fixes, for many `xclim.indices` operations.
* Added internal function ``aggregate_between_dates`` for array aggregation operations using xarray datetime arrays with start and end DayOfYear values.

@@ -1422,7 +1424,7 @@ Breaking changes
* The python library `pandoc` is no longer listed as a docs build requirement. Documentation still requires a current
version of `pandoc` binaries installed at system-level.
* ANUCLIM indices have seen their `input_freq` parameter renamed to `src_timestep` for clarity.
* A clean-up and harmonization of the indicators metadata has changed some of the indicator identifiers, long_names, abstracts and titles. `xclim.atmos.drought_code` and `fire_weather_indexes` now have indentifiers "dc" and "fwi" (lowercase version of the previous identifiers).
* A clean-up and harmonization of the indicators metadata has changed some of the indicator identifiers, long_names, abstracts and titles. `xclim.atmos.drought_code` and `fire_weather_indexes` now have identifiers "dc" and "fwi" (lowercase version of the previous identifiers).
* `xc.indices.run_length.run_length_with_dates` becomes `xc.indices.run_length.season_length`. Its argument `date` is now optional and the default changes from "07-01" to `None`.
* `xc.indices.consecutive_frost_days` becomes `xc.indices.maximum_consecutive_frost_days`.
* Changed the `history` indicator output attribute to `xclim_history` in order to respect CF conventions.
@@ -1569,7 +1571,7 @@ v0.14.x (2020-02-21)
* Refactoring of the documentation.
* Added support for pint 0.10
* Add `atmos.heat_wave_total_length` (fixing a namespace issue)
* Fixes in `utils.percentile_doy` and `indices.winter_rain_ratio` for multidimensionnal datasets.
* Fixes in `utils.percentile_doy` and `indices.winter_rain_ratio` for multidimensional datasets.
* Rewrote the `subset.subset_shape` function to allow for dask.delayed (lazy) computation.
* Added utility functions to compute `time_bnds` when resampling data encoded with `CFTimeIndex` (non-standard calendars).
* Fix in `subset.subset_gridpoint` for dask array coordinates.
1 change: 1 addition & 0 deletions Makefile
@@ -60,6 +60,7 @@ lint: ## check style with flake8 and black
nbqa black --check docs
blackdoc --check --exclude=xclim/indices/__init__.py xclim
blackdoc --check docs
codespell xclim tests docs
yamllint --config-file=.yamllint.yaml xclim

test: ## run tests quickly with the default Python
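With this one-line addition, the spelling pass becomes part of the standard linting target; a sketch, assuming the `dev` dependencies are installed:

```shell
# Runs the existing style checks (black, flake8, blackdoc, yamllint, ...)
# plus the new codespell pass in one go.
make lint
```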
2 changes: 1 addition & 1 deletion docs/installation.rst
@@ -24,7 +24,7 @@ Anaconda release
For ease of installation across operating systems, we also offer an Anaconda Python package hosted on conda-forge.
This version tends to be updated at around the same frequency as the PyPI-hosted library, but can lag by a few days at times.

`xclim` can be installed from conda-forge wth the following:
`xclim` can be installed from conda-forge with the following:

.. code-block:: shell
4 changes: 1 addition & 3 deletions docs/notebooks/ensembles.ipynb
@@ -17,8 +17,6 @@
"\n",
"from __future__ import annotations\n",
"\n",
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr\n",
@@ -290,7 +288,7 @@
"\n",
"We can then divide the plotted points into categories each with its own hatching pattern, usually leaving the robust data (models agree and enough show a significant change) without hatching. \n",
"\n",
"Xclim provides some tools to help in generating these hatching masks. First is [xc.ensembles.robustness_fractions](../apidoc/xclim.ensembles.rst#xclim.ensembles._robustness.robustness_fractions) that can characterize the change significance and sign agreement accross ensemble members. To demonstrate its usage, we'll first generate some fake annual mean temperature data. Here, `ref` is the data on the reference period and `fut` is a future projection. There are 5 different members in the ensemble. We tweaked the generation so that all models agree on significant change in the \"south\" while agreement and signifiance of change decreases as we go north and east."
"Xclim provides some tools to help in generating these hatching masks. First is [xc.ensembles.robustness_fractions](../apidoc/xclim.ensembles.rst#xclim.ensembles._robustness.robustness_fractions) that can characterize the change significance and sign agreement across ensemble members. To demonstrate its usage, we'll first generate some fake annual mean temperature data. Here, `ref` is the data on the reference period and `fut` is a future projection. There are 5 different members in the ensemble. We tweaked the generation so that all models agree on significant change in the \"south\" while agreement and signifiance of change decreases as we go north and east."
]
},
{
2 changes: 1 addition & 1 deletion docs/notebooks/partitioning.ipynb
@@ -10,7 +10,7 @@
"Here we estimate the sources of uncertainty for an ensemble of climate model projections. The data is the same as used in the [IPCC WGI AR6 Atlas](https://github.com/IPCC-WG1/Atlas). \n",
"\n",
"## Fetch data\n",
"We'll only fetch a small sample of the full ensemble to illustrate the logic and data structure expected by the partitioning algorith."
"We'll only fetch a small sample of the full ensemble to illustrate the logic and data structure expected by the partitioning algorithm."
]
},
{
2 changes: 1 addition & 1 deletion docs/notebooks/sdba-advanced.ipynb
@@ -840,7 +840,7 @@
"ref_prop = sdba.properties.spell_length_distribution(\n",
" da=ref_future, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time.season\"\n",
")\n",
"# Properties are often associated with the same measures. This correspondance is implemented in xclim:\n",
"# Properties are often associated with the same measures. This correspondence is implemented in xclim:\n",
"measure = sdba.properties.spell_length_distribution.get_measure()\n",
"measure_sim = measure(sim_prop, ref_prop)\n",
"measure_scen = measure(scen_prop, ref_prop)\n",
4 changes: 2 additions & 2 deletions docs/notebooks/sdba.ipynb
@@ -369,7 +369,7 @@
"metadata": {},
"outputs": [],
"source": [
"# To get an exagerated example we select different points\n",
"# To get an exaggerated example we select different points\n",
"# here \"lon\" will be our dimension of two \"spatially correlated\" points\n",
"reft = ds.air.isel(lat=21, lon=[40, 52]).drop_vars([\"lon\", \"lat\"])\n",
"simt = ds.air.isel(lat=18, lon=[17, 35]).drop_vars([\"lon\", \"lat\"])\n",
@@ -570,7 +570,7 @@
" base=sdba.QuantileDeltaMapping, # Use QDM as the univariate adjustment.\n",
" base_kws={\"nquantiles\": 20, \"group\": \"time\"},\n",
" n_iter=20, # perform 20 iteration\n",
" n_escore=1000, # only send 1000 points to the escore metric (it is realy slow)\n",
" n_escore=1000, # only send 1000 points to the escore metric (it is really slow)\n",
" )\n",
"\n",
"scenh_npdft = out.scenh.rename(time_hist=\"time\") # Bias-adjusted historical period\n",
4 changes: 2 additions & 2 deletions docs/notebooks/usage.ipynb
@@ -66,7 +66,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This computation was made using the `growing_degree_days` **indicator**. The same computation could be made through the **index**. You can see how the metadata is alot poorer here."
"This computation was made using the `growing_degree_days` **indicator**. The same computation could be made through the **index**. You can see how the metadata is a lot poorer here."
]
},
{
@@ -202,7 +202,7 @@
"):\n",
" # Change the missing method to \"percent\", instead of the default \"any\"\n",
" # Set the tolerance to 10%, periods with more than 10% of missing data\n",
" # in the input will be masked in the ouput.\n",
" # in the input will be masked in the output.\n",
" gdd = xclim.atmos.growing_degree_days(daily_ds.air, thresh=\"10.0 degC\", freq=\"MS\")\n",
"gdd"
]
1 change: 1 addition & 0 deletions environment.yml
@@ -32,6 +32,7 @@ dependencies:
- blackdoc
- bump-my-version
- cairosvg
- codespell
- coverage
- distributed >=2.0
- filelock
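For conda-based development environments, the new tool is picked up by refreshing the environment; a sketch, assuming the environment was originally created from this file:

```shell
# Update an existing conda environment in place so that codespell is installed.
conda env update --file environment.yml
```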
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -60,6 +60,7 @@ dev = [
"black >=23.3.0",
"blackdoc",
"bump-my-version",
"codespell",
"coverage[toml]",
"flake8",
"flake8-alphabetize",
@@ -148,6 +149,9 @@ values = [
"release"
]

[tool.codespell]
skip = 'xclim/data/*.json,docs/_build,docs/notebooks/xclim_training/*.ipynb,docs/references.bib,__pycache__,*.nc,*.png,*.gz,*.whl'
ignore-words-list = "absolue,astroid,bloc,bui,callendar,degreee,environnement,hanel,inferrable,lond,nam,nd,ressources,vas"

[tool.coverage.run]
relative_files = true
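For reference, the `[tool.codespell]` table above mirrors codespell's command-line options, so an equivalent invocation without the TOML file would look roughly like this (a sketch; see the codespell documentation linked in the PR description for authoritative flag names):

```shell
# Rough command-line equivalent of the [tool.codespell] settings above.
codespell \
  --skip="xclim/data/*.json,docs/_build,docs/notebooks/xclim_training/*.ipynb,docs/references.bib,__pycache__,*.nc,*.png,*.gz,*.whl" \
  --ignore-words-list="absolue,astroid,bloc,bui,callendar,degreee,environnement,hanel,inferrable,lond,nam,nd,ressources,vas" \
  xclim tests docs
```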
2 changes: 1 addition & 1 deletion tests/test_ensembles.py
@@ -412,7 +412,7 @@ def test_kmeans_variweights(self, open_dataset, random_state):
make_graph=False,
variable_weights=var_weights,
)
# Results here may change according to sklearn version, hence the *isin* intead of ==
# Results here may change according to sklearn version, hence the *isin* instead of ==
assert all(np.isin([12, 13, 16], ids))
assert len(ids) == 6

4 changes: 2 additions & 2 deletions tests/test_indices.py
@@ -1465,7 +1465,7 @@ def test_jetstream_metric_woollings(self):
# Should raise ValueError as longitude is in 0-360 instead of -180.E-180.W
with pytest.raises(ValueError):
_ = xci.jetstream_metric_woollings(da_ua)
# redefine longitude coordiantes to -180.E-180.W so function runs
# redefine longitude coordinates to -180.E-180.W so function runs
da_ua = da_ua.cf.assign_coords(
{
"X": (
@@ -2888,7 +2888,7 @@ def test_humidex(tas_series):
# expected values from https://en.wikipedia.org/wiki/Humidex
expected = np.array([16, 29, 47, 52]) * units.degC

# Celcius
# Celsius
hc = xci.humidex(tas, dtps)
np.testing.assert_array_almost_equal(hc, expected, 0)

4 changes: 2 additions & 2 deletions tests/test_sdba/test_base.py
@@ -89,7 +89,7 @@ def test_grouper_apply(tas_series, use_dask, group, n):
exp = tas.mean(dim=grouper.dim).expand_dims("group").T
np.testing.assert_array_equal(out_mean, exp)

# With additionnal dimension included
# With additional dimension included
grouper = Grouper(group, add_dims=["lat"])
out = grouper.apply("mean", tas)
assert out.ndim == 1
@@ -98,7 +98,7 @@ def test_grouper_apply(tas_series, use_dask, group, n):
assert out.attrs["group_compute_dims"] == [grouper.dim, "lat"]
assert out.attrs["group_window"] == 1

# Additionnal but main_only
# Additional but main_only
out = grouper.apply("mean", tas, main_only=True)
np.testing.assert_array_equal(out, out_mean)

2 changes: 1 addition & 1 deletion tests/test_temperature.py
@@ -275,7 +275,7 @@ def test_TN_3d_data(self, open_dataset):
~np.isnan(tnmean).values & ~np.isnan(tnmax).values & ~np.isnan(tnmin).values
)

# test maxes always greater than mean and mean alwyas greater than min (non nan values only)
# test maxes always greater than mean and mean always greater than min (non nan values only)
assert np.all(tnmax.values[no_nan] > tnmean.values[no_nan]) & np.all(
tnmean.values[no_nan] > tnmin.values[no_nan]
)
4 changes: 2 additions & 2 deletions tests/test_utils.py
@@ -65,7 +65,7 @@ def test_ensure_chunk_size():

class TestNanCalcPercentiles:
def test_calc_perc_type7(self):
# Exemple array from: https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method
# Example array from: https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method
arr = np.asarray([15.0, 20.0, 35.0, 40.0, 50.0])
res = nan_calc_percentiles(arr, percentiles=[40.0], alpha=1, beta=1)
# The expected is from R `quantile(arr, probs=c(0.4), type=7)`
@@ -87,7 +87,7 @@ def test_calc_perc_type8(self):
assert np.all(res[0][1] == 27)

def test_calc_perc_2d(self):
# Exemple array from: https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method
# Example array from: https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method
arr = np.asarray(
[[15.0, 20.0, 35.0, 40.0, 50.0], [15.0, 20.0, 35.0, 40.0, 50.0]]
)
12 changes: 6 additions & 6 deletions xclim/core/bootstrapping.py
@@ -23,7 +23,7 @@ def percentile_bootstrap(func):
This feature is experimental.
Bootstraping avoids discontinuities in the exceedance between the reference period over which percentiles are
Bootstrapping avoids discontinuities in the exceedance between the reference period over which percentiles are
computed, and "out of reference" periods. See `bootstrap_func` for details.
Declaration example:
@@ -71,12 +71,12 @@ def bootstrap_func(compute_index_func: Callable, **kwargs) -> xarray.DataArray:
at the beginning and end of the reference period used to calculate percentiles. The bootstrap procedure can reduce
those discontinuities by iteratively computing the percentile estimate and the index on altered reference periods.
Theses altered reference periods are themselves built iteratively: When computing the index for year x, the
bootstrapping create as many altered reference period as the number of years in the reference period.
To build one altered reference period, the values of year x are replaced by the values of another year in the
These altered reference periods are themselves built iteratively: When computing the index for year `x`, the
bootstrapping creates as many altered reference periods as the number of years in the reference period.
To build one altered reference period, the values of year `x` are replaced by the values of another year in the
reference period, then the index is computed on this altered period. This is repeated for each year of the reference
period, excluding year x, The final result of the index for year x, is then the average of all the index results on
altered years.
period, excluding year `x`. The final result of the index for year `x` is then the average of all the index results
on altered years.
Parameters
----------
