Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xclim.core.missing should be refactored to use a functional programming approach #2000

Open
1 of 2 tasks
Zeitsperre opened this issue Nov 19, 2024 · 1 comment
Open
1 of 2 tasks
Labels
enhancement New feature or request help wanted Extra attention is needed priority Immediate priority
Milestone

Comments

@Zeitsperre
Copy link
Collaborator

Addressing a Problem?

The xclim.core.missing submodule currently contains several classes and subclasses that are being used to provide functions with shared behaviours, thanks to class inheritance. In practice, this means that a MissingBase can be used to provide a set of common behaviours, which are then used for different types of Missing checks, whose docstrings and methods are used in order to build:

  • missing_any()
  • missing_wmo()
  • missing_pct()
  • at_least_n_valid()
  • missing_from_context()

There are a few issues with this approach:

  • The current implementation sees unequal adoption of the standards of subclasses (e.g. calling super().__init__()) as in most cases, this isn't needed.
  • The call signatures of the subclasses don't necessarily match those of their constructors or parent class (use of **kwargs for some; some variables listed but not used, etc.).
  • As class docstrings are being used to generate function docstrings, documentation checking tools can't effectively be used to ensure consistency (lots of false positives and typical standards need to be ignored).

I can't say for certain, but initializing a bunch of classes and subclasses to mimic a functional programming approach is probably more computationally taxing than adopting a strict functional programming approach.

Potential Solution

This module could use a full refactoring to adopt functional approaches.

Additional context

Issue raised from work done in #1988.

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Zeitsperre Zeitsperre added the enhancement New feature or request label Nov 19, 2024
@coxipi
Copy link
Contributor

coxipi commented Nov 19, 2024

Sarah was also discussing (I think) how pre-processing such as jitter_under_thresh could also adopt this approach, although I'm not sure we want to go there. I think that pre-processing data in this way is a bit different, but shares some simliarities with the idea of pre-checks before doing computation.

I feel this might be a Pandora's box in some way, but I can't really tell where things could go wrong. At least for tests, it's maybe a bit more difficult to evaluate the coverage of such functionalities. If it has the potential of greatly simplifying some things, I'd say it's worth a shot.

@Zeitsperre Zeitsperre mentioned this issue Nov 19, 2024
5 tasks
@Zeitsperre Zeitsperre added this to the v0.55.0 milestone Dec 10, 2024
@Zeitsperre Zeitsperre added the help wanted Extra attention is needed label Dec 10, 2024
Zeitsperre added a commit that referenced this issue Dec 10, 2024
### What kind of change does this PR introduce?

* Integrates the `numpydoc` library for docstring validation.
* Fixes the `codespell` configuration (previously was not parsing all
relevant files).
* Drops Python3.9 coding conventions everywhere.
* Reformats existing docstrings using `\*` to use `*` instead.
* On a related note, disabled `flake8`'s `RST210` and `RST213` checks
that were flagging `*` symbols in docstrings as unterminated *emphasis*
and **strong** tags.
* Adds a step to the `Indictor.parse_indice` that sanitizes the `*`
symbols from `Parameters` entries.
* Painstakingly modifies all docstrings to be closer to the `numpydoc`
specification.

### Does this PR introduce a breaking change?

It should not. Call signatures and docstrings will be slightly changed.

### Other information:

https://numpydoc.readthedocs.io/en/latest/validation.html

Except for the custom fields that we use to populate our documentation,
the docstrings of `xclim` are very consistent and virtually compliant
with the standard now.

Both `xclim.sdba` and `xclim.core.missing` are not evaluated by
`numpydoc` due to major refactoring efforts that are underway
(https://github.com/Ouranosinc/xsdba) or needed (#2000).
@Zeitsperre Zeitsperre added the priority Immediate priority label Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed priority Immediate priority
Projects
None yet
Development

No branches or pull requests

2 participants