Public testing framework for duck array integration #6894

TomNicholas · 2022-08-08T18:23:49Z

What is your issue?

In #4972 @keewis started writing a public framework for testing the integration of any duck array class in xarray, inspired by the testing framework pandas has for ExtensionArrays. This is a meta-issue for what our version of that framework for wrapping numpy-like duck arrays should look like.

(Feel free to edit / add to this)

What behaviour should we test?

We have a lot of xarray methods to test with any type of duck array. Each of these bullets should correspond to one or more testing base classes which the duck array library author would inherit from. In rough order of increasing complexity:

We don't need to test that the array class obeys everything else in the Array API Standard. (For instance .device is probably never going to be used by xarray directly.) We instead assume that if the array class doesn't implement something in the API standard but all the generated tests pass, then all is well.
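As a rough illustration of the "testing base class" idea, here is a minimal sketch built around a hypothetical `ReductionTests` base class. A library author overrides two hooks: `create`, which builds their array type from numpy data, and `check`, which compares a result against the numpy reference. Neither name is an existing xarray API. To stay self-contained, the sketch applies numpy functions directly to the duck array. The real framework would go through an xarray object instead (e.g. `DataArray.sum`). Here `np.ma.MaskedArray` stands in for a third-party duck array.

```python
import numpy as np


class ReductionTests:
    """Hypothetical base class a duck array library would inherit from.

    Subclasses override ``create`` (how to build their array type) and,
    optionally, ``check`` (how to compare a result with the numpy reference).
    """

    reductions = ("sum", "mean", "max", "min", "prod")

    def create(self, data):
        # Default: plain numpy arrays.
        return np.asarray(data)

    def check(self, actual, expected):
        # Default comparison on values only.
        np.testing.assert_allclose(np.asarray(actual), expected)

    def test_reductions(self):
        data = np.arange(1.0, 7.0).reshape(2, 3)
        arr = self.create(data)
        for name in self.reductions:
            for axis in (None, 0, 1):
                # Stand-in for calling the xarray method on a wrapped array.
                actual = getattr(np, name)(arr, axis=axis)
                expected = getattr(np, name)(data, axis=axis)
                self.check(actual, expected)


class TestMaskedArrayReductions(ReductionTests):
    """Example subclass, as a duck array library might write it."""

    def create(self, data):
        return np.ma.masked_array(data)

    def check(self, actual, expected):
        # Extra, array-specific check: non-scalar results stay masked arrays.
        if np.ndim(expected) > 0:
            assert isinstance(actual, np.ma.MaskedArray), type(actual)
        super().check(actual, expected)
```

The library author only writes the subclass. Every reduction listed on the base class is then exercised against their array type.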

How extensible does our testing framework need to be?

To be able to test any type of wrapped array our testing framework needs to itself be quite flexible.

User-defined checking - For some arrays np.testing.assert_equal is not enough to guarantee correctness, so the user creating tests needs to specify additional checks. #4972 (Automatic duck array testing - reductions) shows how to do this for checking the units of resulting pint arrays.
User-created data? - Some array libraries might need to test array data that is invalid for numpy arrays. I'm thinking specifically of testing the wrapping of ragged arrays (see Awkward array backend? #4285).
Parallel computing frameworks? - Related to the last point are chunked arrays. Here the strategy requires an extra chunks argument when the array is created, and any results need to call .compute() first. Testing parallel-executed arrays might also require fairly complicated setup and teardown in fixtures. (See also Alternative parallel execution frameworks in xarray #6807.)
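A sketch of what such user-defined checks might look like follows. It uses two hypothetical toy classes rather than real pint or dask objects:

- `ToyQuantity` holds values plus a units string and stands in for a pint array.
- `ToyLazyArray` is created with an extra `chunks` argument and only exposes its values through `compute()`. It stands in for a chunked array.

The check functions `check_quantity` and `check_lazy` are illustrative names, not part of any existing framework.

```python
import numpy as np


class ToyQuantity:
    """Toy unit-aware array (hypothetical): numpy magnitude plus units."""

    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def sum(self, axis=None):
        return ToyQuantity(self.magnitude.sum(axis=axis), self.units)


class ToyLazyArray:
    """Toy chunked array (hypothetical): needs ``chunks`` at creation and
    only exposes its values through ``compute()``."""

    def __init__(self, data, chunks):
        self._data = np.asarray(data)
        self.chunks = chunks

    def sum(self, axis=None):
        # Results stay lazy, as they would in a parallel computing framework.
        return ToyLazyArray(self._data.sum(axis=axis), self.chunks)

    def compute(self):
        return self._data


def check_quantity(actual, expected):
    # Comparing values is not enough: the units must survive the operation.
    assert actual.units == expected.units, (actual.units, expected.units)
    np.testing.assert_equal(actual.magnitude, expected.magnitude)


def check_lazy(actual, expected):
    # Lazy results have to be computed before they can be compared.
    np.testing.assert_equal(actual.compute(), expected)


data = np.arange(6).reshape(2, 3)
check_quantity(ToyQuantity(data, "m").sum(axis=0), ToyQuantity(data.sum(axis=0), "m"))
check_lazy(ToyLazyArray(data, chunks=(1, 3)).sum(axis=1), data.sum(axis=1))
```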

What documentation / examples do we need?

All of this content should really go on a dedicated page in the docs, perhaps grouped alongside other ways of extending xarray.

Motivation
What subset of the Array API standard we expect duck array classes to define (could point to a typing protocol?)
Explanation that the array type needs to return the same type for any numpy-like function which xarray might call upon that type (i.e. the set of duckarray instances is closed under numpy operations)
Explanation of the different base classes
Simple demo of testing a toy numpy-like array class
Point to code testing more advanced examples we actually use (e.g. sparse, pint)
Which advanced behaviours are optional (e.g. Constructors and Properties have to work, but Groupby is optional)
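For the "simple demo of testing a toy numpy-like array class" item, a minimal sketch might look like the following. `ToyDuckArray` and `check_closed` are hypothetical names. The class uses numpy's `NDArrayOperatorsMixin` and the `__array_ufunc__` / `__array_function__` protocols to do three things: unwrap its data, call numpy, and re-wrap the result. `check_closed` then asserts that the result of a numpy function is still a `ToyDuckArray`, i.e. that the type is closed under that operation.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


def _unwrap(value):
    """Replace ToyDuckArray instances by their underlying numpy data."""
    return value.data if isinstance(value, ToyDuckArray) else value


class ToyDuckArray(NDArrayOperatorsMixin):
    """Minimal numpy-like wrapper (hypothetical), for demonstration only."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def shape(self):
        return self.data.shape

    @property
    def dtype(self):
        return self.data.dtype

    @property
    def ndim(self):
        return self.data.ndim

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handles ufuncs and their methods (np.add, np.add.reduce, ...).
        # Only top-level inputs are unwrapped; ``out=`` is not supported.
        result = getattr(ufunc, method)(*map(_unwrap, inputs), **kwargs)
        return ToyDuckArray(result)

    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, (ToyDuckArray, np.ndarray)) for t in types):
            return NotImplemented
        # Only top-level positional arguments are unwrapped, so functions
        # taking sequences of arrays (e.g. np.concatenate) are not handled.
        result = func(*map(_unwrap, args), **kwargs)
        return ToyDuckArray(result)


def check_closed(func, *args, **kwargs):
    """Call ``func`` and assert that the duck array type is preserved."""
    result = func(*args, **kwargs)
    assert isinstance(result, ToyDuckArray), (
        f"{func.__name__} returned {type(result).__name__}"
    )
    return result


x = ToyDuckArray(np.arange(6).reshape(2, 3))
for func in (np.sum, np.mean, np.transpose, np.negative):
    check_closed(func, x)
```

`np.asarray` is a typical example of a call that escapes the wrapper. It does not dispatch through `__array_function__` on its input, so `check_closed(np.asarray, x)` fails.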

Where should duck array compatibility testing eventually live?

Right now the tests for sparse & pint are going into the xarray repo, but presumably we don't want tests for every duck array type living in this repository. I suggest that we want to work towards eventually having no array library-specific tests in this repository at all. (Except numpy I guess.) Thanks @crusaderky for the original suggestion.

Instead, all tests involving pint could live in pint-xarray, all tests involving sparse could live in the sparse repository (or a new sparse-xarray repo), and so on. We would set those test jobs to re-run whenever xarray is released, and then cross-reference any issues they reveal here if need be.

We should probably also move some of our existing tests (see the review on #7023).


keewis · 2022-08-09T09:33:07Z

With the implementation in #4972 you should already be able to specify a hypothesis strategy to create e.g. a random awkward array. The same goes for dask or other parallel computing frameworks: if you can construct a hypothesis strategy for them, the testing framework should be able to use it. check_reduce (or maybe it should be just check?) should allow customizing the comparison (actually, that's the entire test code at the moment), so adding compute (or todense / get) calls there should be easy.

For setup and teardown I think we could use pytest fixtures (and apply them automatically to each function). However, maybe we should not use parametrize at all, but instead define separate functions for each reduce operation? Then it would be possible to override them individually. As far as I remember, I chose not to do that because tests that only delegate to super().test_function() are just not great design. If we can think of a way to do that while avoiding those kinds of test redefinitions I'd be happy with it (and then we could get rid of the apply_marks function, which is an ugly hack of pytest internals).
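One way to get separate, individually overridable test functions without hand-written `super().test_...()` delegations is to generate the methods on the base class in a loop. The sketch below uses hypothetical names (`GeneratedReductionTests`, `_make_reduction_test`). It is not how #4972 is implemented; that PR uses parametrize plus the `apply_marks` helper. Because the generated methods are ordinary class attributes, pytest collects them on any `Test*` subclass. A subclass can then replace just one of them.

```python
import numpy as np

REDUCTIONS = ("sum", "mean", "max", "min")


def _make_reduction_test(name):
    """Build one standalone test method for the reduction ``name``."""

    def test(self):
        data = np.arange(1.0, 7.0).reshape(2, 3)
        # Stand-in for calling the xarray method on a wrapped array.
        actual = getattr(np, name)(self.create(data), axis=0)
        expected = getattr(np, name)(data, axis=0)
        np.testing.assert_allclose(np.asarray(actual), expected)

    test.__name__ = f"test_{name}"
    return test


class GeneratedReductionTests:
    """Hypothetical base class. Without a ``Test`` prefix pytest does not
    collect it directly, only through subclasses."""

    def create(self, data):
        return np.asarray(data)


# One real method per reduction instead of a single parametrized test.
for _name in REDUCTIONS:
    setattr(GeneratedReductionTests, f"test_{_name}", _make_reduction_test(_name))


class TestMaskedArrayPerOperation(GeneratedReductionTests):
    def create(self, data):
        return np.ma.masked_array(data)

    def test_mean(self):
        # Replaces only the generated test_mean; the other reductions are
        # inherited unchanged, with no delegating boilerplate.
        arr = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])
        assert np.mean(arr) == 1.5
```

Under pytest, the override could just as well carry a `pytest.mark.xfail` decorator instead of a custom body.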

I agree that moving the array library tests to dedicated repositories makes a lot of sense (for example, the pint tests use old versions of the conversion functions from pint-xarray), but note that at the moment the tests for pint seem to increase the total test coverage of xarray a bit. I guess that just means we'd have to improve the rest of the test suite?

TomNicholas · 2022-08-09T15:17:12Z

you should already be able to specify a hypothesis strategy to create e.g. a random awkward array

Sounds good!

or maybe it should be just check?

Yes just check probably.

However, maybe we should just not use parametrize but instead define separate functions for each reduce operation?

But then the user writing the test code would have to write one of their own tests per xarray method, wouldn't they? I think we should avoid putting that much work on them if we can. Your current approach seems fine so far...

the pint tests use old versions of the conversion functions from pint-xarray

That's basically technical debt, so we should make a point to remove them from xarray eventually.

the tests for pint seem to increase the total test coverage of xarray (see the comment in #5692). I guess that just means we'd have to improve the rest of the test suite?

So long as @benbovy (or someone) writes new tests to cover the bugs that were revealed, then this is fine.

Illviljan · 2022-08-09T18:52:13Z

Typing duck arrays is also a little challenging, I find; at the moment we pretty much only use Any. I found some nice references and discussions that might be interesting for this:
https://github.com/pmeier/array-protocol
data-apis/array-api#229
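One possible starting point for such a protocol is sketched below. `DuckArray` is a hypothetical name. As far as I know, its members mirror the attribute checks in xarray's internal `is_duck_array` helper: `ndim`, `shape` and `dtype`, plus `__array_function__` and `__array_ufunc__`. That helper also accepts `__array_namespace__` as an alternative to the two dunder methods. Note that `isinstance` against a `runtime_checkable` protocol only checks that the members exist, not their signatures or return types.

```python
from typing import Any, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class DuckArray(Protocol):
    """Hypothetical structural type for a numpy-like duck array."""

    @property
    def shape(self) -> tuple[int, ...]: ...

    @property
    def dtype(self) -> Any: ...

    @property
    def ndim(self) -> int: ...

    def __array_function__(self, func: Any, types: Any, args: Any, kwargs: Any) -> Any: ...

    def __array_ufunc__(self, ufunc: Any, method: str, *inputs: Any, **kwargs: Any) -> Any: ...


def describe(data: DuckArray) -> str:
    # Static type checkers can use the protocol in signatures like this one;
    # at runtime, isinstance() only checks that the members exist.
    return f"{type(data).__name__} with shape {data.shape} and dtype {data.dtype}"
```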

TomNicholas · 2022-08-09T18:53:58Z

Typing duck arrays is also a little challenging, I find

Thanks @Illviljan - I was literally just thinking about that here.

TomNicholas · 2022-08-10T05:42:34Z

Another thing that might be useful is the hypothesis strategies in the test suite for the array API consortium standard (cc @keewis).

keewis · 2022-08-16T10:25:43Z

There are also the experimental array API strategies built into hypothesis.

jhamman · 2022-09-22T22:35:58Z

@asmeurer recently pointed me to https://data-apis.org/array-api-tests/. Would that be useful here?

TomNicholas · 2022-09-22T23:06:21Z

Looks like these (https://data-apis.org/array-api-tests/) use these (the experimental array API strategies).

Would that be useful here?

I think they are complementary. In theory, if xarray supports the array API standard and a library passes all of the data-apis array API tests, then it should also pass all of xarray's tests (rendering the latter unnecessary). In practice, though, I think the tests here would still be useful, if only for libraries that don't fully meet the API standard yet would still work fine in xarray.

TomNicholas added the enhancement, topic-testing and topic-arrays (related to flexible array support) labels Aug 8, 2022

TomNicholas mentioned this issue Aug 8, 2022: Automatic duck array testing - reductions #4972 (Draft, 4 tasks)

TomNicholas mentioned this issue Aug 9, 2022: Duckarray tests for constructors and properties #6903 (Open, 4 tasks)

TomNicholas mentioned this issue Aug 9, 2022: Awkward array backend? #4285 (Open)

TomNicholas mentioned this issue Aug 12, 2022: Public hypothesis strategies for generating xarray data #6911 (Open)

This was referenced Sep 10, 2022: Generalize handling of chunked array types #7019 (Merged); Remove dask_array_type checks #7023 (Merged)

benbovy mentioned this issue Oct 14, 2022: Idea: Xarray interface Open-EO/openeo-python-client#334 (Open)

Illviljan mentioned this issue Jan 19, 2023: Typing of internal datatypes #7457 (Open)

TomNicholas mentioned this issue Feb 9, 2023: Aesara as an array backend in Xarray #7515 (Open)

TomNicholas mentioned this issue May 1, 2023: Start making unit testing more general #7799 (Closed)

TomNicholas mentioned this issue May 9, 2023: Initial integration tests cubed-dev/cubed-xarray#4 (Closed)

TomNicholas mentioned this issue Jul 24, 2023: Test suite cubed-dev/cubed-xarray#6 (Closed)

TomNicholas added the array API standard label (Support for the Python array API standard) Jan 25, 2024
