
Out of bounds data indexing behaviour #291

Open
sadielbartholomew opened this issue Mar 6, 2024 · 2 comments
Labels
question Further information is requested

Comments

@sadielbartholomew
Member

sadielbartholomew commented Mar 6, 2024

Here in cfdm, and downstream in cf-python, when (mistakenly) attempting to access data that is out of bounds for the given construct, different behaviour results depending on whether we attempt the __getitem__ on the construct itself or on the underlying Data accessed e.g. via <construct>.data. See the example below. My questions are:

  • is this what we expect? (my thoughts: not really, consistency would be nice)
  • is this a facet of Dask under-the-hood and its laziness that we can't change? (may be so, but I am not sure?)
  • for the case where we error, isn't the error message a bit cryptic relative to something more direct and clear? (my feeling is we should improve the error to make it more akin to an IndexError with error message index X is out of bounds for axis N with size Y as returned by numpy for the same case, e.g. np.array(range(36))[100])

Example (using Python 3.12 and the latest version of cf*)

>>> t = cfdm.example_field(2).dimension_coordinate('time')
>>> t
<DimensionCoordinate: time(36) days since 1959-01-01 >
>>> # Attempt to get item that is out of bounds directly: it errors
>>> t[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slb93/git-repos/cfdm/cfdm/mixin/propertiesdatabounds.py", line 170, in __getitem__
    new = super().__getitem__(indices)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/mixin/propertiesdata.py", line 75, in __getitem__
    raise IndexError(
IndexError: Indices (100,) result in a subspaced shape of (0,), but can't create a subspace of DimensionCoordinate that has a size 0 axis
>>> 
>>> # But attempt to access it via the underlying Data, it returns 0 Data
>>> t.data[100]
<Data(0): >
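A bounds check along the lines suggested in the third question above could look something like the following. Note this is only a sketch of the proposed numpy-style error message; `check_in_bounds` is a hypothetical helper, not part of cfdm:

```python
def check_in_bounds(indices, shape):
    """Raise a numpy-style IndexError for out-of-bounds integer indices.

    Hypothetical helper illustrating the error message suggested above;
    not part of cfdm.
    """
    for axis, (index, size) in enumerate(zip(indices, shape)):
        # Negative indices down to -size are valid, as in numpy
        if isinstance(index, int) and not -size <= index < size:
            raise IndexError(
                f"index {index} is out of bounds for axis {axis} "
                f"with size {size}"
            )


check_in_bounds((3,), (36,))  # within bounds: no error
try:
    check_in_bounds((100,), (36,))
except IndexError as e:
    print(e)  # index 100 is out of bounds for axis 0 with size 36
```

Such a check would have to run before any integer indices are converted to slices (see the discussion below), since the conversion is what loses the out-of-bounds information.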
@sadielbartholomew sadielbartholomew added the question Further information is requested label Mar 6, 2024
@davidhassell
Contributor

Hi Sadie, this is interesting!

  1. is this what we expect? (my thoughts: not really, consistency would be nice)
  • Yes. A pure cf.Data object should behave much like a Dask or numpy array, which can have zero size. However, we do have two deviations from this: never dropping dimensions, and orthogonal indexing. Constructs, however, have constraints, such that they can't be scalar nor have zero size, so we need to trap this on them.
  2. is this a facet of Dask under-the-hood and its laziness that we can't change? (may be so, but I am not sure?)
  • No
  3. for the case where we error, isn't the error message a bit cryptic relative to something more direct and clear? (my feeling is we should improve the error to make it more akin to an IndexError with error message index X is out of bounds for axis N with size Y as returned by numpy for the same case, e.g. np.array(range(36))[100])
  • Tricky. Consider:
  • Tricky. Consider:
>>> a = t.array
>>> a[100]  # Drops axis and fails
IndexError: index 100 is out of bounds for axis 0 with size 36
>>> a[slice(100,101)]  # Retains axis and returns with zero size
array([], dtype=float64)
>>> a[slice(2, 2)] 
array([], dtype=float64)

This is what is going on with cf.Data - it converts 100 to slice(100, 101) prior to applying it to the (numpy or Dask) array. This is because we are presuming to keep axes (in cf-python this is tweakable).

So, we don't really know why the array was zero size - was it out of bounds or something else?

In cf-python we can do

>>> t.data[100]
<CF Data(0): >
>>> t.data.__keepdims_indexing__ = False
>>> t.data[100]
IndexError: Index 100 is out of bounds for axis 0 with size 36

because t.data.__keepdims_indexing__ = False prevents 100 from being converted to a size 1 slice.
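The effect of that toggle can be sketched in plain numpy. This is only an illustration of the integer-to-slice conversion described above (`subspace` is a hypothetical function; cf.Data's actual indexing logic is more involved):

```python
import numpy as np


def subspace(array, index, keepdims=True):
    """Sketch of keepdims-style indexing.

    With keepdims=True, an integer index is converted to a size-1 slice
    before applying, so an out-of-bounds index silently yields a
    zero-size result. With keepdims=False, the integer is passed through
    unchanged and numpy raises its usual IndexError.
    Illustrative only; not cf.Data's actual implementation.
    """
    if keepdims and isinstance(index, int):
        # -1 needs slice(-1, None); slice(-1, 0) would be empty
        index = slice(index, index + 1) if index != -1 else slice(-1, None)
    return array[index]


a = np.arange(36.0)
print(subspace(a, 100))  # zero-size array, no error
try:
    subspace(a, 100, keepdims=False)
except IndexError as e:
    print(e)  # index 100 is out of bounds for axis 0 with size 36
```

Once the integer has become a slice, numpy (and Dask) can no longer distinguish "out of bounds" from a legitimately empty selection like a[slice(2, 2)], which is why the resulting error message is generic.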

Does any of that make sense?

@sadielbartholomew
Member Author

Hi @davidhassell, thanks so much for your detailed explanations here. I think it all makes sense to me (and if it doesn't, that is because I am still trying to wrap my head around it, rather than any fault of your comment, which is detailed and well explained), though I might ask some follow-up questions in our catch-up this afternoon.
