Here in cfdm and downstream in cf-python, when attempting to (mistakenly) access data which is out of bounds for the given construct, differing behaviour is produced depending on whether we attempt the __getitem__ on the construct itself, or on the underlying Data accessed e.g. via <construct>.data. See example below. My questions are:
is this what we expect? (my thoughts: not really, consistency would be nice)
is this a facet of Dask under-the-hood and its laziness that we can't change? (may be so, but I am not sure?)
for the case where we error, isn't the error message a bit cryptic relative to something more direct and clear? (my feeling is we should improve the error to make it more akin to an IndexError with error message index X is out of bounds for axis N with size Y as returned by numpy for the same case, e.g. np.array(range(36))[100])
Example (using Python 3.12 and the latest version of cf*)
>>> t = cfdm.example_field(2).dimension_coordinate('time')
>>> t
<DimensionCoordinate: time(36) days since 1959-01-01>
>>> # Attempt to get item that is out of bounds directly: it errors
>>> t[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slb93/git-repos/cfdm/cfdm/mixin/propertiesdatabounds.py", line 170, in __getitem__
    new = super().__getitem__(indices)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/mixin/propertiesdata.py", line 75, in __getitem__
    raise IndexError(
IndexError: Indices (100,) result in a subspaced shape of (0,), but can't create a subspace of DimensionCoordinate that has a size 0 axis
>>>
>>> # But attempt to access it via the underlying Data, it returns 0 Data
>>> t.data[100]
<Data(0): >
is this what we expect? (my thoughts: not really, consistency would be nice)
Yes. A pure cf.Data object should behave much like a Dask or numpy array, which can have zero size. However, we do have two deviations from this: never dropping dimensions, and orthogonal indexing. Constructs, however, have constraints such that they can't be scalar, nor have zero size, so we need to trap this on them.
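For illustration only, the two deviations mentioned here (keeping dimensions and orthogonal indexing) can be sketched with plain numpy. This is not cf.Data itself, just the numpy behaviour it is being contrasted with; the array `a` and the variable names are made up for the example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Plain numpy: an integer index drops the axis it is applied to.
dropped = a[1]  # shape (4,)

# Keeping the axis: a size-1 slice selects the same values with shape (1, 4).
kept = a[1:2]

# Plain numpy: two index lists are broadcast together and pick out
# individual elements, here a[0, 1] and a[2, 3].
pointwise = a[[0, 2], [1, 3]]  # shape (2,)

# Orthogonal indexing: each index list is applied to its own axis
# independently, giving the block of rows {0, 2} and columns {1, 3}.
orthogonal = a[np.ix_([0, 2], [1, 3])]  # shape (2, 2)
```

The `kept` and `orthogonal` results correspond to the "never dropping dimensions" and "orthogonal indexing" conventions respectively.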
is this a facet of Dask under-the-hood and its laziness that we can't change? (may be so, but I am not sure?)
No
for the case where we error, isn't the error message a bit cryptic relative to something more direct and clear? (my feeling is we should improve the error to make it more akin to an IndexError with error message index X is out of bounds for axis N with size Y as returned by numpy for the same case, e.g. np.array(range(36))[100])
Tricky. Consider:
>>> a = t.array
>>> a[100]  # Drops axis and fails
IndexError: index 100 is out of bounds for axis 0 with size 36
>>> a[slice(100, 101)]  # Retains axis and returns with zero size
array([], dtype=float64)
>>> a[slice(2, 2)]
array([], dtype=float64)
This is what is going on with cf.Data - it converts 100 to slice(100, 101) prior to applying it to the (numpy or Dask) array. This is because we are presuming to keep axes (in cf-python this is tweakable).
So, we don't really know why the array was zero size - was it out of bounds or something else?
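A minimal sketch of the point above, using plain numpy rather than cf.Data. Both `int_to_slice` and `check_in_bounds` are hypothetical helpers written for this example and are not part of cfdm or cf-python. It shows that once an integer has been turned into a slice the out-of-bounds information is lost, and one possible way a numpy-style message could be produced if the integer were checked before conversion.

```python
import numpy as np


def int_to_slice(index, size):
    """Convert an integer index to a size-1 slice that keeps the axis.

    Hypothetical helper, not part of cfdm. Negative integers are
    normalised against `size` first.
    """
    if index < 0:
        index += size
    return slice(index, index + 1)


def check_in_bounds(index, size, axis=0):
    """Raise a numpy-style IndexError for an out-of-bounds integer.

    Hypothetical helper, not part of cfdm.
    """
    if not -size <= index < size:
        raise IndexError(
            f"index {index} is out of bounds for axis {axis} with size {size}"
        )


a = np.arange(36.0)

# After conversion, an out-of-bounds integer and an empty slice are
# indistinguishable: both give a zero-size array.
out_of_bounds = a[int_to_slice(100, a.size)]
empty_slice = a[slice(2, 2)]

# Checking the integer before converting it preserves the reason.
try:
    check_in_bounds(100, a.size)
except IndexError as error:
    message = str(error)
```

This only covers integer indices. A zero-size result that comes from a slice or some other index type would still need the existing, more general error.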
Hi @davidhassell, thanks so much for your detailed explanations here. I think it all makes sense to me (if it doesn't, that is because I am still trying to wrap my head around it, not because of your comment, which is very detailed and well explained), though I might ask some follow-up questions in our catch up this afternoon.