
minmax reports strange values, when all the array is masked (2.12) #20

Open
jypeter opened this issue Mar 22, 2018 · 7 comments
Labels
kind/bug Categorizes issue related to bug.

Comments

@jypeter
Member

jypeter commented Mar 22, 2018

I have come across some slightly anomalous data (see CDAT/cdms#235 for details) that led to a completely masked array, but minmax reports very large values instead of masked:

>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)

Two interesting things to note here:

  • 1.7976931348623157e+308 is exactly the largest finite IEEE real*8 (double-precision) value
  • the reported max, -1.797..., is negative and therefore lower than the reported min
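A quick way to confirm the first point is to compare the reported values with numpy's floating-point limits. This is a minimal sketch using numpy.finfo; it also prints the real*4 limit, which is far smaller, so seeing ±1.797e+308 for float32 input (as shown further down) suggests the value is a fixed double-precision sentinel rather than something derived from the input dtype.

```python
import numpy as np

# Largest finite IEEE 754 double-precision (real*8 / float64) value
f64_max = float(np.finfo(np.float64).max)
print(f64_max)   # 1.7976931348623157e+308
print(-f64_max)  # -1.7976931348623157e+308

# The single-precision (real*4 / float32) limit is much smaller (about 3.4e+38)
f32_max = float(np.finfo(np.float32).max)
print(f32_max)   # 3.4028234663852886e+38
```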

Should minmax report (np.ma.masked, np.ma.masked) instead?

Input data for the file

The following lines generate the masked data:

>>> import cdms2, MV2, cdutil, genutil, numpy as np
>>> f = cdms2.open('/home/scratch01/jypeter/time_counter_bounds_pb.nc')
>>> U = f('U')
>>> f.close()
>>> genutil.minmax(U)
(-31.189350128173828, 23.517915725708008)
>>> U.shape
(10, 1, 90, 180)
>>> 10*90*180
162000
>>> MV2.count(U)
162000
>>> U_avg = cdutil.averager(U, axis='t')
>>> U_avg.shape
(1, 90, 180)
>>> MV2.count(U_avg)
0
>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)
>>> U_avg.min()
masked
>>> U_avg.max()
masked

Note that I also get the same crazy big values when working with some dummy data:

>>> U.dtype
dtype('float32')
>>> U_avg.dtype
dtype('float64')
>>> dummy_ma = np.ma.zeros(U_avg.shape, U_avg.dtype)
>>> MV2.count(dummy_ma)
16200
>>> dummy_ma[...] = np.ma.masked
>>> MV2.count(dummy_ma)
0
>>> genutil.minmax(dummy_ma)
(1.7976931348623157e+308, -1.7976931348623157e+308)
>>> dummy_ma.min()
masked
>>> dummy_ma.max()
masked

Oh, I also get the same crazy big values when working with real*4 data:

>>> dummy_ma = np.ma.zeros(U_avg.shape, np.float32)
>>> dummy_ma[...] = np.ma.masked
>>> MV2.count(dummy_ma)
0
>>> dummy_ma.dtype
dtype('float32')
>>> genutil.minmax(dummy_ma)
(1.7976931348623157e+308, -1.7976931348623157e+308)
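One plausible mechanism for this exact output (an assumption, not a reading of genutil's code) is a reduction that starts from "impossible" sentinels, min = +max_float64 and max = -max_float64, and tightens them with each unmasked value. When every value is masked, the sentinels come back untouched, which gives precisely the tuple above regardless of the input dtype. The function sentinel_minmax below is hypothetical and only illustrates that pattern.

```python
import numpy as np

def sentinel_minmax(data):
    """Hypothetical simplified min/max reduction (NOT genutil's implementation).

    Starts from sentinel values and tightens them with every unmasked value.
    If all values are masked, the sentinels are returned unchanged.
    """
    big = float(np.finfo(np.float64).max)
    mn, mx = big, -big
    for value in np.ma.compressed(data):  # iterates over unmasked values only
        mn = min(mn, float(value))
        mx = max(mx, float(value))
    return mn, mx

# Fully masked real*4 array, built the same way as dummy_ma above
dummy_ma = np.ma.zeros((1, 90, 180), np.float32)
dummy_ma[...] = np.ma.masked
print(sentinel_minmax(dummy_ma))  # (1.7976931348623157e+308, -1.7976931348623157e+308)

# With at least one unmasked value the sentinels are replaced as expected
partial = np.ma.array([3.0, -1.0, 7.5], mask=[False, False, True])
print(sentinel_minmax(partial))   # (-1.0, 3.0)
```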
@durack1
Member

durack1 commented Apr 3, 2018

@dnadeau4 I am wondering whether this masking behaviour is related to the other mask issues we've been having with regridding?

@gleckler1 @doutriaux1 @taylor13

@github-actions

Marking issue as stale, since there has been no activity in 30 days.

Unless the issue is updated or the 'stale' tag is removed, this issue will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 27, 2020
@durack1
Member

durack1 commented Aug 27, 2020

@jypeter is this still an issue with CDAT 8.2.1?

@github-actions github-actions bot removed the stale label Aug 27, 2020
@jypeter
Member Author

jypeter commented Aug 28, 2020

@durack1 I have not installed 8.2.1 yet (and have no time for that now, unfortunately). I have shared my time_counter_bounds_pb.nc test file again, with a valid link.

Can somebody give this a try?

The strange behavior is still here with the python 3 version of CDAT 8.1:

>>> import cdms2, MV2, cdutil, genutil, numpy as np
>>> f = cdms2.open('/home/scratch01/jypeter/time_counter_bounds_pb.nc')
>>> U = f('U')
>>> f.close()
>>> genutil.minmax(U)
(-31.189350128173828, 23.517915725708008)
>>> MV2.count(U)
162000
>>> U_avg = cdutil.averager(U, axis='t')
/home/share/unix_files/cdat/miniconda3/envs/cdatm_py3/lib/python3.6/site-packages/numpy/ma/core.py:3174: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  dout = self.data[indx]
>>> U_avg.shape
(1, 90, 180)
>>> MV2.count(U_avg)
0
>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)
>>> U_avg.min()
masked
>>> U_avg.max()
masked

@jasonb5
Contributor

jasonb5 commented Aug 31, 2020

I've been able to replicate this with the latest version 8.2.1.

You are correct: the returned values in this case are the extreme float64 values (±1.7976931348623157e+308). This behavior can be traced back to these lines.

genutil/Lib/minmax.py, lines 34 to 35 in d0ed149:

    if count(d) == 0:
        return mx, mn

I don't think this is correct behavior, as I assume masked values are treated as if they don't exist. I believe the goal of this function is to accept any number or array of numbers and return package-agnostic min and max values, so returning np.ma.masked would not work in this case.

I think returning (None, None) would be more appropriate; alternatively, we could return (inf, -inf).
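The two proposals can be sketched side by side. Both helpers below are hypothetical (they are not part of genutil) and assume the input can be converted with numpy.ma.asarray. The second one relies on +inf and -inf being the identity elements of min and max, passed through the `initial` argument of numpy's min/max, so an empty reduction needs no special case.

```python
import numpy as np

def minmax_or_none(data):
    """Hypothetical helper: (min, max) of the unmasked values, or (None, None)
    when every value is masked."""
    values = np.ma.compressed(np.ma.asarray(data)).astype(np.float64)
    if values.size == 0:
        return None, None
    return float(values.min()), float(values.max())

def minmax_identity(data):
    """Hypothetical alternative: an empty reduction yields (inf, -inf),
    because +inf/-inf are the identities of min/max."""
    values = np.ma.compressed(np.ma.asarray(data)).astype(np.float64)
    return (float(np.min(values, initial=np.inf)),
            float(np.max(values, initial=-np.inf)))

all_masked = np.ma.masked_all((1, 90, 180), dtype=np.float32)
print(minmax_or_none(all_masked))   # (None, None)
print(minmax_identity(all_masked))  # (inf, -inf)
print(minmax_or_none([2.0, -5.0]))  # (-5.0, 2.0)
```

Note that (inf, -inf) still has min > max, like the current output, but unlike ±1.797e+308 it cannot be mistaken for real data; (None, None) forces callers to handle the empty case explicitly.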

@jypeter
Member Author

jypeter commented Aug 31, 2020

Thanks for testing this!

You could also return (np.ma.masked, np.ma.masked). It is best to try to stay consistent with numpy. Could you check the return value of U_avg.asma().min() (and .max())?
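For reference, plain numpy.ma (which is what U_avg.asma() is assumed to return) reduces a fully masked array to the np.ma.masked constant, matching the U_avg.min() output shown earlier. The sketch below shows that behaviour and a hypothetical minmax_masked helper that simply inherits it.

```python
import numpy as np

# Fully masked array shaped like U_avg in the report above
all_masked = np.ma.masked_all((1, 90, 180), dtype=np.float64)
print(np.ma.count(all_masked))           # 0
print(all_masked.min() is np.ma.masked)  # True
print(all_masked.max() is np.ma.masked)  # True

def minmax_masked(data):
    """Hypothetical sketch: (min, max), each np.ma.masked when nothing is unmasked."""
    arr = np.ma.asarray(data)
    return arr.min(), arr.max()

mn, mx = minmax_masked(all_masked)
print(mn is np.ma.masked and mx is np.ma.masked)  # True
```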

@taylor13

taylor13 commented Sep 1, 2020

It seems that the order of the returned values described in #20 (comment) is inconsistent with the order in #20 (comment), which probably explains why in #20 (comment) the reported max is less than the reported min.

@jasonb5 jasonb5 added the kind/bug Categorizes issue related to bug. label Sep 28, 2020