-
Notifications
You must be signed in to change notification settings - Fork 8
Move general functionality upstream #157
Comments
Yeah +1 on moving things upstream. From a quick look, it seems like esmlab basically adds time-aware weighted groupby/resampling; time bounds consistency; and time units/calendargpropagation (pydata/xarray#1614). For the last one, it would be good to document how much of the table in pydata/xarray#1614 is implemented and what's left. Originally posted by @dcherian in #156 (comment) |
This is correct. We are relying on the time bounds for all time-aware operations.
In [1]: import xarray as xr
In [2]: import numpy as np
...: ds['variable_1'] = xr.DataArray(
...: np.append(
...: np.zeros([12, 2, 2], dtype='float32'), np.ones([12, 2, 2], dtype='float32'), axis=0
...: ),
...: dims=['time', 'lat', 'lon'],
...: )
...: ds['variable_2'] = xr.DataArray(
...: np.append(
...: np.ones([12, 2, 2], dtype='float32'), np.zeros([12, 2, 2], dtype='float32'), axis=0
...: ),
...: dims=['time', 'lat', 'lon'],
...: )
...: ds.time.attrs['units'] = 'days since 0001-01-01 00:00:00'
...: ds.time.attrs['calendar'] = 'noleap'
...: ds.time.attrs['bounds'] = 'time_bound'
...: return ds.copy(True)
...:
In [4]: ds = create_dataset()
In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions: (d2: 2, lat: 2, lon: 2, time: 24)
Coordinates:
* lat (lat) int64 0 1
* lon (lon) int64 0 1
* time (time) float64 31.0 59.0 90.0 120.0 ... 638.0 669.0 699.0 730.0
* d2 (d2) int64 0 1
Data variables:
time_bound (time, d2) float64 0.0 31.0 31.0 59.0 ... 699.0 699.0 730.0
variable_1 (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0
variable_2 (time, lat, lon) float32 1.0 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0
In [6]: xr.decode_cf(ds)
Out[6]:
<xarray.Dataset>
Dimensions: (d2: 2, lat: 2, lon: 2, time: 24)
Coordinates:
* lat (lat) int64 0 1
* lon (lon) int64 0 1
* time (time) object 0001-02-01 00:00:00 ... 0003-01-01 00:00:00
* d2 (d2) int64 0 1
Data variables:
time_bound (time, d2) object ...
variable_1 (time, lat, lon) float32 ...
variable_2 (time, lat, lon) float32 ... Notice how the time axis is shifted to the right by one month. Ideally, you would want the time to be: In [10]: import cftime
In [14]: cftime.num2date(ds.time_bound.mean(dim='d2'), units='days since 0001-01-01 00:00:00', calendar='noleap')
Out[14]:
array([cftime.DatetimeNoLeap(0001-01-16 12:00:00),
cftime.DatetimeNoLeap(0001-02-15 00:00:00),
cftime.DatetimeNoLeap(0001-03-16 12:00:00),
cftime.DatetimeNoLeap(0001-04-16 00:00:00),
cftime.DatetimeNoLeap(0001-05-16 12:00:00),
cftime.DatetimeNoLeap(0001-06-16 00:00:00),
cftime.DatetimeNoLeap(0001-07-16 12:00:00),
cftime.DatetimeNoLeap(0001-08-16 12:00:00),
cftime.DatetimeNoLeap(0001-09-16 00:00:00),
cftime.DatetimeNoLeap(0001-10-16 12:00:00),
cftime.DatetimeNoLeap(0001-11-16 00:00:00),
cftime.DatetimeNoLeap(0001-12-16 12:00:00),
cftime.DatetimeNoLeap(0002-01-16 12:00:00),
cftime.DatetimeNoLeap(0002-02-15 00:00:00),
cftime.DatetimeNoLeap(0002-03-16 12:00:00),
cftime.DatetimeNoLeap(0002-04-16 00:00:00),
cftime.DatetimeNoLeap(0002-05-16 12:00:00),
cftime.DatetimeNoLeap(0002-06-16 00:00:00),
cftime.DatetimeNoLeap(0002-07-16 12:00:00),
cftime.DatetimeNoLeap(0002-08-16 12:00:00),
cftime.DatetimeNoLeap(0002-09-16 00:00:00),
cftime.DatetimeNoLeap(0002-10-16 12:00:00),
cftime.DatetimeNoLeap(0002-11-16 00:00:00),
cftime.DatetimeNoLeap(0002-12-16 12:00:00)], dtype=object) Is this something that is reasonable to implement in xarray or is too domain-specific?
In [19]: ds.resample(time='A').mean()
Out[19]:
<xarray.Dataset>
Dimensions: (d2: 2, lat: 2, lon: 2, time: 3)
Coordinates:
* time (time) object 0001-12-31 00:00:00 ... 0003-12-31 00:00:00
* lat (lat) int64 0 1
* d2 (d2) int64 0 1
* lon (lon) int64 0 1
Data variables:
variable_1 (time, lat, lon) float32 0.0 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0
variable_2 (time, lat, lon) float32 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0
In [26]: esmlab.resample(create_dataset(), freq='ann')
Out[26]:
<xarray.Dataset>
Dimensions: (d2: 2, lat: 2, lon: 2, time: 2)
Coordinates:
* lat (lat) int64 0 1
* lon (lon) int64 0 1
* time (time) float64 181.7 546.7
* d2 (d2) int64 0 1
Data variables:
time_bound (d2, time) float64 0.0 365.0 365.0 730.0
variable_1 (time, lat, lon) float64 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
variable_2 (time, lat, lon) float64 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 Is it reasonable to have this in xarray or is domain specific (i.e. should we handle this in esmlab?) ? |
#142 proposes a method to generate the new time-coordinate. |
Some of this might get solved eventually (pydata/xarray#1475) when xarray adds more fancy indexes but that's a longer term project. I should be clear: esmlab adds very useful capabilities and a package like it is definitely required. It's just that general things like machinery for weighted operations should be moved upstream. |
My bad. I partially pasted the code. Here's the complete version: import xarray as xr
import numpy as np
def create_dataset():
start_date = np.array([0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334], dtype=np.float64)
start_date = np.append(start_date, start_date + 365)
end_date = np.array([31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365], dtype=np.float64)
end_date = np.append(end_date, end_date + 365)
ds = xr.Dataset(coords={'time': 24, 'lat': 2, 'lon': 2, 'd2': 2})
ds['time'] = xr.DataArray(end_date, dims='time')
ds['lat'] = xr.DataArray([0, 1], dims='lat')
ds['lon'] = xr.DataArray([0, 1], dims='lon')
ds['d2'] = xr.DataArray([0, 1], dims='d2')
ds['time_bound'] = xr.DataArray(
np.array([start_date, end_date]).transpose(), dims=['time', 'd2']
)
ds['variable_1'] = xr.DataArray(
np.append(
np.zeros([12, 2, 2], dtype='float32'), np.ones([12, 2, 2], dtype='float32'), axis=0
),
dims=['time', 'lat', 'lon'],
)
ds['variable_2'] = xr.DataArray(
np.append(
np.ones([12, 2, 2], dtype='float32'), np.zeros([12, 2, 2], dtype='float32'), axis=0
),
dims=['time', 'lat', 'lon'],
)
ds.time.encoding['units'] = 'days since 0001-01-01 00:00:00'
ds.time.encoding['calendar'] = 'noleap'
ds.time.encoding['bounds'] = 'time_bound'
return ds |
I agree with you. This is a CESM issue and not a general issue, and it makes more sense to address it at the source. |
Or we could just have a separate minimal package to handle the strange behaviors/inconsistencies of CESM output that are problematic when using xarray. |
pydata/xarray#2922 could use some testing if someone's got the time |
@dcherian, I took pydata/xarray#2922 for a test drive. From the results I am getting, it looks good to me: |
While working on #156 and per discussion with @dcherian, I learned that esmlab's codebase is predominantly full of hacks.
I myself have started having a hard time understanding the existing codebase. It's been a while since I looked at esmlab's code base. So, I spent the last four days looking at the existing code base, and my take-away is that esmlab's codebase is full of hacks. And this makes it difficult to debug or even maintain :(
I am personally in favor of helping with pushing most of the general functionality into xarray and only keeping things that are domain-specific. I just found out that there has been recent progress in pydata/xarray#2922, and I am excited about this. Once pydata/xarray#2922 is merged, we can move most if not all existing functionality from https://github.com/NCAR/esmlab/blob/master/esmlab/statistics.py into xarray.
Originally posted by @andersy005 in #156 (comment)
The text was updated successfully, but these errors were encountered: