NetCDF coordinates in parent group is not used when reading sub group #1982

Open
jacklovell opened this issue Mar 12, 2018 · 10 comments

@jacklovell

Code Sample, a copy-pastable example if possible

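import xarray as xr

ncfile_cf = "x07z00017_cf.nc"
# Open only the "x07" sub-group and keep a deep copy for use after the file is closed
with xr.open_dataset(ncfile_cf, group="x07") as ds:
    ds_data_cf = ds.copy(deep=True)
print(ds_data_cf)
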
<xarray.Dataset>
Dimensions:  (time1: 100000, time2: 2)
Dimensions without coordinates: time1, time2
Data variables:
    aps      (time1) float64 ...
    iact     (time1) float64 ...
    vact     (time1) float64 ...
    dps      (time1) float64 ...
    tss      (time2) float64 ...

Problem description

When reading a sub group from a netCDF file with dimensions defined in the root group, the dimensions are not read from the root group. This contradicts the netCDF documentation, which states that dimensions are scoped such that they can be seen by all sub groups.

The attached netCDF file demonstrates this issue.
x07z00017_cf.nc.zip

Expected Output

The dimensions from the root group should be used when reading the sub-group.

# Workaround: copy the coordinates over from the root group by hand
with xr.open_dataset(ncfile_cf) as ds:
    for coord in ds.coords:
        ds_data_cf.coords[coord] = ds[coord]
print(ds_data_cf)

<xarray.Dataset>
Dimensions:  (time1: 100000, time2: 2)
Coordinates:
  * time1    (time1) float64 0.0 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 ...
  * time2    (time2) float64 0.0 0.1
Data variables:
    aps      (time1) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 ...
    iact     (time1) float64 -0.00125 -0.000625 -0.00125 -0.0009375 ...
    vact     (time1) float64 -0.009375 -0.009375 -0.01875 -0.01875 -0.009375 ...
    dps      (time1) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    tss      (time2) float64 0.0 0.0

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.0
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.3
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.0
setuptools: 38.5.1
pip: 9.0.1
conda: 4.4.11
pytest: 3.2.1
IPython: 6.2.1
sphinx: 1.6.3

@jhamman
Member

jhamman commented Mar 12, 2018

This seems to be a duplicate of #1092. The short answer is that we don't have a first class solution to working with groups in netCDF files.

@jhamman changed the title from "NetCDF dimension in parent group is not used when reading sub group" to "NetCDF coordinates in parent group is not used when reading sub group" on Mar 12, 2018

@jacklovell
Author

It looks to me like #1092 is about a Dataset-like object which can contain groups and sub-groups. Here we have a simpler issue: the Dataset can still be a flat object containing a single group, but it should respect the scope of netCDF dimensions. This means that any dimensions which are mentioned but not visible in the group being written should be searched for and copied (linked?) from a parent group, up to and including the root group if the dimensions reside there.

@jhamman
Member

jhamman commented Mar 12, 2018

> but it should respect the scope of netCDF dimensions

Just so we're all using the same terminology, you are actually referring to "coordinates", not dimensions. In your example above, tss is getting the correct dimension (time2) but the corresponding coordinate (time2) is not loaded. In netCDF/xarray dimensions are just names for individual axes.

The fundamental issue is that we don't have any machinery in xarray to look outside of a single netCDF group when opening a dataset. That is the common piece to #1092.
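
The dimension/coordinate distinction described above can be reproduced without the attached file. The following is only a minimal sketch using small in-memory arrays: the variable name `tss` and the dimension name `time2` are borrowed from the issue, while the helper names `without_coord` and `with_coord` are illustrative.

```python
import numpy as np
import xarray as xr

# A variable whose dimension has a name and a length but no coordinate values,
# which is how the sub-group dataset in this issue appears.
without_coord = xr.Dataset({"tss": (("time2",), np.array([0.0, 0.0]))})

# The same data once a coordinate variable named after the dimension is attached.
with_coord = without_coord.assign_coords(time2=("time2", np.array([0.0, 0.1])))

print(dict(without_coord.sizes))   # the dimension and its length are known
print(list(without_coord.coords))  # but there are no coordinate values
print(list(with_coord.coords))     # the coordinate is now present
```

In both datasets `tss` has the dimension `time2`; only the second one carries values along that axis.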

@Chiil

Chiil commented Nov 12, 2018

The problem occurs with plotting as well. See the StackOverflow question that I posted a few days ago.

https://stackoverflow.com/questions/53196437/missing-axis-values-in-plotting-of-netcdf-variable-in-group-with-xarray

@shoyer
Member

shoyer commented Nov 12, 2018

> When reading a sub group from a netCDF file with dimensions defined in the root group, the dimensions are not read from the root group. This contradicts the netCDF documentation, which states that dimensions are scoped such that they can be seen by all sub groups.

To be clear, xarray does properly read "dimensions" in parent groups. (This is actually ensured by libraries like netCDF4-Python.)

What xarray doesn't do is read "coordinates" from parent groups. As far as I can tell, this isn't part of either the netCDF4 data model or CF conventions. This might be a usability improvement, but the right way to do it isn't dictated by the specs.

@Chiil

Chiil commented Nov 12, 2018 via email

@shoyer
Member

shoyer commented Nov 12, 2018

I think it could logically make sense to recursively check parent groups for coordinates referenced by variables in the opened group.

From an implementation perspective, this might be a little tricky. The current backend interface isn't really set up for this. Currently, we open all variables and do all CF convention decoding afterwards. Supporting this would require going back to the dataset to open more variables.
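
To make the idea of recursively checking parent groups concrete, here is a rough sketch in plain Python. It does not use xarray's backend interface or any netCDF library: the `Group` class and the functions `find_in_ancestors` and `inherited_coordinates` are hypothetical names invented for this illustration, and a "coordinate" is assumed to mean a one-dimensional variable named after its own dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Group:
    """Hypothetical stand-in for a netCDF group: variables plus a parent link."""
    name: str
    # Maps each variable name to the tuple of dimension names it uses.
    variables: Dict[str, Tuple[str, ...]] = field(default_factory=dict)
    parent: Optional["Group"] = None


def find_in_ancestors(group, var_name):
    """Return the nearest group, starting at `group`, that defines `var_name`."""
    current = group
    while current is not None:
        if var_name in current.variables:
            return current
        current = current.parent
    return None


def inherited_coordinates(group):
    """Map each dimension used in `group` without a local coordinate variable
    to the ancestor group that holds a matching coordinate variable."""
    found = {}
    for dims in group.variables.values():
        for dim in dims:
            if dim in found or dim in group.variables:
                continue  # already resolved, or defined locally
            owner = find_in_ancestors(group.parent, dim)
            if owner is not None and owner.variables[dim] == (dim,):
                found[dim] = owner
    return found


# Layout mirroring the issue: coordinates in the root group, data in "x07".
root = Group("/", {"time1": ("time1",), "time2": ("time2",)})
x07 = Group("x07", {"aps": ("time1",), "tss": ("time2",)}, parent=root)

coords = inherited_coordinates(x07)
print({dim: owner.name for dim, owner in coords.items()})
# → {'time1': '/', 'time2': '/'}
```

The sketch only locates the owning group. Actually opening and decoding those extra variables is the part that the current backend flow does not accommodate.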

@lamorton

I'm currently working around this by loading the root group & the branch group with two separate calls and then merging the resulting datasets. It's ugly b/c I have to manually associate the 'phony_dim_x' dimensions from one group with the other.

Maybe I can find the time during quarantine to make an attempt at resolving #1092, which I think would facilitate resolving this issue as well.

Another option would be to allow the group kwarg to be a tuple of group names, and load_dataset could yield a (flat) Dataset including both the root and the branch variables.
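
The two-call workaround described above might look roughly like the following. This is a sketch under stated assumptions, not the commenter's actual code: `attach_parent_coords` is a hypothetical helper, it assumes the dimension names already match between the two datasets (mismatched names such as the 'phony_dim_x' ones would first need renaming, for example with `Dataset.rename`), and small in-memory datasets stand in for the two groups.

```python
import numpy as np
import xarray as xr


def attach_parent_coords(group_ds, parent_ds):
    """Copy coordinates from `parent_ds` for dimensions of `group_ds` that lack them."""
    missing = {
        dim: parent_ds[dim]
        for dim in group_ds.sizes
        if dim not in group_ds.coords and dim in parent_ds.coords
    }
    return group_ds.assign_coords(**missing)


# With the file from this issue, the inputs would come from two separate calls:
#   parent_ds = xr.open_dataset("x07z00017_cf.nc")
#   group_ds = xr.open_dataset("x07z00017_cf.nc", group="x07")
# In-memory stand-ins are used here so the sketch runs without the file.
parent_ds = xr.Dataset(coords={"time2": ("time2", np.array([0.0, 0.1]))})
group_ds = xr.Dataset({"tss": (("time2",), np.array([0.0, 0.0]))})

merged = attach_parent_coords(group_ds, parent_ds)
print(merged)
```

Only coordinates for dimensions the group actually uses are copied, so unrelated root-level coordinates are left out of the result.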

@shoyer
Member

shoyer commented Apr 10, 2020 via email

@alexamici
Collaborator

alexamici commented Dec 27, 2021

Note that the current CF-1.9 conventions now specify how to read coordinates from any group in the file (see spec).

If this is not already one of the goals of the new DataTree structure (#4665), it would probably be best to add it. cc @TomNicholas @aurghs
