As a user, I want an improved netcdf format (netcdf3) #280

Open
epag opened this issue Aug 21, 2024 · 9 comments

Comments

@epag
Collaborator

epag commented Aug 21, 2024


Author Name: James (James)
Original Redmine Issue: 97121, https://vlab.noaa.gov/redmine/issues/97121
Original Date: 2021-10-05


Given a `netcdf2` format that has some weaknesses (because it attempted to straddle various competing objectives at the time)
When I consider how to improve it
Then I want to consider a `netcdf3` format

Specific enhancements to be listed.


Redmine related issue(s): 103076



epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-05T15:35:57Z


Must not reduce compatibility compared with netcdf2. Must still work in a recent GDAL version and hence with off-the-shelf tools like QGIS.

Should probably target CF 1.8.

Need to add WKT geometries, for one. This would allow us to represent feature groups properly, as well as other more complex geometries.
Would be nice to have one blob, not many.
Might be nice to add all statistics, but this is probably not straightforward for an array-formatted blob, so nice-to-have, not essential.
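
To make the WKT idea concrete, here is a minimal sketch of rendering a feature group as a single WKT `MULTIPOINT` string. The `Feature` class and the `feature_group_to_wkt` helper are hypothetical names for illustration only, and the sketch assumes point features with longitude/latitude coordinates. How the resulting string would be stored in the netcdf blob is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical point feature: a name plus lon/lat coordinates."""
    name: str
    lon: float
    lat: float


def feature_group_to_wkt(features):
    """Render a group of point features as one WKT MULTIPOINT string.

    WKT orders coordinates as x then y, i.e. longitude then latitude.
    """
    if not features:
        return "MULTIPOINT EMPTY"
    # repr() keeps full floating-point precision for each coordinate.
    points = ", ".join(f"({f.lon!r} {f.lat!r})" for f in features)
    return f"MULTIPOINT ({points})"


group = [Feature("A", -97.5, 35.25), Feature("B", -96.0, 36.5)]
wkt = feature_group_to_wkt(group)
print(wkt)
```

A feature group then becomes one geometry rather than several unrelated points, which is the property the comment above is after.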


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-05T15:38:43Z


Anyway, add yer wishlist here.

One of the nice things with netcdf when compared to csv2 is that it's a lot less verbose, so I think it has an ongoing user base. It might make more sense to use csv2 in data-frame-shaped applications, but netcdf is a nicer format in many ways for geospatial applications.

Perhaps, one day, we'll have one user-facing format that rules them all (we already have our canonical format), but I doubt it: the proliferation of geospatial and time-series formats is a general problem, not a WRES thing. Perhaps netcdf3 could be a further step along the way, though.


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2021-10-05T16:00:51Z


  1. Single blob
  2. Geographic interoperability with recent GDAL (and therefore other tools)
  3. Accurate and precise modeling
  4. Recent-ish CF-conventions adherence
  5. Less cruft
  6. More metadata

Those in order. In other words, if there is a conflict between CF-conventions and interop, interop takes priority.

Edit: I reversed the order of modeling and CF conventions, split "less cruft" into its own item.


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-05T16:22:54Z


Jesse wrote:

> 1. Single blob
> 2. Geographic interoperability with recent GDAL (and therefore other tools)
> 3. Accurate and precise modeling
> 4. Recent-ish CF-conventions adherence
> 5. Less cruft
> 6. More metadata
>
> Those in order. In other words, if there is a conflict between CF-conventions and interop, interop takes priority.
>
> Edit: I reversed the order of modeling and CF conventions, split "less cruft" into its own item.

Sounds good to me. The reason for data standards/conventions is, in any case, to increase interop, so if the CF convention fails in some way, always side with improved interop for our user base.


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-05T16:24:20Z


( #97121 in terms of item 6, more metadata. )


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-05T16:26:09Z


Another thing that would be really nice to fix is the delayed structure identification. It might be hard, I forget, and I'm not sure whether this is bound up in the format (and hence within scope) or in the tools (and hence out of scope). It is a massive pita for our pipeline to bring forward the structure identification before statistics write time (versus incrementing a structure as statistics arrive). That is to say, it makes netcdf a special snowflake among statistics formats, which is never good.
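
As a rough illustration of "incrementing a structure as statistics arrive", the sketch below buffers statistics in memory and only derives the array layout at the end. `IncrementalStatisticsStore` and its methods are hypothetical names, not part of any existing writer, and the sketch assumes everything fits in memory. Whether a netcdf library allows the on-disk structure itself to grow this way is a separate question that the sketch does not answer.

```python
class IncrementalStatisticsStore:
    """Hypothetical buffer that grows its structure as statistics arrive."""

    def __init__(self):
        self._features = {}   # feature name -> index, in order of arrival
        self._variables = {}  # variable name -> {feature index: value}

    def accept(self, variable, feature, value):
        """Record one statistic, adding the feature or variable if new."""
        index = self._features.setdefault(feature, len(self._features))
        self._variables.setdefault(variable, {})[index] = value

    def finish(self, fill=float("nan")):
        """Return the feature list and one dense array per variable.

        Cells that never received a value are padded with `fill`.
        """
        size = len(self._features)
        features = list(self._features)  # dicts preserve insertion order
        arrays = {
            name: [cells.get(i, fill) for i in range(size)]
            for name, cells in self._variables.items()
        }
        return features, arrays


store = IncrementalStatisticsStore()
store.accept("mean_error", "A", 0.1)
store.accept("mean_error", "B", 0.2)
store.accept("sample_size", "B", 10)
features, arrays = store.finish(fill=None)
```

The dimensions are only known once `finish` is called, so nothing upstream has to identify the structure before write time.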


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-06T11:43:23Z


Not a feature, but:

  1. Some unit/integration tests.

We can use an in-memory filesystem for this. There are examples for other format writers, like csv2. Essentially, write the file to an in-memory filesystem, then read some or all of it and make assertions against expectations. It would be nice not to rely on reading (especially for netcdf, which cannot be read with a JDK one-liner like csv2), but there is no other way to establish what was written.
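
The write-then-read-back pattern can be sketched as follows. This is an illustration only: it uses Python's `io.StringIO` as a stand-in for an in-memory filesystem and a plain CSV writer as a stand-in for the format writer, and `write_statistics` and `read_statistics` are hypothetical helpers rather than anything in the project. A netcdf version would need a netcdf reader in place of the CSV one.

```python
import csv
import io

FIELDS = ["feature", "metric", "value"]


def write_statistics(sink, rows):
    """Stand-in writer: serialise statistics rows as CSV to a text sink."""
    writer = csv.DictWriter(sink, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_statistics(source):
    """Read back what was written so a test can assert against it."""
    return [dict(row) for row in csv.DictReader(source)]


expected = [
    {"feature": "A", "metric": "mean error", "value": "0.5"},
    {"feature": "B", "metric": "mean error", "value": "1.25"},
]

# newline="" disables newline translation, as the csv module recommends.
buffer = io.StringIO(newline="")
write_statistics(buffer, expected)
buffer.seek(0)  # rewind before reading back
actual = read_statistics(buffer)
assert actual == expected
```

Nothing touches the real disk, so the test stays fast and leaves no files behind.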


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-26T11:03:01Z


Variable naming is another area for improvement. In netcdf/netcdf2, we qualify the variable names with metadata, which leads to friction when adding newly qualified slices of statistics. The attributes of a variable should fully qualify the statistics within it. A more general naming convention should be adopted for the variables, avoiding threshold and other information and perhaps even the metric name, although this may be helpful for a human user who is trying to visually filter slices in a GIS or some other visualization tool and find the one they want.
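
One possible shape for this, sketched under assumptions: variables get generic names and all qualifying metadata lives in attributes, which can then be searched. The `statistic_N` naming scheme and the attribute keys (`metric_name`, `threshold`) are hypothetical choices for illustration, not a proposal for the actual convention.

```python
def name_variables(slices, prefix="statistic"):
    """Assign generic variable names; keep all qualifiers as attributes."""
    return {
        f"{prefix}_{i}": dict(attributes)
        for i, attributes in enumerate(slices, start=1)
    }


def find_variables(variables, **criteria):
    """Return names of variables whose attributes match every criterion."""
    return [
        name
        for name, attributes in variables.items()
        if all(attributes.get(key) == value for key, value in criteria.items())
    ]


slices = [
    {"metric_name": "MEAN ERROR", "threshold": "ALL DATA"},
    {"metric_name": "MEAN ERROR", "threshold": "> 5.0"},
]
variables = name_variables(slices)
matches = find_variables(variables, threshold="> 5.0")
```

Adding a new qualifier then means adding an attribute, not changing the name of every variable. The cost is the one noted above: a human browsing variables in a GIS sees less in the name itself.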


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-26T11:07:23Z


edit: oops, wrong thread, ignore.

-On building, there's a small number of unit test failures to deal with...-

-For the system tests, scenario003 will fail on assertions, since the graphics titles are now additionally qualified with the ensemble average type, where applicable, and scenario003 is an ensemble evaluation with all valid metrics and graphics benchmarks. I don't anticipate other failures.-
