We need test datasets / files! #33

Open
ChrisBarker-NOAA opened this issue Jun 7, 2024 · 1 comment

@ChrisBarker-NOAA
Collaborator

For the tests, and for development, it's really good to have datasets you can work with without downloading anything.

I think we should have three sets of data / files:

  1. really tiny examples (probably hard-coded in Python) of various file types, metadata types, etc., for the tests. These could be handwritten or borrowed from other projects.

There are some in the gridded project:

https://github.com/NOAA-ORR-ERD/gridded

There are some examples of UGRID and SGRID files and data sets:

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/tests/gen_analytical_datasets.py

https://github.com/NOAA-ORR-ERD/gridded/tree/master/gridded/tests/test_ugrid/files

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/tests/test_pysgrid/write_nc_test_files.py
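As a rough sketch of what a hard-coded example could look like: the function below describes a two-triangle UGRID mesh as a plain dict, in the layout that `xarray.Dataset.from_dict` accepts. The function names, variable names (`mesh`, `node_lon`, `face_nodes`, `depth`) and values are made up for illustration and are not taken from this repo or from gridded.

```python
def tiny_ugrid_dict():
    """Two triangles covering the unit square, described with UGRID metadata.

    All names and values here are illustrative assumptions.
    """
    return {
        "coords": {},
        "attrs": {"Conventions": "UGRID-1.0"},
        "data_vars": {
            # Dummy scalar variable that carries the mesh topology attributes.
            "mesh": {
                "dims": (),
                "data": 0,
                "attrs": {
                    "cf_role": "mesh_topology",
                    "topology_dimension": 2,
                    "node_coordinates": "node_lon node_lat",
                    "face_node_connectivity": "face_nodes",
                },
            },
            "node_lon": {
                "dims": ("node",),
                "data": [0.0, 1.0, 1.0, 0.0],
                "attrs": {"standard_name": "longitude", "units": "degrees_east"},
            },
            "node_lat": {
                "dims": ("node",),
                "data": [0.0, 0.0, 1.0, 1.0],
                "attrs": {"standard_name": "latitude", "units": "degrees_north"},
            },
            # Each row lists the node indices of one triangle (zero-based).
            "face_nodes": {
                "dims": ("face", "vertex"),
                "data": [[0, 1, 2], [0, 2, 3]],
                "attrs": {"cf_role": "face_node_connectivity", "start_index": 0},
            },
            # One data variable defined on the nodes.
            "depth": {
                "dims": ("node",),
                "data": [1.0, 2.0, 3.0, 4.0],
                "attrs": {"mesh": "mesh", "location": "node", "units": "m"},
            },
        },
    }


def to_dataset(d):
    """Build an xarray.Dataset from the dict (xarray is imported lazily)."""
    import xarray as xr

    return xr.Dataset.from_dict(d)
```

A test could call `to_dataset(tiny_ugrid_dict())` and pass the result to whatever is being tested, with no files or downloads involved.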

  2. small-ish real examples -- so far, I've put one UGRID (FVCOM, triangular mesh) example in:

xarray-subset-grid/examples/example_data/SFBOFS_subset1.nc

I set up git LFS support. I think that means all *.nc files will be stored in LFS, so we can put medium-sized files in there -- but we probably don't want to go over 10MB or so.
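One way to keep that limit from drifting would be a small check in the test suite. This is only a sketch: the 10 MB threshold mirrors the rough figure above, and the helper name and directory path are assumptions, not existing code.

```python
from pathlib import Path

# Rough ceiling from the discussion above ("10MB or so"); adjust as needed.
MAX_BYTES = 10 * 1024 * 1024


def oversized_files(data_dir, pattern="*.nc", max_bytes=MAX_BYTES):
    """Return files under data_dir matching pattern that exceed max_bytes."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(p for p in data_dir.rglob(pattern) if p.stat().st_size > max_bytes)


def test_example_data_is_small():
    # Hypothetical location; point this at wherever the example files live.
    assert oversized_files("examples/example_data") == []
```

Note that this looks at the files as checked out in the working tree, so it assumes the LFS content has actually been pulled.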

There's a bit of a chicken-egg problem there -- how do you make a small file if you don't yet have a tool to subset larger ones with?

But at this point, we do have some working subset code, so I think we could do:

  • get the subset code working
  • make a small test file with it
  • add that file, and build your more comprehensive tests against it.

  3. larger test files

These could be stored somewhere else and downloaded on demand -- one option would be GitHub, as releases or packages, or ???

We can also have test code, etc, that points to what we know are stable resources on S3, etc. -- already some of that in the examples.

Maybe have a file in the repo with a set of links to various datasets?
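A minimal sketch of how the "file with links" and "download on demand" ideas might fit together, using only the standard library. The dataset name, the `example.com` URL, the cache location and the `fetch` helper are all placeholders, not real resources or existing code.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical registry: dataset name -> download URL (placeholder URL).
DATASETS = {
    "large_example.nc": "https://example.com/data/large_example.nc",
}

# Assumed cache location; anything outside the repo would do.
DEFAULT_CACHE = Path.home() / ".cache" / "subset_grid_test_data"


def fetch(name, registry=DATASETS, cache_dir=DEFAULT_CACHE):
    """Return a local path for the named dataset, downloading it only once."""
    cache_dir = Path(cache_dir)
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name so an interrupted transfer
        # never leaves a truncated file under the final name.
        partial = target.with_suffix(target.suffix + ".part")
        urlretrieve(registry[name], str(partial))
        partial.replace(target)
    return target
```

Tests that need a large file would call `fetch("large_example.nc")` and could be skipped when the network is unavailable. The registry could equally live in a small text or JSON file in the repo.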

@omkar-334: maybe you could put together a small example or two of ADCIRC (STOFS2D).

@ChrisBarker-NOAA
Collaborator Author

Ahh -- and more unit tests would be great -- comprehensive testing is a fantasy, but we can do better than this:

---------- coverage: platform darwin, python 3.12.3-final-0 ----------
Name                                  Stmts   Miss  Cover
----------------------------------------------------------
array_subset_grid/__init__.py             5      0   100%
array_subset_grid/_version.py            11      2    82%
array_subset_grid/accessor.py            57     17    70%
array_subset_grid/grid.py                31      9    71%
array_subset_grid/grids/__init__.py       2      0   100%
array_subset_grid/grids/sgrid.py         79     65    18%
array_subset_grid/grids/ugrid.py        148     51    66%
array_subset_grid/utils.py               42      9    79%
----------------------------------------------------------
TOTAL                                   375    153    59%

@mpiannucci mpiannucci added this to the HPC Phase 1 milestone Jul 2, 2024