
Test suite, CLI #74

Closed · wants to merge 34 commits
Conversation

cpaniaguam (Member):

Benefits:

  • More granular tests
  • Parallel execution

hollandjg (Member) left a comment:

I think what you're doing is parallelizing the tests by using subprocess dispatch from within Python, and you're including the results from one run in the code base as a reference. Then you're checking for the existence of the output files. Is that right?

I've a couple of questions here:

  • Might these different processes interfere with each other's inputs or outputs?
  • Are you sure that storing the results of a run in the repo will be a sustainable approach to testing? The risk here is that your test cases might become dependent on precisely that format, and precisely those results, so updating the test cases requires adding new large files, and adding new tests requires adding even more new large files.
  • Do the GitHub Actions runners actually have multiple CPUs which you can use? If not, is this more for accelerating your local tests?

A suggestion to think about: Another way you could parallelize this whole thing is by setting up the GitHub Actions jobs so that they can run in parallel, leveraging the OS (rather than Python) to do this. There's an example here: https://medium.com/tradeling/how-to-achieve-parallel-execution-using-github-actions-d534404702fb

Member:

Can you say something about this file? What's its job?

cpaniaguam (Member, Author):

The scripts produce two directories of output files, plots and work. Currently, this is the expected structure for those directories after running the first 5 steps.

➜  icesat2-tracks git:(feat-build-test-suite) ✗ tree plots
plots
└── SH
    └── SH_testSLsinglefile2
        └── SH_20190502_05180312
            ├── A02_SH_2_hindcast_data.pdf
            ├── A02_SH_2_hindcast_prior.pdf
            ├── B01_track.png.png
            ├── B01b_ATL06_corrected.png.png
            ├── B01b_beam_statistics.png.png
            ├── B03_specs_L25000.0.png
            ├── B03_specs_coord_check.png
            ├── B03_spectra
            │   ├── B03_freq_reconst_x28.pdf
            │   ├── B03_freq_reconst_x47.pdf
            │   ├── B03_freq_reconst_x56.pdf
            │   ├── B03_freq_reconst_x66.pdf
            │   └── B03_freq_reconst_x75.pdf
            ├── B03_success.json
            ├── B04_data_avail.pdf
            ├── B04_marginal_distributions.pdf
            ├── B04_prior_angle.png
            └── B04_success.json
➜  icesat2-tracks git:(feat-build-test-suite) ✗ tree work              
work
├── SH_testSLsinglefile2
│   ├── A01b_ID
│   │   └── A01b_ID_SH_20190502_05180312.json
│   ├── A02_prior
│   │   ├── A02_SH_20190502_05180312.h5
│   │   └── A02_SH_20190502_05180312_hindcast_success.json
│   ├── B01_regrid
│   │   └── SH_20190502_05180312_B01_binned.h5
│   ├── B02_spectra
│   │   ├── B02_SH_20190502_05180312_FFT.nc
│   │   ├── B02_SH_20190502_05180312_gFT_k.nc
│   │   ├── B02_SH_20190502_05180312_gFT_x.nc
│   │   └── B02_SH_20190502_05180312_params.h5
│   └── B04_angle
│       ├── B04_SH_20190502_05180312_marginals.nc
│       └── B04_SH_20190502_05180312_res_table.h5
└── processed

The spirit is to have the tests run more robustly, independently of each other, and free of external data sources. At the moment sliderule, which is used to get the input datasets, is down, not working properly, or working erratically (see this).
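
For illustration only, here is a minimal sketch of an existence check against the expected layout above; the test body is an assumption for illustration, not the actual contents of tests/test_steps.py in this PR.

# Hedged sketch (not the PR's actual test suite): check that expected outputs
# listed in the tree above exist after the pipeline steps have run.
from pathlib import Path

import pytest

TRACK = "SH_20190502_05180312"
EXPECTED_OUTPUTS = [
    f"work/SH_testSLsinglefile2/A01b_ID/A01b_ID_{TRACK}.json",
    f"work/SH_testSLsinglefile2/B01_regrid/{TRACK}_B01_binned.h5",
    f"plots/SH/SH_testSLsinglefile2/{TRACK}/B04_success.json",
]

@pytest.mark.parametrize("relpath", EXPECTED_OUTPUTS)
def test_expected_output_exists(relpath):
    # Assumes the corresponding step has already been dispatched (e.g. via
    # subprocess) and has written into the plots/ and work/ directories.
    assert Path(relpath).is_file(), f"missing expected output: {relpath}"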

pyproject.toml (outdated), comment on lines 118 to 119:
"emcee==3.1.4",
"siphon"
Member:

Can you say something about these new dependencies? I don't see them used in any of the Python files.

cpaniaguam (Member, Author):

They are used in step 5.

Member:

There's some curious highlighting happening here. Looks like a syntax error, but I can't see where it's coming from. Any ideas?
[Screenshot, 2024-01-22 at 08:40:05, showing the highlighting in question]

cpaniaguam (Member, Author):

I am not sure, as my linter is not reporting an issue for this file. I think I am using ruff.

Member:

Can you say something about what this is doing? I think a few comments might be useful here!

cpaniaguam (Member, Author):

This runs the steps (1–5 so far) of the processing pipeline. We might need to split this into more than one file after the process that fetches external data is isolated.

cpaniaguam (Member, Author):

I think what you're doing is parallelizing the tests by using subprocess dispatch from within Python, and you're including the results from one run in the code base as a reference. Then you're checking for the existence of the output files. Is that right?

Correct! I am not completely sure how pytest handles the parallelization, but that's probably the case.

  • Might these different processes interfere with each other's inputs or outputs?

I don't think so, especially if the tests are run in different processes. As things stand at the moment, this will require some refactoring of the scripts so that the source of the input data can be specified interactively.
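
For illustration, one way to keep parallel workers from interfering with each other is to give each test its own working directory via pytest's tmp_path fixture. The script name and the --output-dir flag in the sketch below are hypothetical placeholders, since the real scripts would first need the refactoring mentioned above.

# Hedged sketch: isolating a step's outputs per test with pytest's tmp_path.
# "step_script.py" and its "--output-dir" flag are hypothetical placeholders.
import subprocess
import sys


def test_step_writes_into_isolated_dir(tmp_path):
    result = subprocess.run(
        [sys.executable, "step_script.py", "--output-dir", str(tmp_path)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
    # Each xdist worker gets its own tmp_path, so parallel runs cannot
    # overwrite each other's outputs.
    assert any(tmp_path.iterdir())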

  • Are you sure that storing the results of a run in the repo will be a sustainable approach to testing? The risk here is that your test cases might become dependent on precisely that format, and precisely those results, so updating the test cases requires adding new large files, and adding new tests requires adding even more new large files.

Yes, I think this is a viable approach. The format is dependent on the input data and the scripts used to generate the files. If the input data is fixed, the results should be the same as long as the scripts are not changed in ways that modify their output. If keeping relatively large data files in the repo is a concern, these could be moved to LFS. The actual problem we are having now, which is what motivated this PR, is the dependency on the external service to gather input data, which is not working at the moment.
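
If the stored run were ever used for more than existence checks, a comparison against an LFS-tracked reference could be as simple as a checksum test. In the sketch below, the tests/reference/ location is a hypothetical layout, and byte-for-byte equality assumes the step output is fully deterministic.

# Hedged sketch: compare a freshly generated file with a reference copy kept
# in the repo (e.g. under Git LFS). "tests/reference/" is a hypothetical path.
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def test_binned_output_matches_reference():
    track = "SH_20190502_05180312"
    generated = Path(f"work/SH_testSLsinglefile2/B01_regrid/{track}_B01_binned.h5")
    reference = Path(f"tests/reference/{track}_B01_binned.h5")
    # Only meaningful if the step writes byte-identical output for fixed input.
    assert sha256(generated) == sha256(reference)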

  • Do the GitHub Actions runners actually have multiple CPUs which you can use? If not, is this more for accelerating your local tests?

Yes, they do (see here). Linux runners in particular have 4 CPUs.
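
A quick way to confirm the worker count from inside a job step is to print what Python sees; on a standard GitHub-hosted Linux runner at the time of this discussion, the expected output is 4.

# Prints the number of CPUs visible to the runner process.
import os

print(os.cpu_count())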

A suggestion to think about: Another way you could parallelize this whole thing is by setting up the GitHub Actions jobs so that they can run in parallel, leveraging the OS (rather than Python) to do this. There's an example here: https://medium.com/tradeling/how-to-achieve-parallel-execution-using-github-actions-d534404702fb

Thanks for the suggestion! I'll look into it 👍

@cpaniaguam cpaniaguam changed the base branch from main to feat-step5-intg January 22, 2024 20:42
@cpaniaguam cpaniaguam changed the base branch from feat-step5-intg to main January 22, 2024 21:45
.gitattributes (outdated)
cpaniaguam (Member, Author):

This handles the tar.gz files with LFS.

- name: install pytest
  run: pip install pytest pytest-xdist
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --durations=5 --numprocesses=4 tests/test_steps.py
cpaniaguam (Member, Author):

For those new to the pytest framework, here is a breakdown of the command that runs the tests:

  • pytest: This is the command to run the pytest testing framework.

  • --capture=sys: This option tells pytest to capture stdout and stderr at the sys level. If your tests print to stdout or stderr, those prints will be captured and only displayed in the event of a test failure.

  • --verbose or -v: This option enables verbose output, which means pytest will print more information about each test that is run.

  • --showlocals or -l: This option shows local variables in tracebacks.

  • --tb=long: This option controls the format of the traceback that pytest prints when a test fails. long is the most detailed format.

  • --durations=5: This option tells pytest to record the duration of each test, and at the end of the test session, list the 5 slowest tests.

  • --numprocesses=4 or -n 4: This option, provided by the pytest-xdist plugin, tells pytest to run tests in parallel with 4 workers.

  • tests/test_steps.py: This is the path to the test file that pytest should run.
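
To make the flags above concrete, here is a hedged sketch of the kind of per-step test that -n 4 can spread across workers. The step script names are hypothetical placeholders, not the steps in this repository, and real steps that depend on each other's outputs would need ordering or shared fixtures.

# Hedged sketch: one test per pipeline step so pytest-xdist (-n 4) can hand
# different steps to different worker processes. Script names are placeholders.
import subprocess
import sys

import pytest

STEP_SCRIPTS = [
    "analysis/step_01.py",  # hypothetical
    "analysis/step_02.py",  # hypothetical
]


@pytest.mark.parametrize("script", STEP_SCRIPTS)
def test_step_completes(script):
    completed = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    assert completed.returncode == 0, completed.stderr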

- name: install pytest
  run: pip install pytest pytest-xdist pytest-sugar pytest-durations
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --numprocesses=auto tests/test_steps.py
cpaniaguam (Member, Author):

This is the same pytest invocation explained above, with two differences: --durations=5 is dropped, and --numprocesses=auto (or -n auto) tells the pytest-xdist plugin to run the tests in parallel with a number of workers equal to the number of available CPUs. The install step also adds the pytest-sugar and pytest-durations plugins.

cpaniaguam (Member, Author):

Workflow renamed.

cpaniaguam (Member, Author):

Feel free to use this hook in your local development environment.

cpaniaguam (Member, Author):

Cleanups and explicit imports.

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI tool for the whole package.
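
For readers curious what a package-level CLI can look like, here is a hedged sketch with one subcommand per step. The program name, subcommand names, and options are hypothetical placeholders, not the CLI added in this PR.

# Hedged sketch of a package-level CLI with one subcommand per pipeline step.
# The program name, subcommand names, and options are hypothetical placeholders.
import argparse


def run_step(step: str, track: str) -> None:
    print(f"running {step} for track {track}")  # placeholder for the real step


def main() -> None:
    parser = argparse.ArgumentParser(prog="icesat2-tracks")
    subparsers = parser.add_subparsers(dest="step", required=True)
    for step in ("regrid", "spectra", "angle"):
        sub = subparsers.add_parser(step, help=f"run the {step} step")
        sub.add_argument("--track", required=True, help="track ID, e.g. SH_20190502_05180312")
    args = parser.parse_args()
    run_step(args.step, args.track)


if __name__ == "__main__":
    main()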

cpaniaguam (Member, Author):

Managed by LFS (deprecated).

@cpaniaguam cpaniaguam requested a review from hollandjg January 29, 2024 19:10
@cpaniaguam cpaniaguam changed the title Add test workflow and update dependencies Test suite, CLI Jan 30, 2024
@cpaniaguam cpaniaguam self-assigned this Jan 30, 2024
hollandjg (Member) left a comment:

Hi Carlos! Thanks for this PR!

Here's my feedback in text form – we just discussed everything here.

It's pretty big, and it looks like you're changing a few things together which don't necessarily need to be in one PR (like the code cleanups in generalized_FT and io, adding the CLI to the analysis_db files and adding the test cases).

Please could you split this into some smaller PRs? We discussed:

  • one for each file where you've modified the CLI – just refactoring the existing calls in the existing test suite to keep everything working
  • one for each set of cleanups which are independent of the CLI
  • one for the new test suite as a whole

That'll mean we can review each piece independently, which will help reduce the cognitive load so we don't miss anything, and it will speed up the review process too.

Now that you know how they all fit together and that they do work this way, it shouldn't be too tricky to move the individual pieces to new branches.

If you'd like to chat through splitting up the PR, just let me know – I've had to do it a few times in the last year on AutoRA and Project Portal.

kmilo9999 (Collaborator):

This looks awesome, @cpaniaguam. Great job!

@cpaniaguam cpaniaguam marked this pull request as draft February 2, 2024 17:19
@cpaniaguam cpaniaguam closed this Feb 23, 2024