Test suite, CLI #74
Conversation
I think what you're doing is parallelizing the tests by using subprocess dispatch from within Python, and you're including the results from one run in the code base as a reference. Then you're checking for the existence of the output files. Is that right?
I've a couple of questions here:
- Might these different processes interfere with each other's inputs or outputs?
- Are you sure that storing the results of a run in the repo will be a sustainable approach to testing? The risk here is that your test cases might become dependent on precisely that format, and precisely those results, so updating the test cases requires adding new large files, and adding new tests requires adding even more new large files.
- Do the GitHub Actions runners actually have multiple CPUs which you can use? If not, is this more for accelerating your local tests?
A suggestion to think about: another way you could parallelize this whole thing is by setting up the GitHub Actions jobs so that they can run in parallel, and leverage the OS (rather than Python) to do this. There's an example here: https://medium.com/tradeling/how-to-achieve-parallel-execution-using-github-actions-d534404702fb
tests/plots.tar.gz (outdated)
Can you say something about this file? What's its job?
The scripts produce two directories of output files, `plots` and `work`. Currently, this is the expected structure for those directories after running the first 5 steps:
➜ icesat2-tracks git:(feat-build-test-suite) ✗ tree plots
plots
└── SH
└── SH_testSLsinglefile2
└── SH_20190502_05180312
├── A02_SH_2_hindcast_data.pdf
├── A02_SH_2_hindcast_prior.pdf
├── B01_track.png.png
├── B01b_ATL06_corrected.png.png
├── B01b_beam_statistics.png.png
├── B03_specs_L25000.0.png
├── B03_specs_coord_check.png
├── B03_spectra
│ ├── B03_freq_reconst_x28.pdf
│ ├── B03_freq_reconst_x47.pdf
│ ├── B03_freq_reconst_x56.pdf
│ ├── B03_freq_reconst_x66.pdf
│ └── B03_freq_reconst_x75.pdf
├── B03_success.json
├── B04_data_avail.pdf
├── B04_marginal_distributions.pdf
├── B04_prior_angle.png
└── B04_success.json
➜ icesat2-tracks git:(feat-build-test-suite) ✗ tree work
work
├── SH_testSLsinglefile2
│ ├── A01b_ID
│ │ └── A01b_ID_SH_20190502_05180312.json
│ ├── A02_prior
│ │ ├── A02_SH_20190502_05180312.h5
│ │ └── A02_SH_20190502_05180312_hindcast_success.json
│ ├── B01_regrid
│ │ └── SH_20190502_05180312_B01_binned.h5
│ ├── B02_spectra
│ │ ├── B02_SH_20190502_05180312_FFT.nc
│ │ ├── B02_SH_20190502_05180312_gFT_k.nc
│ │ ├── B02_SH_20190502_05180312_gFT_x.nc
│ │ └── B02_SH_20190502_05180312_params.h5
│ └── B04_angle
│ ├── B04_SH_20190502_05180312_marginals.nc
│ └── B04_SH_20190502_05180312_res_table.h5
└── processed
The spirit is to have the tests run more robustly, independently of each other, and free of external data sources. At the moment `sliderule`, used to get the input datasets, is down/not working properly/working erratically (see this).
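As a side note on how such an expected structure can be verified, here is a minimal sketch (not the test code in this PR; the manifest entries are copied from the tree above and the function name is made up) that checks a list of expected relative paths against a directory:

```python
from pathlib import Path

# Hypothetical manifest: a few expected outputs copied from the tree listing above.
EXPECTED_PLOTS = [
    "SH/SH_testSLsinglefile2/SH_20190502_05180312/B03_success.json",
    "SH/SH_testSLsinglefile2/SH_20190502_05180312/B04_success.json",
    "SH/SH_testSLsinglefile2/SH_20190502_05180312/B04_prior_angle.png",
]


def missing_outputs(root: str, expected: list[str]) -> list[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_outputs("plots", EXPECTED_PLOTS)
    print("all expected outputs present" if not missing else f"missing: {missing}")
```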
pyproject.toml (outdated)
"emcee==3.1.4", | ||
"siphon" |
Can you say something about these new dependencies? I don't see them used in any of the python files.
They are used in step 5.
I am not sure as my linter is not reporting an issue for this file. I think I am using ruff.
tests/test_steps.py (outdated)
Can you say something about what this is doing? I think a few comments might be useful here!
This is running the steps (1-5 so far) of the processing pipeline. Might need to split this into more than one file after the process that fetches external data is isolated.
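For readers who have not opened `tests/test_steps.py`, here is a rough sketch of the pattern being described — dispatch a pipeline step in its own process, then assert that its expected outputs were written. The script path, argument, and output path below are illustrative (the output path is taken from the `work` tree shown earlier), not the exact code in this PR:

```python
import subprocess
import sys
from pathlib import Path


def run_step(script: str, *args: str) -> None:
    """Run one pipeline step in a separate process; raise if it exits non-zero."""
    subprocess.run([sys.executable, script, *args], check=True)


def test_step_b01_produces_binned_file():
    # Hypothetical invocation: the script location and track id are placeholders.
    run_step("analysis_db/B01_SL_load_single_file.py", "SH_20190502_05180312")
    expected = Path("work/SH_testSLsinglefile2/B01_regrid/SH_20190502_05180312_B01_binned.h5")
    assert expected.exists(), f"expected output not found: {expected}"
```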
Correct! I am not completely sure how
I don't think so, especially if the tests are run in different processes. As things stand at the moment, this will require some refactoring of the scripts so that the source of the input data can be specified interactively.
Yes, I think this is a viable approach. The format is dependent on the input data and the scripts used to generate the files. If the input data is fixed, the results should be the same as long as the scripts are not changed in ways that modify their output. If keeping relatively large data files in the repo is a concern, these could be moved to LFS. The actual problem we are having now, which is what motivated this PR, is the dependency on the external service to gather input data, which is not working at the moment.
Yes, they do (see here). Linux runners in particular have 4 CPUs.
Thanks for the suggestion! I'll look into it 👍
Files changed for steps 1-4
This commit adds the .gitattributes file with a filter for *.tar.gz files and updates the .gitignore file to include /logs, localci-runner.sh, and tests/test_tarballs.py.
TODO: make it work in parallel
input parameter reporting, update mconfig paths with outputdir
changed B01_SL_load_single_file.py
fix other linting errors
.gitattributes (outdated)
This handles the tar.gz files with Git LFS.
.github/workflows/run_tests.yml (outdated)
- name: install pytest
  run: pip install pytest pytest-xdist
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --durations=5 --numprocesses=4 tests/test_steps.py
For those new to the pytest framework, here is a breakdown of the command that runs the tests:
- `pytest`: This is the command to run the pytest testing framework.
- `--capture=sys`: This option tells pytest to capture stdout and stderr at the sys level. If your tests print to stdout or stderr, those prints will be captured and only displayed in the event of a test failure.
- `--verbose` or `-v`: This option enables verbose output, which means pytest will print more information about each test that is run.
- `--showlocals` or `-l`: This option shows local variables in tracebacks.
- `--tb=long`: This option controls the format of the traceback that pytest prints when a test fails. `long` is the most detailed format.
- `--durations=5`: This option tells pytest to record the duration of each test, and at the end of the test session, list the 5 slowest tests.
- `--numprocesses=4` or `-n 4`: This option, provided by the pytest-xdist plugin, tells pytest to run tests in parallel with 4 workers (see the note after this list about keeping parallel tests from colliding).
- `tests/test_steps.py`: This is the path to the test file that pytest should run.
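On the earlier question about parallel runs interfering with each other's inputs or outputs: pytest-xdist runs each worker in a separate process, and a common way to keep them from colliding is to give every test its own scratch directory via pytest's built-in `tmp_path` fixture. A small sketch of that pattern (not code from this PR):

```python
from pathlib import Path


def write_report(outdir: Path, name: str, text: str) -> Path:
    """Tiny stand-in for a pipeline step that writes one output file."""
    outdir.mkdir(parents=True, exist_ok=True)
    target = outdir / name
    target.write_text(text)
    return target


def test_outputs_are_isolated(tmp_path):
    # tmp_path is unique per test, so tests running in parallel workers
    # never write into the same directory.
    result = write_report(tmp_path / "plots", "B03_success.json", "{}")
    assert result.exists()
```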
- name: install pytest
  run: pip install pytest pytest-xdist pytest-sugar pytest-durations
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --numprocesses=auto tests/test_steps.py
This runs pytest with a nearly identical set of options to the command above. The only differences: the `--durations=5` option is dropped, and `--numprocesses=auto` (or `-n auto`) tells the pytest-xdist plugin to run the tests in parallel with a number of workers equal to the number of available CPUs.
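If you want a rough idea of what `auto` will resolve to on a given machine or runner, the worker count is based on the number of CPUs Python can see (the exact heuristic depends on the plugin version and whether psutil is installed):

```python
import os

# Approximate basis for pytest-xdist's `-n auto` worker count.
print(os.cpu_count())
```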
Workflow renamed.
Feel free to use this hook in your local development environment.
Cleanups and explicit imports.
CLI added
CLI added
CLI added
CLI tool for the whole package.
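The CLI code itself is not shown in this thread; as a purely illustrative sketch of what an argparse-based entry point for one step could look like (all names and options here are hypothetical, not the interface added in this PR):

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for a single pipeline step."""
    parser = argparse.ArgumentParser(
        description="Run one step of the icesat2-tracks processing pipeline."
    )
    parser.add_argument("track_name", help="track identifier, e.g. SH_20190502_05180312")
    parser.add_argument("--output-dir", default="work", help="directory where results are written")
    parser.add_argument("--verbose", action="store_true", help="print progress information")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(f"would process {args.track_name} into {args.output_dir}")
```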
Managed by LFS (deprecated).
Hi Carlos! Thanks for this PR!
Here's my feedback in text form – we just discussed everything here.
It's pretty big, and it looks like you're changing a few things together which don't necessarily need to be in one PR (like the code cleanups in generalized_FT and io, adding the CLI to the analysis_db files and adding the test cases).
Please could you split this into some smaller PRs? We discussed:
- one for each file where you've modified the CLI – just refactoring the existing calls in the existing test suite to keep everything working
- one for each set of cleanups which are independent of the CLI
- one for the new test suite as a whole
That'll mean that we can review each piece independently, which will help reduce the cognitive load so we don't miss anything and speed up the review process too.
Now that you know how they all fit together and that they do work this way, it shouldn't be too tricky to move the individual pieces to new branches.
If you'd like to chat through splitting up the PR, just let me know – I've had to do it a few times in the last year on AutoRA and Project Portal.
This looks awesome @cpaniaguam, great job!
Benefits: