
Test suite, CLI #74

Closed · wants to merge 34 commits
Conversation

cpaniaguam (Member):

Benefits:

  • More granular tests
  • Parallel execution

hollandjg (Member) left a comment:

I think what you're doing is parallelizing the tests by using subprocess dispatch from within Python, and you're including the results from one run in the code base as a reference. Then you're checking for the existence of the output files. Is that right?

I've a couple of questions here:

  • Might these different processes interfere with each other's inputs or outputs?
  • Are you sure that storing the results of a run in the repo will be a sustainable approach to testing? The risk here is that your test cases might become dependent on precisely that format, and precisely those results, so updating the test cases requires adding new large files, and adding new tests requires adding even more new large files.
  • Do the GitHub Actions runners actually have multiple CPUs which you can use? If not, is this more for accelerating your local tests?

A suggestion to think about: Another way you could parallelize this whole thing is by setting up the GitHub Actions jobs so that they can run in parallel, leveraging the OS (rather than Python) to do this. There's an example here: https://medium.com/tradeling/how-to-achieve-parallel-execution-using-github-actions-d534404702fb

Member:

Can you say something about this file? What's its job?

cpaniaguam (Member, Author):

The scripts produce two directories of output files, plots and work. Currently, this is the expected structure for those directories after running the first 5 steps.

➜  icesat2-tracks git:(feat-build-test-suite) ✗ tree plots
plots
└── SH
    └── SH_testSLsinglefile2
        └── SH_20190502_05180312
            ├── A02_SH_2_hindcast_data.pdf
            ├── A02_SH_2_hindcast_prior.pdf
            ├── B01_track.png.png
            ├── B01b_ATL06_corrected.png.png
            ├── B01b_beam_statistics.png.png
            ├── B03_specs_L25000.0.png
            ├── B03_specs_coord_check.png
            ├── B03_spectra
            │   ├── B03_freq_reconst_x28.pdf
            │   ├── B03_freq_reconst_x47.pdf
            │   ├── B03_freq_reconst_x56.pdf
            │   ├── B03_freq_reconst_x66.pdf
            │   └── B03_freq_reconst_x75.pdf
            ├── B03_success.json
            ├── B04_data_avail.pdf
            ├── B04_marginal_distributions.pdf
            ├── B04_prior_angle.png
            └── B04_success.json
➜  icesat2-tracks git:(feat-build-test-suite) ✗ tree work              
work
├── SH_testSLsinglefile2
│   ├── A01b_ID
│   │   └── A01b_ID_SH_20190502_05180312.json
│   ├── A02_prior
│   │   ├── A02_SH_20190502_05180312.h5
│   │   └── A02_SH_20190502_05180312_hindcast_success.json
│   ├── B01_regrid
│   │   └── SH_20190502_05180312_B01_binned.h5
│   ├── B02_spectra
│   │   ├── B02_SH_20190502_05180312_FFT.nc
│   │   ├── B02_SH_20190502_05180312_gFT_k.nc
│   │   ├── B02_SH_20190502_05180312_gFT_x.nc
│   │   └── B02_SH_20190502_05180312_params.h5
│   └── B04_angle
│       ├── B04_SH_20190502_05180312_marginals.nc
│       └── B04_SH_20190502_05180312_res_table.h5
└── processed

The spirit is to have the tests run more robustly, independently of each other, and free of external data sources. At the moment sliderule, which is used to get the input datasets, is down, not working properly, or working erratically (see this).
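
For illustration only, here is a minimal sketch of an existence check against the expected layout above; the test body is an assumption for illustration, not the actual contents of tests/test_steps.py in this PR.

# Hedged sketch (not the PR's actual test suite): check that expected outputs
# listed in the tree above exist after the pipeline steps have run.
from pathlib import Path

import pytest

TRACK = "SH_20190502_05180312"
EXPECTED_OUTPUTS = [
    f"work/SH_testSLsinglefile2/A01b_ID/A01b_ID_{TRACK}.json",
    f"work/SH_testSLsinglefile2/B01_regrid/{TRACK}_B01_binned.h5",
    f"plots/SH/SH_testSLsinglefile2/{TRACK}/B04_success.json",
]

@pytest.mark.parametrize("relpath", EXPECTED_OUTPUTS)
def test_expected_output_exists(relpath):
    # Assumes the corresponding step has already been dispatched (e.g. via
    # subprocess) and has written into the plots/ and work/ directories.
    assert Path(relpath).is_file(), f"missing expected output: {relpath}"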

pyproject.toml (outdated), comment on lines 118 to 119:
"emcee==3.1.4",
"siphon"
Member:

Can you say something about these new dependencies? I don't see them used in any of the Python files.

cpaniaguam (Member, Author):

They are used in step 5.

Member:

There's some curious highlighting happening here. Looks like a syntax error, but I can't see where it's coming from. Any ideas?
[Screenshot, 2024-01-22 at 08:40:05, showing the highlighting in question]

cpaniaguam (Member, Author):

I am not sure, as my linter is not reporting an issue for this file. I think I am using ruff.

Member:

Can you say something about what this is doing? I think a few comments might be useful here!

cpaniaguam (Member, Author):

This runs the steps (1–5 so far) of the processing pipeline. We might need to split this into more than one file after the process that fetches external data is isolated.

cpaniaguam (Member, Author):

I think what you're doing is parallelizing the tests by using subprocess dispatch from within Python, and you're including the results from one run in the code base as a reference. Then you're checking for the existence of the output files. Is that right?

Correct! I am not completely sure how pytest handles the parallelization, but that's probably the case.

  • Might these different processes interfere with each other's inputs or outputs?

I don't think so, especially if the tests are run in different processes. As things stand at the moment, this will require some refactoring of the scripts so that the source of the input data can be specified interactively.
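
For illustration, one way to keep parallel workers from interfering with each other is to give each test its own working directory via pytest's tmp_path fixture. The script name and the --output-dir flag in the sketch below are hypothetical placeholders, since the real scripts would first need the refactoring mentioned above.

# Hedged sketch: isolating a step's outputs per test with pytest's tmp_path.
# "step_script.py" and its "--output-dir" flag are hypothetical placeholders.
import subprocess
import sys


def test_step_writes_into_isolated_dir(tmp_path):
    result = subprocess.run(
        [sys.executable, "step_script.py", "--output-dir", str(tmp_path)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
    # Each xdist worker gets its own tmp_path, so parallel runs cannot
    # overwrite each other's outputs.
    assert any(tmp_path.iterdir())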

  • Are you sure that storing the results of a run in the repo will be a sustainable approach to testing? The risk here is that your test cases might become dependent on precisely that format, and precisely those results, so updating the test cases requires adding new large files, and adding new tests requires adding even more new large files.

Yes, I think this is a viable approach. The format is dependent on the input data and the scripts used to generate the files. If the input data is fixed, the results should be the same as long as the scripts are not changed in ways that modify their output. If keeping relatively large data files in the repo is a concern, these could be moved to LFS. The actual problem we are having now, which is what motivated this PR, is the dependency on the external service to gather input data, which is not working at the moment.
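
If the stored run were ever used for more than existence checks, a comparison against an LFS-tracked reference could be as simple as a checksum test. In the sketch below, the tests/reference/ location is a hypothetical layout, and byte-for-byte equality assumes the step output is fully deterministic.

# Hedged sketch: compare a freshly generated file with a reference copy kept
# in the repo (e.g. under Git LFS). "tests/reference/" is a hypothetical path.
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def test_binned_output_matches_reference():
    track = "SH_20190502_05180312"
    generated = Path(f"work/SH_testSLsinglefile2/B01_regrid/{track}_B01_binned.h5")
    reference = Path(f"tests/reference/{track}_B01_binned.h5")
    # Only meaningful if the step writes byte-identical output for fixed input.
    assert sha256(generated) == sha256(reference)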

  • Do the GitHub Actions runners actually have multiple CPUs which you can use? If not, is this more for accelerating your local tests?

Yes, they do (see here). Linux runners in particular have 4 CPUs.
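
A quick way to confirm the worker count from inside a job step is to print what Python sees; on a standard GitHub-hosted Linux runner at the time of this discussion, the expected output is 4.

# Prints the number of CPUs visible to the runner process.
import os

print(os.cpu_count())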

A suggestion to think about: Another way you could parallelize this whole thing is by setting up the GitHub Actions jobs so that they can run in parallel, leveraging the OS (rather than Python) to do this. There's an example here: https://medium.com/tradeling/how-to-achieve-parallel-execution-using-github-actions-d534404702fb

Thanks for the suggestion! I'll look into it 👍

@cpaniaguam cpaniaguam changed the base branch from main to feat-step5-intg January 22, 2024 20:42
@cpaniaguam cpaniaguam changed the base branch from feat-step5-intg to main January 22, 2024 21:45
.gitattributes (outdated)
cpaniaguam (Member, Author):

This handles the tar.gz files with LFS.

- name: install pytest
  run: pip install pytest pytest-xdist
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --durations=5 --numprocesses=4 tests/test_steps.py
cpaniaguam (Member, Author):

For those new to the pytest framework, here is a breakdown of the command that runs the tests:

  • pytest: This is the command to run the pytest testing framework.

  • --capture=sys: This option tells pytest to capture stdout and stderr at the sys level. If your tests print to stdout or stderr, those prints will be captured and only displayed in the event of a test failure.

  • --verbose or -v: This option enables verbose output, which means pytest will print more information about each test that is run.

  • --showlocals or -l: This option shows local variables in tracebacks.

  • --tb=long: This option controls the format of the traceback that pytest prints when a test fails. long is the most detailed format.

  • --durations=5: This option tells pytest to record the duration of each test, and at the end of the test session, list the 5 slowest tests.

  • --numprocesses=4 or -n 4: This option, provided by the pytest-xdist plugin, tells pytest to run tests in parallel with 4 workers.

  • tests/test_steps.py: This is the path to the test file that pytest should run.
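
To make the flags above concrete, here is a hedged sketch of the kind of per-step test that -n 4 can spread across workers. The step script names are hypothetical placeholders, not the steps in this repository, and real steps that depend on each other's outputs would need ordering or shared fixtures.

# Hedged sketch: one test per pipeline step so pytest-xdist (-n 4) can hand
# different steps to different worker processes. Script names are placeholders.
import subprocess
import sys

import pytest

STEP_SCRIPTS = [
    "analysis/step_01.py",  # hypothetical
    "analysis/step_02.py",  # hypothetical
]


@pytest.mark.parametrize("script", STEP_SCRIPTS)
def test_step_completes(script):
    completed = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    assert completed.returncode == 0, completed.stderr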

- name: install pytest
  run: pip install pytest pytest-xdist pytest-sugar pytest-durations
- name: Run tests
  run: pytest --capture=sys --verbose --showlocals --tb=long --numprocesses=auto tests/test_steps.py
cpaniaguam (Member, Author):

This is the same pytest invocation explained above, with two differences: --durations=5 is dropped, and --numprocesses=auto (or -n auto) tells the pytest-xdist plugin to run the tests in parallel with a number of workers equal to the number of available CPUs. The install step also adds the pytest-sugar and pytest-durations plugins.

cpaniaguam (Member, Author):

Workflow renamed.

cpaniaguam (Member, Author):

Feel free to use this hook in your local development environment.

cpaniaguam (Member, Author):

Cleanups and explicit imports.

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI added

cpaniaguam (Member, Author):

CLI tool for the whole package.
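
For readers curious what a package-level CLI can look like, here is a hedged sketch with one subcommand per step. The program name, subcommand names, and options are hypothetical placeholders, not the CLI added in this PR.

# Hedged sketch of a package-level CLI with one subcommand per pipeline step.
# The program name, subcommand names, and options are hypothetical placeholders.
import argparse


def run_step(step: str, track: str) -> None:
    print(f"running {step} for track {track}")  # placeholder for the real step


def main() -> None:
    parser = argparse.ArgumentParser(prog="icesat2-tracks")
    subparsers = parser.add_subparsers(dest="step", required=True)
    for step in ("regrid", "spectra", "angle"):
        sub = subparsers.add_parser(step, help=f"run the {step} step")
        sub.add_argument("--track", required=True, help="track ID, e.g. SH_20190502_05180312")
    args = parser.parse_args()
    run_step(args.step, args.track)


if __name__ == "__main__":
    main()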

cpaniaguam (Member, Author):

Managed by LFS (deprecated).

@cpaniaguam cpaniaguam requested a review from hollandjg January 29, 2024 19:10
@cpaniaguam cpaniaguam changed the title Add test workflow and update dependencies Test suite, CLI Jan 30, 2024
@cpaniaguam cpaniaguam self-assigned this Jan 30, 2024
hollandjg (Member) left a comment:

Hi Carlos! Thanks for this PR!

Here's my feedback in text form – we just discussed everything here.

It's pretty big, and it looks like you're changing a few things together which don't necessarily need to be in one PR (like the code cleanups in generalized_FT and io, adding the CLI to the analysis_db files and adding the test cases).

Please could you split this into some smaller PRs? We discussed:

  • one for each file where you've modified the CLI – just refactoring the existing calls in the existing test suite to keep everything working
  • one for each set of cleanups which are independent of the CLI
  • one for the new test suite as a whole

That'll mean we can review each piece independently, which will help reduce the cognitive load so we don't miss anything, and it will speed up the review process too.

Now that you know how they all fit together and that they do work this way, it shouldn't be too tricky to move the individual pieces to new branches.

If you'd like to chat through splitting up the PR, just let me know – I've had to do it a few times in the last year on AutoRA and Project Portal.

kmilo9999 (Collaborator):

This looks awesome, @cpaniaguam. Great job!

@cpaniaguam cpaniaguam marked this pull request as draft February 2, 2024 17:19
@cpaniaguam cpaniaguam closed this Feb 23, 2024