Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optuna CI Testing #912

Merged
merged 10 commits into from
Jul 17, 2024
Merged

Optuna CI Testing #912

merged 10 commits into from
Jul 17, 2024

Conversation

nv-braf
Copy link
Contributor

@nv-braf nv-braf commented Jul 13, 2024

Adds four tests to protect Optuna:

  • L0_optuna_search (single model)
  • L0_optuna_bls
  • L0_optuna_ensemble
  • L0_optuna_multi_model

To make the review easier:

  • These are all based on their quick search counterparts, with the only differences being:
    • Optuna is called (instead of quick) as the --run-config-search-mode in test.sh
    • Expected number of measurements changed, based on min/max % of search space, in check_results.py

The CI with passing tests can be found at: https://gitlab-master.nvidia.com/dl/dgx/tritonmodelanalyzer/-/pipelines/16546091

@nv-braf nv-braf requested a review from dyastremsky July 13, 2024 13:22
Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid work here! Generally very clean code. Appreciated the clean-up and loved a lot of choices like using safe loading for the config files.

My comments apply across files, so if you address my comments in one place, please check across files to make sure it's fixed everywhere. I think the biggest thing though is that a lot of the files seem like duplicates. If there are slight differences, e.g. in the check_results files' number of trials, those can still be put into a shared file/folder with slightly different parameters that they receive. That'd make updating, maintaining, and documenting them easier.

I think even the test.sh files might be able to combined into one large test.sh file or have a base script which you import or run, just changing the MA command or variables a bit. With those changes, the check_results.py and test_config_generator.py files would be mostly eliminated except for one copy and the test.sh files would be pretty minimal. That'd make reading and updating them easier.

qa/L0_optuna_bls_model/check_results.py Outdated Show resolved Hide resolved
qa/L0_optuna_bls_model/check_results.py Show resolved Hide resolved
qa/L0_optuna_bls_model/check_results.py Show resolved Hide resolved
qa/L0_optuna_bls_model/check_results.py Show resolved Hide resolved
qa/L0_optuna_bls_model/test.sh Show resolved Hide resolved
qa/L0_optuna_ensemble_model/check_results.py Show resolved Hide resolved
qa/L0_optuna_ensemble_model/check_results.py Show resolved Hide resolved
qa/L0_optuna_ensemble_model/test.sh Show resolved Hide resolved
qa/L0_optuna_search/test.sh Show resolved Hide resolved
Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brian confirmed that this is the current approach for MA testing due to the lengthy CI time. If MA has a faster CI in the future, we can optimize out the shared pieces while still allowing these tests to run in parallel in the CI.

@nv-braf nv-braf merged commit 020ecb6 into main Jul 17, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants