Optuna CI Testing #912

nv-braf · 2024-07-13T13:22:57Z

Adds four tests to protect Optuna:

L0_optuna_search (single model)
L0_optuna_bls
L0_optuna_ensemble
L0_optuna_multi_model

To make the review easier:

These are all based on their quick search counterparts. The only differences are:
- test.sh passes optuna (instead of quick) as the --run-config-search-mode
- check_results.py expects a different number of measurements, based on the min/max % of the search space
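
For reviewers who want to sanity-check the new expected counts, here is a minimal sketch of how a percentage-bounded range could be derived. It is an illustration only, not the code in check_results.py: the function names, the rounding choices (floor for the lower bound, ceil for the upper), and the example numbers are all assumptions.

```python
import math


def expected_measurement_bounds(search_space_size, min_percent, max_percent):
    """Return (low, high) measurement counts for a percentage-bounded search.

    Hypothetical helper: names and rounding are assumptions, not the PR's code.
    """
    if not 0 <= min_percent <= max_percent <= 100:
        raise ValueError("percentages must satisfy 0 <= min <= max <= 100")
    # Round outward so the accepted range is never narrower than the percentages imply.
    low = math.floor(search_space_size * min_percent / 100)
    high = math.ceil(search_space_size * max_percent / 100)
    return low, high


def measurements_in_range(count, search_space_size, min_percent, max_percent):
    """True if the observed measurement count falls inside the expected range."""
    low, high = expected_measurement_bounds(search_space_size, min_percent, max_percent)
    return low <= count <= high
```

With an assumed search space of 200 configurations and bounds of 5% and 10%, this accepts any count from 10 through 20.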

The CI with passing tests can be found at: https://gitlab-master.nvidia.com/dl/dgx/tritonmodelanalyzer/-/pipelines/16546091

model_analyzer/config/input/config_command_profile.py

dyastremsky

Splendid work here! Generally very clean code. Appreciated the clean-up and loved a lot of choices like using safe loading for the config files.

My comments apply across files, so if you address my comments in one place, please check across files to make sure it's fixed everywhere. I think the biggest thing though is that a lot of the files seem like duplicates. If there are slight differences, e.g. in the check_results files' number of trials, those can still be put into a shared file/folder with slightly different parameters that they receive. That'd make updating, maintaining, and documenting them easier.

I think even the test.sh files might be able to be combined into one large test.sh file, or share a base script that you import or run, changing only the MA command or a few variables. With those changes, the check_results.py and test_config_generator.py files would be mostly eliminated except for one copy, and the test.sh files would be pretty minimal. That'd make reading and updating them easier.
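
To illustrate the suggestion, a shared checker that takes per-test parameters might look roughly like the sketch below. Everything in it is hypothetical: `CheckParams`, `check_measurement_count`, and the bounds in the usage comment are placeholders, not names or values from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckParams:
    """Per-test inputs for a single shared checker (hypothetical structure)."""

    test_name: str
    min_measurements: int
    max_measurements: int


def check_measurement_count(params, actual):
    """Return True if `actual` lies within the test's expected range."""
    ok = params.min_measurements <= actual <= params.max_measurements
    if not ok:
        # Report which test failed and by how much it missed the range.
        print(
            f"{params.test_name}: expected between {params.min_measurements} "
            f"and {params.max_measurements} measurements, got {actual}"
        )
    return ok


# Example usage with placeholder bounds (not the PR's real values):
# check_measurement_count(CheckParams("L0_optuna_search", 1, 5), 3)
```

Each test directory would then only need to supply its own `CheckParams` values, while the checking logic lives in one place.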

qa/L0_optuna_bls_model/check_results.py

qa/L0_optuna_bls_model/test.sh

qa/L0_optuna_ensemble_model/check_results.py

qa/L0_optuna_ensemble_model/test.sh

qa/L0_optuna_search/test.sh

dyastremsky

Brian confirmed that this is the current approach for MA testing due to the lengthy CI time. If MA has a faster CI in the future, we can optimize out the shared pieces while still allowing these tests to run in parallel in the CI.

nv-braf added 7 commits July 11, 2024 15:00

Optuna L0 tests for single and ensemble models

65920c7

Adding BLS test

b0d3956

Adding L0 multi-model optuna test

a313d9b

Fixing multi-model test

98bc53e

Fixing CLI bugs

7b6080e

Attempt at fixing BLS test

eb58554

Adding debug prints to all optuna tests

1cc963d

nv-braf requested a review from dyastremsky July 13, 2024 13:22

nv-braf commented Jul 13, 2024

model_analyzer/config/input/config_command_profile.py

dyastremsky reviewed Jul 15, 2024

nv-braf added 3 commits July 15, 2024 17:26

Fixing copyright dates

e8a7446

Fixing copyright dates

299b24c

Revert "Fixing copyright dates"

3d1dc75

This reverts commit 299b24c.

dyastremsky approved these changes Jul 15, 2024

nv-braf merged commit 020ecb6 into main Jul 17, 2024
4 checks passed
