Adding ensemble support to optuna #903

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

nv-braf
Contributor

@nv-braf nv-braf commented Jun 24, 2024

Adds support for ensemble models to Optuna search.

Here is a live run output:

15:42:37 [Model Analyzer] Starting Optuna mode search to find optimal configs
15:42:37 [Model Analyzer] 
[I 2024-06-24 15:42:37,606] A new study created in memory with name: ensemble_add_sub
15:42:37 [Model Analyzer] Measuring default configuration to establish a baseline measurement
15:42:37 [Model Analyzer] Creating model config: ensemble_add_sub_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] Creating model config: add_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] Creating model config: sub_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] DEBUG: Triton Server started.
15:42:41 [Model Analyzer] DEBUG: Model add_config_default loaded
15:42:43 [Model Analyzer] DEBUG: Model sub_config_default loaded
15:42:44 [Model Analyzer] DEBUG: Model ensemble_add_sub_config_default loaded
15:42:44 [Model Analyzer] Profiling ensemble_add_sub_config_default: concurrency=0
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] DEBUG: Number of configs in search space: 275
15:42:49 [Model Analyzer] DEBUG: Model - ensemble_add_sub:
15:42:49 [Model Analyzer] DEBUG:   concurrency: 1 to 1024 (11)
15:42:49 [Model Analyzer] DEBUG: Composing model - add:
15:42:49 [Model Analyzer] DEBUG:   instance_group: 1 to 5 (5)
15:42:49 [Model Analyzer] DEBUG: Composing model - sub:
15:42:49 [Model Analyzer] DEBUG:   instance_group: 1 to 5 (5)
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] DEBUG: Minimum number of trials: 13 (5% of search space)
15:42:49 [Model Analyzer] DEBUG: Maximum number of trials: 27 (10% of search space)
15:42:49 [Model Analyzer] DEBUG: Trial 1 of 27:
15:42:49 [Model Analyzer] Creating model config: add_config_0
15:42:49 [Model Analyzer]   Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] Creating model config: sub_config_0
15:42:49 [Model Analyzer]   Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] Creating ensemble model config: ensemble_add_sub_config_0
15:42:49 [Model Analyzer]   Setting max_batch_size to 1
15:43:00 [Model Analyzer] Profiling ensemble_add_sub_config_0: concurrency=512
15:43:07 [Model Analyzer] DEBUG: Objective score for ensemble_add_sub_config_0::add_config_0,sub_config_0: 153 --- Best: ensemble_add_sub_config_0::add_config_0,sub_config_0 (153)
...
15:48:16 [Model Analyzer] DEBUG: Trial 23 of 27:
15:48:16 [Model Analyzer] Found existing model config: add_config_4
15:48:16 [Model Analyzer]   Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] Found existing model config: sub_config_1
15:48:16 [Model Analyzer]   Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] Found existing ensemble model config: ensemble_add_sub_config_8
15:48:16 [Model Analyzer]   Setting max_batch_size to 1
15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile
15:48:16 [Model Analyzer] No changes made to analyzer data, no checkpoint saved.
15:48:16 [Model Analyzer] DEBUG: Objective score for ensemble_add_sub_config_8::add_config_4,sub_config_1: 164 --- Best: ensemble_add_sub_config_8::add_config_4,sub_config_1 (164)
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] DEBUG: Early termination threshold reached
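
The search-space figures in the log above can be checked by hand. The following is a minimal sketch, not Model Analyzer's actual code: it assumes concurrency is swept in powers of two (which is consistent with 11 values between 1 and 1024) and that the 5% and 10% trial bounds are rounded down (which is consistent with 13 and 27). The helper `trial_bound` and all variable names are illustrative.

```python
# Assumption: concurrency is sampled at powers of two, 1..1024 -> 11 values.
concurrency_values = [2**i for i in range(11)]
# instance_group count for each composing model: 1..5 -> 5 values each.
add_instance_counts = list(range(1, 6))
sub_instance_counts = list(range(1, 6))

# The search space is the product of the per-parameter value counts.
search_space_size = (
    len(concurrency_values) * len(add_instance_counts) * len(sub_instance_counts)
)


def trial_bound(size: int, percent: int) -> int:
    """Hypothetical helper: a percentage of the search space, rounded down."""
    return size * percent // 100


min_trials = trial_bound(search_space_size, 5)
max_trials = trial_bound(search_space_size, 10)

print(search_space_size, min_trials, max_trials)  # prints: 275 13 27
```

Under those assumptions, 11 x 5 x 5 gives the 275 configs reported in the log, and the two bounds match the logged minimum and maximum number of trials.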

Note: No unit tests protect this change, because it is very difficult to mock everything needed to create an ensemble and to re-create all the steps needed to call OptunaRCG. The information about an ensemble's composing models is contained in the config.yaml.
I've filed a story to add an L0 test to protect this code.

@nv-braf nv-braf marked this pull request as ready for review June 24, 2024 15:56
@nv-braf nv-braf requested review from tgerdesnv and ganeshku1 June 24, 2024 15:57
@tgerdesnv
Collaborator

"15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile"
Does this factor in the concurrency? Or, if a measurement with concurrency = X is found, will it not try concurrency = Y?

@nv-braf
Contributor Author

nv-braf commented Jun 24, 2024

"15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile" Does this factor in the concurrency? Or, if a measurement with concurrency = X is found, will it not try concurrency = Y?

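Yes, this factors in concurrency. A RunConfig is the PA (Perf Analyzer) config plus the model config, so a measurement found at concurrency = X does not match a run config with concurrency = Y.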
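
To illustrate the answer, here is a minimal sketch of a measurement lookup whose key includes concurrency. It is not Model Analyzer's implementation: `RunConfigKey`, `profile_or_reuse`, and `fake_profile` are hypothetical names, and the profiling function is a stand-in that only counts calls rather than producing real measurements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class RunConfigKey:
    """Hypothetical key: model config variants plus the PA concurrency."""

    model_config_names: Tuple[str, ...]
    concurrency: int


# Hypothetical in-memory store of earlier results, keyed by run config.
measurements: Dict[RunConfigKey, int] = {}
profile_calls = 0


def fake_profile(key: RunConfigKey) -> int:
    """Stand-in for profiling: returns how many times profiling has run."""
    global profile_calls
    profile_calls += 1
    return profile_calls


def profile_or_reuse(
    key: RunConfigKey, profile_fn: Callable[[RunConfigKey], int]
) -> Tuple[int, bool]:
    """Return (result, reused); reused is True when profiling was skipped."""
    if key in measurements:
        return measurements[key], True
    measurements[key] = profile_fn(key)
    return measurements[key], False


variants = ("ensemble_add_sub_config_8", "add_config_4", "sub_config_1")
first = profile_or_reuse(RunConfigKey(variants, 512), fake_profile)
again = profile_or_reuse(RunConfigKey(variants, 512), fake_profile)
other = profile_or_reuse(RunConfigKey(variants, 256), fake_profile)

print(first[1], again[1], other[1])  # prints: False True False
```

In this sketch the second lookup is skipped because both the model configs and the concurrency match, while the same model configs at a different concurrency are profiled again. The concurrency values 512 and 256 are arbitrary examples.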

@nv-braf nv-braf requested a review from tgerdesnv June 24, 2024 23:23
@nv-braf nv-braf merged commit 87ec68b into main Jun 26, 2024
4 checks passed