Adding ensemble support to optuna #903

nv-braf · 2024-06-24T14:22:03Z

Adds support for ensemble models to Optuna search.

Here is the output from a live run:

15:42:37 [Model Analyzer] Starting Optuna mode search to find optimal configs
15:42:37 [Model Analyzer] 
[I 2024-06-24 15:42:37,606] A new study created in memory with name: ensemble_add_sub
15:42:37 [Model Analyzer] Measuring default configuration to establish a baseline measurement
15:42:37 [Model Analyzer] Creating model config: ensemble_add_sub_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] Creating model config: add_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] Creating model config: sub_config_default
15:42:37 [Model Analyzer] 
15:42:37 [Model Analyzer] DEBUG: Triton Server started.
15:42:41 [Model Analyzer] DEBUG: Model add_config_default loaded
15:42:43 [Model Analyzer] DEBUG: Model sub_config_default loaded
15:42:44 [Model Analyzer] DEBUG: Model ensemble_add_sub_config_default loaded
15:42:44 [Model Analyzer] Profiling ensemble_add_sub_config_default: concurrency=0
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] DEBUG: Number of configs in search space: 275
15:42:49 [Model Analyzer] DEBUG: Model - ensemble_add_sub:
15:42:49 [Model Analyzer] DEBUG:   concurrency: 1 to 1024 (11)
15:42:49 [Model Analyzer] DEBUG: Composing model - add:
15:42:49 [Model Analyzer] DEBUG:   instance_group: 1 to 5 (5)
15:42:49 [Model Analyzer] DEBUG: Composing model - sub:
15:42:49 [Model Analyzer] DEBUG:   instance_group: 1 to 5 (5)
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] DEBUG: Minimum number of trials: 13 (5% of search space)
15:42:49 [Model Analyzer] DEBUG: Maximum number of trials: 27 (10% of search space)
15:42:49 [Model Analyzer] DEBUG: Trial 1 of 27:
15:42:49 [Model Analyzer] Creating model config: add_config_0
15:42:49 [Model Analyzer]   Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] Creating model config: sub_config_0
15:42:49 [Model Analyzer]   Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
15:42:49 [Model Analyzer] 
15:42:49 [Model Analyzer] Creating ensemble model config: ensemble_add_sub_config_0
15:42:49 [Model Analyzer]   Setting max_batch_size to 1
15:43:00 [Model Analyzer] Profiling ensemble_add_sub_config_0: concurrency=512
15:43:07 [Model Analyzer] DEBUG: Objective score for ensemble_add_sub_config_0::add_config_0,sub_config_0: 153 --- Best: ensemble_add_sub_config_0::add_config_0,sub_config_0 (153)
...
15:48:16 [Model Analyzer] DEBUG: Trial 23 of 27:
15:48:16 [Model Analyzer] Found existing model config: add_config_4
15:48:16 [Model Analyzer]   Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] Found existing model config: sub_config_1
15:48:16 [Model Analyzer]   Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] Found existing ensemble model config: ensemble_add_sub_config_8
15:48:16 [Model Analyzer]   Setting max_batch_size to 1
15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile
15:48:16 [Model Analyzer] No changes made to analyzer data, no checkpoint saved.
15:48:16 [Model Analyzer] DEBUG: Objective score for ensemble_add_sub_config_8::add_config_4,sub_config_1: 164 --- Best: ensemble_add_sub_config_8::add_config_4,sub_config_1 (164)
15:48:16 [Model Analyzer] 
15:48:16 [Model Analyzer] DEBUG: Early termination threshold reached
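
The search-space and trial-count figures in the DEBUG lines above can be reproduced with a few lines of arithmetic. This is only a sketch, not Model Analyzer's actual code: it assumes the 11 concurrency values are the powers of two from 1 to 1024, and that the 5% and 10% bounds are truncated to whole trials. Both assumptions are consistent with the logged numbers, and all variable names are made up for illustration.

```python
# Sketch only: reproduces the numbers printed in the log, under the
# assumptions stated above. Names are illustrative, not Model Analyzer's.

# Assumption: "concurrency: 1 to 1024 (11)" means powers of two.
concurrency_values = [2**i for i in range(11)]  # 1, 2, 4, ..., 1024

# "instance_group: 1 to 5 (5)" for each composing model (add and sub).
add_instance_counts = list(range(1, 6))
sub_instance_counts = list(range(1, 6))

# The search space is the product of all parameter ranges.
search_space_size = (
    len(concurrency_values) * len(add_instance_counts) * len(sub_instance_counts)
)

# Assumption: percentages are truncated (integer division), which matches
# the logged "13 (5% of search space)" and "27 (10% of search space)".
min_trials = search_space_size * 5 // 100
max_trials = search_space_size * 10 // 100

print(search_space_size, min_trials, max_trials)
```

Because each composing model contributes its own `instance_group` range, the search space grows multiplicatively with the number of composing models.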

Note: There is no unit test protecting this yet. It is very difficult to mock everything needed to create an ensemble and re-create all the steps needed to call OptunaRCG, because the information about an ensemble's composing models is contained in the config.yaml.
I've filed a story to add an L0 test to protect this code.

tgerdesnv · 2024-06-24T17:13:32Z

"15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile"
Does this factor in the concurrency? Or, if a measurement with concurrency = X is found, will it not try concurrency = Y?

model_analyzer/analyzer.py

nv-braf · 2024-06-24T17:15:20Z

"15:48:16 [Model Analyzer] Existing measurement found for run config. Skipping profile" Does this factor in the concurrency? Or, if a measurement with concurrency = X is found, will it not try concurrency = Y?

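Yes, this factors in concurrency. The lookup is on the RunConfig, which is the PA config plus the model config, so a measurement found at concurrency = X will not cause concurrency = Y to be skipped.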
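
To illustrate the point about concurrency being part of the lookup, here is a minimal sketch of a measurement cache keyed on both the model configs and the concurrency. `RunConfigKey` and `MeasurementCache` are hypothetical names invented for this example and are not Model Analyzer's classes; the config names, concurrency, and score are the ones from trial 1 in the log.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RunConfigKey:
    """Hypothetical key: model config names plus the PA concurrency."""

    model_config_names: Tuple[str, ...]
    concurrency: int


class MeasurementCache:
    """Hypothetical cache of objective scores, keyed on the full run config."""

    def __init__(self) -> None:
        self._scores: Dict[RunConfigKey, float] = {}

    def add(self, key: RunConfigKey, score: float) -> None:
        self._scores[key] = score

    def get(self, key: RunConfigKey) -> Optional[float]:
        # Returns None when no measurement exists for this exact key.
        return self._scores.get(key)


configs = ("ensemble_add_sub_config_0", "add_config_0", "sub_config_0")
cache = MeasurementCache()
cache.add(RunConfigKey(configs, concurrency=512), 153.0)

# Same model configs and concurrency: the existing measurement is reused.
hit = cache.get(RunConfigKey(configs, concurrency=512))

# Same model configs, different concurrency: no measurement, so profile.
miss = cache.get(RunConfigKey(configs, concurrency=256))
```

In this sketch only an exact match on both parts of the key skips profiling, which is the behavior described in the reply above.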

model_analyzer/config/generate/optuna_run_config_generator.py

Adding ensemble support to optuna

c959139

nv-braf marked this pull request as ready for review June 24, 2024 15:56

nv-braf requested review from tgerdesnv and ganeshku1 June 24, 2024 15:57

tgerdesnv reviewed Jun 24, 2024

model_analyzer/analyzer.py

tgerdesnv reviewed Jun 24, 2024

model_analyzer/config/generate/optuna_run_config_generator.py

nv-braf requested a review from tgerdesnv June 24, 2024 23:23

tgerdesnv reviewed Jun 25, 2024

model_analyzer/config/generate/optuna_run_config_generator.py

tgerdesnv reviewed Jun 25, 2024

model_analyzer/config/generate/optuna_run_config_generator.py

tgerdesnv approved these changes Jun 25, 2024


nv-braf merged commit 87ec68b into main Jun 26, 2024
4 checks passed
