
LLM Detection for Perf Analyzer in CAPI and Triton Backend [ DO NOT MERGE, just ideas ] #466

Closed
wants to merge 1 commit

Conversation

jbkyang-nvi (Contributor)

No description provided.

@jbkyang-nvi (Contributor, Author)

to be merged with #463

const std::shared_ptr<pa::ModelParser>& parser,
const pa::PAParamsPTR& params)
{
bool is_llm_from_user = params->is_llm if (is_llm_from_user)

Contributor

does this compile?
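
A hedged aside on the compile question: as quoted, the assignment and the if-statement share one line with no semicolon, so it would not compile. Below is a minimal sketch of what the intended shape might be, reusing the parameter types from the snippet; the wrapping function name is hypothetical and not from the PR.

static bool DetermineIsLLMFromUser(
    const std::shared_ptr<pa::ModelParser>& parser,
    const pa::PAParamsPTR& params)
{
  // Separate the assignment from the conditional; the original ran them
  // together without a terminating semicolon.
  bool is_llm_from_user = params->is_llm;
  if (is_llm_from_user) {
    return true;
  }
  // `parser` is presumably consulted by the later decoupled/ensemble checks.
  return false;
}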

Comment on lines +67 to +69
// check if it's decoupled
is_llm =
is_llm || (parser->IsDecoupled() && !params->profile_export_file.empty());

Contributor

Will the greater changes here have any negative implications on the outputs for non-decoupled LLM models? ("offline" scenario)

Contributor

I think this will require testing of some sort to ensure appropriate behavior.


// check if this is an ensemble model, and if the model has a tensorrt_llm portion to it,
// then it is for sure the tensorrt-llm backend
if (!parser->composing_models_map_.empty()) {

Contributor

Does composing_models_map_ get populated for BLS models too if they provide the --bls-composing-models?
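
For illustration, here is a hedged sketch of how the body of the quoted if-block might scan the composing models for a TensorRT-LLM step. The value type of composing_models_map_ and the exact backend string are assumptions, not taken from the PR.

// Hypothetical completion of the quoted check. Assumes the map's values
// expose a `backend` string; the real field names and backend string in
// ModelParser may differ.
if (!parser->composing_models_map_.empty()) {
  for (const auto& entry : parser->composing_models_map_) {
    if (entry.second.backend == "tensorrtllm") {
      is_llm = true;  // an ensemble containing a TRT-LLM step is an LLM pipeline
      break;
    }
  }
}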

@@ -428,6 +466,10 @@ PerfAnalyzer::WriteReport()
bool should_output_metrics{
params_->should_collect_metrics && params_->verbose_csv};

// TODO (TMA-1526): Detect if the model is LLM and report LLM metrics based

Contributor

Do we need to know if it is an LLM? Can't we report metrics based on whether we get multiple responses for a request (token-to-token latency, first response latency, etc.)? Those seem generic.
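
As an aside, here is a rough sketch of that "generic" view, assuming we only have a request start time and a vector of response timestamps per request. The type and function names are illustrative, not from Perf Analyzer.

#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

struct RequestTimeline {
  Clock::time_point request_start;
  std::vector<Clock::time_point> response_times;  // one entry per response
};

// First-response latency: time from sending the request to the first
// response. Assumes at least one response was received.
inline std::chrono::nanoseconds FirstResponseLatency(const RequestTimeline& t)
{
  return t.response_times.front() - t.request_start;
}

// Inter-response ("token-to-token"-like) latency: gaps between consecutive
// responses. Both metrics are computable without knowing the model is an LLM.
inline std::vector<std::chrono::nanoseconds> InterResponseLatencies(
    const RequestTimeline& t)
{
  std::vector<std::chrono::nanoseconds> gaps;
  for (size_t i = 1; i < t.response_times.size(); ++i) {
    gaps.push_back(t.response_times[i] - t.response_times[i - 1]);
  }
  return gaps;
}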

Contributor

Tokens are a concept specific to LLMs, no?
Users benchmarking decoupled models that do not use tokenizers would likely be confused if those metrics were reported.

Collaborator

I thought there were at least 3 cases where we would want this information:

  • What metrics to output
  • What stimulus to send in by default (better LLM default stimulus)
  • What metrics to stabilize on

I wouldn't say it is required, but my hope was to have it implemented and abstracted away so we didn't have multiple places in the code each trying to figure out their own way to make this distinction.
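
A tiny sketch of what "implemented and abstracted away" could mean in practice: one flag, one accessor, and all three consumers query it. This is purely illustrative; none of these names are from the PR.

// Hypothetical central place for the LLM decision; metrics reporting,
// default stimulus selection, and stabilization logic would all ask
// IsLLM() instead of re-deriving the answer themselves.
class LlmDetection {
 public:
  void MarkLLM() { is_llm_ = true; }      // set by user flag, heuristics, or backend check
  bool IsLLM() const { return is_llm_; }  // queried by the consumers listed above

 private:
  bool is_llm_{false};
};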

@@ -169,6 +169,16 @@ ModelParser::InitTriton(
response_cache_enabled_ = cache_itr->value["enable"].GetBool();
}

// Check what the backend is:
const auto backend_config_itr = config.FindMember("backend");

Contributor

How does this handle ensemble models?
I have often seen pre- and post-processing steps show python as the backend, while the main model had a backend that supported LLMs.

Contributor

Do we have testing around this?
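
For reference, a hedged sketch continuing the backend lookup quoted above (rapidjson-style access, as the snippet implies). The particular backend names compared against are assumptions for illustration, not confirmed by the PR.

// Continues `const auto backend_config_itr = config.FindMember("backend");`
// from the quoted hunk. Treats the model as an LLM when the backend string
// matches a known LLM-serving backend; the exact strings are assumptions.
if (backend_config_itr != config.MemberEnd() &&
    backend_config_itr->value.IsString()) {
  const std::string backend{backend_config_itr->value.GetString()};
  if (backend == "tensorrtllm" || backend == "vllm") {
    is_llm_ = true;  // hypothetical member flag recording the detection
  }
}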

(two resolved review threads on src/c++/perf_analyzer/perf_analyzer.cc)
@jbkyang-nvi jbkyang-nvi changed the title LLM Detection for Perf Analyzer in CAPI and Triton Backend LLM Detection for Perf Analyzer in CAPI and Triton Backend [ DO NOT MERGE, just ideas ] Feb 5, 2024
@tgerdesnv (Collaborator)

Please change the target branch to pa-llm-metrics, and add a PR description.

@matthewkotila (Contributor)

Should we merge this?

/s

@matthewkotila matthewkotila marked this pull request as draft February 6, 2024 17:07
@jbkyang-nvi jbkyang-nvi changed the base branch from main to pa-llm-metrics February 16, 2024 21:00
@debermudez (Contributor)

The ticket that spawned this PR was marked "will not do".
@jbkyang-nvi I think we can safely close this PR.

@jbkyang-nvi (Contributor, Author)

> The ticket that spawned this PR was marked "will not do". @jbkyang-nvi I think we can safely close this PR.

Thanks for noticing. Closed

@jbkyang-nvi jbkyang-nvi deleted the kyang-detect-if-llm branch May 30, 2024 22:44