
LLM Detection for Perf Analyzer in CAPI and Triton Backend [ DO NOT MERGE, just ideas ] #466

Closed
wants to merge 1 commit

Conversation

jbkyang-nvi (Contributor)

No description provided.

@jbkyang-nvi (Contributor, Author)

to be merged with #463

const std::shared_ptr<pa::ModelParser>& parser,
const pa::PAParamsPTR& params)
{
bool is_llm_from_user = params->is_llm if (is_llm_from_user)

Contributor

does this compile?
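
A hedged aside on the compile question: as quoted, the assignment and the if-statement share one line with no semicolon, so it would not compile. Below is a minimal sketch of what the intended shape might be, reusing the parameter types from the snippet; the wrapping function name is hypothetical and not from the PR.

static bool DetermineIsLLMFromUser(
    const std::shared_ptr<pa::ModelParser>& parser,
    const pa::PAParamsPTR& params)
{
  // Separate the assignment from the conditional; the original ran them
  // together without a terminating semicolon.
  bool is_llm_from_user = params->is_llm;
  if (is_llm_from_user) {
    return true;
  }
  // `parser` is presumably consulted by the later decoupled/ensemble checks.
  return false;
}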

Comment on lines +67 to +69
// check if it's decoupled
is_llm =
is_llm || (parser->IsDecoupled() && !params->profile_export_file.empty());

Contributor

Will the greater changes here have any negative implications on the outputs for non-decoupled LLM models? ("offline" scenario)

Contributor

I think this will require testing of some sort to ensure appropriate behavior.


// check if this is an ensemble model, and if the model has a tensorrt_llm portion to it,
// then it is for sure the tensorrt-llm backend
if (!parser->composing_models_map_.empty()) {

Contributor

Does composing_models_map_ get populated for BLS models too if they provide the --bls-composing-models?
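
For illustration, here is a hedged sketch of how the body of the quoted if-block might scan the composing models for a TensorRT-LLM step. The value type of composing_models_map_ and the exact backend string are assumptions, not taken from the PR.

// Hypothetical completion of the quoted check. Assumes the map's values
// expose a `backend` string; the real field names and backend string in
// ModelParser may differ.
if (!parser->composing_models_map_.empty()) {
  for (const auto& entry : parser->composing_models_map_) {
    if (entry.second.backend == "tensorrtllm") {
      is_llm = true;  // an ensemble containing a TRT-LLM step is an LLM pipeline
      break;
    }
  }
}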

@@ -428,6 +466,10 @@ PerfAnalyzer::WriteReport()
bool should_output_metrics{
params_->should_collect_metrics && params_->verbose_csv};

// TODO (TMA-1526): Detect if the model is LLM and report LLM metrics based

Contributor

Do we need to know if it is an LLM? Can't we report metrics based on whether we get multiple responses for a request (token-to-token latency, first response latency, etc.)? Those seem generic.
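
As an aside, here is a rough sketch of that "generic" view, assuming we only have a request start time and a vector of response timestamps per request. The type and function names are illustrative, not from Perf Analyzer.

#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

struct RequestTimeline {
  Clock::time_point request_start;
  std::vector<Clock::time_point> response_times;  // one entry per response
};

// First-response latency: time from sending the request to the first
// response. Assumes at least one response was received.
inline std::chrono::nanoseconds FirstResponseLatency(const RequestTimeline& t)
{
  return t.response_times.front() - t.request_start;
}

// Inter-response ("token-to-token"-like) latency: gaps between consecutive
// responses. Both metrics are computable without knowing the model is an LLM.
inline std::vector<std::chrono::nanoseconds> InterResponseLatencies(
    const RequestTimeline& t)
{
  std::vector<std::chrono::nanoseconds> gaps;
  for (size_t i = 1; i < t.response_times.size(); ++i) {
    gaps.push_back(t.response_times[i] - t.response_times[i - 1]);
  }
  return gaps;
}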

Contributor

Tokens are a concept specific to LLMs, no?
Users benchmarking decoupled models that do not use tokenizers would likely be confused if those metrics were reported.

Collaborator

I thought there were at least 3 cases where we would want this information:

  • What metrics to output
  • What stimulus to send in by default (better LLM default stimulus)
  • What metrics to stabilize on

I wouldn't say it is required, but my hope was to have it implemented and abstracted away so we didn't have multiple places in the code each trying to figure out their own way to make this distinction.
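
A tiny sketch of what "implemented and abstracted away" could mean in practice: one flag, one accessor, and all three consumers query it. This is purely illustrative; none of these names are from the PR.

// Hypothetical central place for the LLM decision; metrics reporting,
// default stimulus selection, and stabilization logic would all ask
// IsLLM() instead of re-deriving the answer themselves.
class LlmDetection {
 public:
  void MarkLLM() { is_llm_ = true; }      // set by user flag, heuristics, or backend check
  bool IsLLM() const { return is_llm_; }  // queried by the consumers listed above

 private:
  bool is_llm_{false};
};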

@@ -169,6 +169,16 @@ ModelParser::InitTriton(
response_cache_enabled_ = cache_itr->value["enable"].GetBool();
}

// Check what the backend is:
const auto backend_config_itr = config.FindMember("backend");

Contributor

How does this handle ensemble models?
I have often seen pre- and post-processing steps show python as the backend, while the main model had a backend that supported LLMs.

Contributor

Do we have testing around this?
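
For reference, a hedged sketch continuing the backend lookup quoted above (rapidjson-style access, as the snippet implies). The particular backend names compared against are assumptions for illustration, not confirmed by the PR.

// Continues `const auto backend_config_itr = config.FindMember("backend");`
// from the quoted hunk. Treats the model as an LLM when the backend string
// matches a known LLM-serving backend; the exact strings are assumptions.
if (backend_config_itr != config.MemberEnd() &&
    backend_config_itr->value.IsString()) {
  const std::string backend{backend_config_itr->value.GetString()};
  if (backend == "tensorrtllm" || backend == "vllm") {
    is_llm_ = true;  // hypothetical member flag recording the detection
  }
}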

(two resolved review threads on src/c++/perf_analyzer/perf_analyzer.cc)
@jbkyang-nvi jbkyang-nvi changed the title LLM Detection for Perf Analyzer in CAPI and Triton Backend LLM Detection for Perf Analyzer in CAPI and Triton Backend [ DO NOT MERGE, just ideas ] Feb 5, 2024
@tgerdesnv (Collaborator)

Please change the target branch to pa-llm-metrics, and add a PR description.

@matthewkotila (Contributor)

Should we merge this?

/s

@matthewkotila matthewkotila marked this pull request as draft February 6, 2024 17:07
@jbkyang-nvi jbkyang-nvi changed the base branch from main to pa-llm-metrics February 16, 2024 21:00
@debermudez (Contributor)

The ticket that spawned this PR was marked "will not do".
@jbkyang-nvi I think we can safely close this PR.

@jbkyang-nvi (Contributor, Author)

> The ticket that spawned this PR was marked "will not do". @jbkyang-nvi I think we can safely close this PR.

Thanks for noticing. Closed

@jbkyang-nvi jbkyang-nvi deleted the kyang-detect-if-llm branch May 30, 2024 22:44