LLM Detection for Perf Analyzer in CAPI and Triton Backend [ DO NOT MERGE, just ideas ] #466
Changes from all commits
@@ -52,6 +52,44 @@ SignalHandler(int signum)
    exit(0);
  }
}

bool
IsLLMModel(
    const std::shared_ptr<pa::ModelParser>& parser,
    const pa::PAParamsPtr& params)
{
  bool is_llm_from_user = params->is_llm;
  if (is_llm_from_user)
Review comment: does this compile?
(debermudez marked this conversation as resolved.)
  {
    return true;
  }

  bool is_llm = false;
  // Check if it is decoupled.
  is_llm =
      is_llm || (parser->IsDecoupled() && !params->profile_export_file.empty());
Review comments on lines +67 to +69:
- Will the greater changes here have any negative implications on the outputs for non-decoupled LLM models (the "offline" scenario)?
- I think this will require testing of some sort to ensure appropriate behavior.

  // Check if it is an ensemble model; if the model has a tensorrt_llm
  // composing model, then it is for sure the TensorRT-LLM backend.
  if (!parser->composing_models_map_.empty()) {
    auto& composing_models_map = parser->composing_models_map_;
    for (auto& [_, model_version_pair] : composing_models_map) {
      std::string model_name = model_version_pair.first;
      if (model_name == "tensorrt_llm") {
        parser->backend_type_ = ModelParser::TritonBackendType::TENSORRT_LLM;
        break;
      }
    }
  }

  // Check if the backend used is the vLLM or TensorRT-LLM backend.
  is_llm = is_llm ||
           (parser->backend_type_ == ModelParser::TritonBackendType::VLLM ||
            parser->backend_type_ ==
                ModelParser::TritonBackendType::TENSORRT_LLM);

  return is_llm;
}

}}  // namespace triton::perfanalyzer

PerfAnalyzer::PerfAnalyzer(pa::PAParamsPtr params) : params_(params)
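
Following up on the review comments on lines +67 to +69 above, here is a minimal sketch of the kind of unit test that could guard the non-decoupled ("offline") case. The MockModelParser type, its SetIsDecoupled() setter, and the UNKNOWN backend value are hypothetical stand-ins rather than existing perf_analyzer test fixtures, and the PerfAnalyzerParameters field names are taken from this diff.

#include <memory>

#include "gtest/gtest.h"

TEST(IsLLMModelTest, OfflineNonDecoupledModelIsNotReportedAsLLM)
{
  // Hypothetical mock: a plain, non-decoupled model on a non-LLM backend.
  auto parser = std::make_shared<MockModelParser>();
  parser->SetIsDecoupled(false);                                    // hypothetical setter
  parser->backend_type_ = ModelParser::TritonBackendType::UNKNOWN;  // assumed enum value

  auto params = std::make_shared<PerfAnalyzerParameters>();
  params->is_llm = false;
  params->profile_export_file = "profile.json";  // an export file alone should not imply LLM

  EXPECT_FALSE(IsLLMModel(parser, params));
}

TEST(IsLLMModelTest, UserFlagForcesLLMDetection)
{
  // The explicit user flag proposed in this PR should short-circuit detection.
  auto parser = std::make_shared<MockModelParser>();
  auto params = std::make_shared<PerfAnalyzerParameters>();
  params->is_llm = true;

  EXPECT_TRUE(IsLLMModel(parser, params));
}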

@@ -428,6 +466,10 @@ PerfAnalyzer::WriteReport()
  bool should_output_metrics{
      params_->should_collect_metrics && params_->verbose_csv};

  // TODO (TMA-1526): Detect if the model is LLM and report LLM metrics based
  // on that signal. Currently we simply check if it's a decoupled model.
  bool should_output_llm_metrics{IsLLMModel(parser_, params_)};

Review discussion on the TODO above:
- Do we need to know if it is an LLM? Can't we report metrics based on whether we get multiple responses for a request (token-to-token latency, first response latency, etc.)? That seems generic.
- Tokens are a concept specific to LLMs, no?
- I thought there were at least 3 cases where we would want this information. I wouldn't say it is required, but my hope was to have it implemented and abstracted away so we didn't have multiple places of code trying to figure out their own way to make this distinction.

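As a hedged illustration of the first point in the discussion above: the per-request quantities mentioned (first response latency, response-to-response latency) can be derived from response timestamps alone, with no LLM-specific knowledge. The RequestTimestamps struct below is an assumption for illustration, not an existing Perf Analyzer type.

#include <chrono>
#include <cstddef>
#include <vector>

// Timestamps recorded for a single request (assumed shape, for illustration).
struct RequestTimestamps {
  std::chrono::steady_clock::time_point start;
  std::vector<std::chrono::steady_clock::time_point> responses;  // one entry per response
};

struct MultiResponseLatencies {
  std::chrono::nanoseconds first_response_latency{0};
  std::vector<std::chrono::nanoseconds> inter_response_latencies;
};

// Generic: works for any multi-response model, LLM or not.
MultiResponseLatencies
ComputeMultiResponseLatencies(const RequestTimestamps& ts)
{
  namespace chr = std::chrono;
  MultiResponseLatencies out;
  if (ts.responses.empty()) {
    return out;
  }
  out.first_response_latency =
      chr::duration_cast<chr::nanoseconds>(ts.responses.front() - ts.start);
  for (std::size_t i = 1; i < ts.responses.size(); i++) {
    out.inter_response_latencies.push_back(
        chr::duration_cast<chr::nanoseconds>(ts.responses[i] - ts.responses[i - 1]));
  }
  return out;
}

Whether such metrics should only be reported when IsLLMModel() is true, as in this diff, or whenever a request yields more than one response, is exactly the open question in the thread above.
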
  std::unique_ptr<pa::ReportWriter> writer;

  FAIL_IF_ERR(

Review comments:
- How does this handle ensemble models? I have often seen pre- and post-processor steps show python as a backend, but the main model had a backend that supported LLMs.
- Do we have testing around this?

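One possible direction for the ensemble question above, sketched under assumptions rather than against the current ModelParser API: instead of matching a composing model literally named "tensorrt_llm", detection could inspect the backend of every composing model, so that python pre/post-processing steps do not mask an LLM backend. The GetBackendType() helper and the (name, version) pair layout of composing_models_map_ are hypothetical.

// Sketch only; intended to sit next to IsLLMModel() in perf_analyzer.cc.
// Assumes composing_models_map_ maps an ensemble step name to a
// (composing model name, version) pair, and that a per-model backend lookup exists.
bool
AnyComposingModelUsesLLMBackend(const std::shared_ptr<pa::ModelParser>& parser)
{
  for (const auto& [step_name, name_version] : parser->composing_models_map_) {
    const std::string& composing_model_name = name_version.first;
    // GetBackendType() is a hypothetical helper; today only the top-level
    // backend_type_ is tracked.
    auto backend = parser->GetBackendType(composing_model_name);
    if (backend == ModelParser::TritonBackendType::VLLM ||
        backend == ModelParser::TritonBackendType::TENSORRT_LLM) {
      return true;  // python pre/post steps do not disqualify the ensemble
    }
  }
  return false;
}

IsLLMModel() could then OR this result into its existing checks, which would also cover the case where only the pre- and post-processing steps report python as their backend.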