LLM Detection for Perf Analyzer in CAPI and Triton Backend [DO NOT MERGE, just ideas] #466
Conversation
to be merged with #463
```cpp
    const std::shared_ptr<pa::ModelParser>& parser,
    const pa::PAParamsPTR& params)
{
  bool is_llm_from_user = params->is_llm
  if (is_llm_from_user)
```
does this compile?
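As quoted, the snippet is missing a semicolon after the initializer, so it would not compile. A minimal corrected sketch, assuming the user flag is simply meant to short-circuit the later heuristics (`DetectLLM` is a hypothetical name; the parameter list and `pa::` types come from the hunk above):

```cpp
// Hypothetical sketch: "DetectLLM" is an assumed name; the parameters and
// pa:: types are taken from the PR's diff, not defined here.
static bool
DetectLLM(
    const std::shared_ptr<pa::ModelParser>& parser,
    const pa::PAParamsPTR& params)
{
  bool is_llm_from_user = params->is_llm;  // semicolon was missing in the diff
  if (is_llm_from_user) {
    return true;  // assumed: an explicit user flag overrides any heuristic
  }
  return parser->IsDecoupled() && !params->profile_export_file.empty();
}
```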
```cpp
  // check if it's decoupled
  is_llm =
      is_llm || (parser->IsDecoupled() && !params->profile_export_file.empty());
```
Will the broader changes here have any negative implications for the outputs of non-decoupled LLM models (the "offline" scenario)?
I think this will require testing of some sort to ensure appropriate behavior.
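A minimal GoogleTest-style sketch of what that coverage might look like; `IsLLMHeuristic` is a hypothetical helper that lifts the condition above into a testable free function, not the PR's actual code:

```cpp
#include <gtest/gtest.h>

// Hypothetical sketch: the detection condition from the hunk above, lifted
// into a free function so each branch can be unit-tested in isolation.
static bool
IsLLMHeuristic(bool user_flag, bool decoupled, bool has_profile_export)
{
  return user_flag || (decoupled && has_profile_export);
}

TEST(LLMDetection, NonDecoupledOfflineModelsAreUnaffected)
{
  // "Offline" scenario: non-decoupled model, no user flag -> not an LLM.
  EXPECT_FALSE(IsLLMHeuristic(false, false, true));
}

TEST(LLMDetection, DecoupledWithoutProfileExportIsNotLLM)
{
  EXPECT_FALSE(IsLLMHeuristic(false, true, false));
}

TEST(LLMDetection, UserFlagAlwaysWins)
{
  EXPECT_TRUE(IsLLMHeuristic(true, false, false));
}
```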
```cpp
  // check if this is an ensemble model; if the model has a tensorrt_llm
  // portion to it, then it is definitely the tensorrt-llm backend
  if (!parser->composing_models_map_.empty()) {
```
Does `composing_models_map_` get populated for BLS models too if they provide the `--bls-composing-models` option?
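For illustration, a sketch of what the ensemble scan might look like; the map's value type and the substring match on `tensorrt_llm` are assumptions based only on the comment in the hunk above:

```cpp
#include <map>
#include <string>

// Hypothetical sketch: composing_models_map_ is assumed to map composing
// model names to some per-model info; only the keys matter here.
static bool
HasTensorRTLLMComposingModel(
    const std::map<std::string, std::string>& composing_models_map)
{
  for (const auto& entry : composing_models_map) {
    // If any composing model name contains "tensorrt_llm", treat the
    // ensemble as fronting the TensorRT-LLM backend (per the comment above).
    if (entry.first.find("tensorrt_llm") != std::string::npos) {
      return true;
    }
  }
  return false;
}
```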
@@ -428,6 +466,10 @@ PerfAnalyzer::WriteReport()
```cpp
  bool should_output_metrics{
      params_->should_collect_metrics && params_->verbose_csv};

  // TODO (TMA-1526): Detect if the model is LLM and report LLM metrics based
```
Do we need to know if it is an LLM? Can't we report metrics based on whether we get multiple responses for a request (token-to-token latency, first-response latency, etc.)? These seem generic.
Tokens are a concept specific to LLMs, no?
Users benchmarking decoupled models that do not use tokenizers would likely be confused if those metrics were reported.
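For comparison, the tokenizer-free metrics suggested above could be computed from response timestamps alone. A minimal sketch, assuming each request carries a start time and a vector of response arrival times (all names here are hypothetical):

```cpp
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: generic multi-response metrics derived purely from
// timestamps, with no tokenizer involved.
struct ResponseLatencies {
  Clock::duration first_response_latency{};  // request start -> first response
  std::vector<Clock::duration> inter_response_gaps;  // gaps between responses
};

static ResponseLatencies
ComputeLatencies(
    Clock::time_point request_start,
    const std::vector<Clock::time_point>& response_times)
{
  ResponseLatencies result{};
  if (response_times.empty()) {
    return result;
  }
  result.first_response_latency = response_times.front() - request_start;
  for (size_t i = 1; i < response_times.size(); ++i) {
    result.inter_response_gaps.push_back(
        response_times[i] - response_times[i - 1]);
  }
  return result;
}
```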
I thought there were at least 3 cases where we would want this information:
- What metrics to output
- What stimulus to send in by default (better LLM default stimulus)
- What metrics to stabilize on
I wouldn't say it is required, but my hope was to have it implemented and abstracted away so we didn't have multiple places in the code trying to figure out their own way to make this distinction.
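A sketch of that abstraction: compute the decision once and let all three call sites consult it. Every name here is hypothetical; it only illustrates centralizing the distinction rather than the PR's actual design:

```cpp
// Hypothetical sketch: one place owns the LLM decision so metric selection,
// default stimulus, and stabilization logic all consult the same answer.
class LLMDetector {
 public:
  explicit LLMDetector(bool is_llm) : is_llm_(is_llm) {}

  bool ShouldReportTokenMetrics() const { return is_llm_; }
  bool ShouldUseLLMDefaultStimulus() const { return is_llm_; }
  bool ShouldStabilizeOnTokenLatency() const { return is_llm_; }

 private:
  bool is_llm_;
};
```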
@@ -169,6 +169,16 @@ ModelParser::InitTriton(
```cpp
    response_cache_enabled_ = cache_itr->value["enable"].GetBool();
  }

  // Check what the backend is:
  const auto backend_config_itr = config.FindMember("backend");
```
How does this handle ensemble models? I have often seen pre- and post-processing steps show `python` as the backend while the main model had a backend that supported LLMs.
Do we have testing around this?
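A sketch of a backend check that could extend to ensembles; the rapidjson accessors mirror the hunk above, but the set of LLM-capable backend names and the overall shape are assumptions:

```cpp
#include <rapidjson/document.h>

#include <string>
#include <vector>

// Hypothetical sketch: check one config's "backend" field against an assumed
// set of LLM-capable backends.
static bool
ConfigBackendSupportsLLM(const rapidjson::Value& config)
{
  const auto backend_itr = config.FindMember("backend");
  if (backend_itr == config.MemberEnd() || !backend_itr->value.IsString()) {
    return false;
  }
  const std::string backend = backend_itr->value.GetString();
  // Assumed list; "python" pre/post-processing steps intentionally don't match.
  return backend == "tensorrtllm" || backend == "vllm";
}

// For an ensemble, also check every composing model's config, so a python
// pre/post-processor at the top level doesn't mask an LLM backend underneath.
static bool
AnyBackendSupportsLLM(
    const rapidjson::Value& top_config,
    const std::vector<const rapidjson::Value*>& composing_configs)
{
  if (ConfigBackendSupportsLLM(top_config)) {
    return true;
  }
  for (const auto* config : composing_configs) {
    if (ConfigBackendSupportsLLM(*config)) {
      return true;
    }
  }
  return false;
}
```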
Please change the target branch to pa-llm-metrics, and add a PR description.
Should we merge this? /s
Force-pushed from c15288a to 3401b87.
The ticket that spawned this PR was marked "will not do".
Thanks for noticing. Closed.