
Profiling model using genai-perf #849

Merged: 5 commits merged into use-llm-metrics-in-ma on Mar 27, 2024

Conversation

@nv-braf (Contributor) commented on Mar 26, 2024

Successfully profiled and checkpointed a vLLM run through MA using genai-perf. Here is the resulting table output:

Models (Inference):

| Model | Batch | Concurrency | Model Config Path | Instance Group | Max Batch Size | Satisfies Constraints | Throughput (infer/sec) | p99 Latency (ms) | p99 Inter Token Latency (ms) | p99 Time To First Token (ms) | Output Token Throughput (infer/sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt2_vllm | 1 | 1 | gpt2_vllm_config_default | 1:MODEL | 0 | Yes | 8.9 | 117.0 | 11.3 | 47704.6 | 76192.3 |

Models (GPU Metrics):

| Model | GPU UUID | Batch | Concurrency | Model Config Path | Instance Group | Satisfies Constraints | GPU Memory Usage (MB) | GPU Utilization (%) | GPU Power Usage (W) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt2_vllm | GPU-8557549f-9c89-4384-8bd6-1fd823c342e0 | 1 | 1 | gpt2_vllm_config_default | 1:MODEL | Yes | 12768.5 | 20.4 | 65.2 |

Server Only:

| Model | GPU UUID | GPU Memory Usage (MB) | GPU Utilization (%) | GPU Power Usage (W) |
| --- | --- | --- | --- | --- |
| triton-server | GPU-8557549f-9c89-4384-8bd6-1fd823c342e0 | 12768.0 | 0.5 | 42.1 |
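
The LLM-specific columns in the tables above (p99 Time To First Token, p99 Inter Token Latency, Output Token Throughput) are read from genai-perf's CSV output rather than from Perf Analyzer's standard output (see "Successfully reading from LLM CSV" in the commits below). As a rough illustration only, here is a minimal sketch of pulling p99 values out of such a CSV; the file name, column layout, and function name are assumptions for this example and are not the actual metrics_manager implementation:

```python
# Minimal sketch (hypothetical): read p99 LLM metrics from a genai-perf style
# CSV export. The assumed schema (a "Metric" column plus per-statistic columns
# such as "avg" and "p99") is illustrative and may not match the real
# genai-perf output or Model Analyzer's metrics_manager code.
import csv
from typing import Dict


def read_llm_p99_metrics(csv_path: str) -> Dict[str, float]:
    """Return {metric name: p99 value} for every row that has a numeric p99 cell."""
    metrics: Dict[str, float] = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name = row.get("Metric")
            p99 = row.get("p99")
            if not name or not p99:
                continue
            try:
                metrics[name] = float(p99)
            except ValueError:
                continue  # skip rows whose p99 cell is not numeric
    return metrics


if __name__ == "__main__":
    # Example usage with a hypothetical export file name.
    for name, value in read_llm_p99_metrics("profile_export_genai_perf.csv").items():
        print(f"{name}: {value}")
```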

@nv-braf requested a review from tgerdesnv on March 26, 2024 at 22:18
@tgerdesnv (Collaborator) left a comment

Great job! Awesome to see it able to run and print out results.

Resolved review comments on:
- model_analyzer/perf_analyzer/perf_analyzer.py
- model_analyzer/perf_analyzer/perf_config.py
- model_analyzer/record/metrics_manager.py (two threads)
@nv-braf requested a review from tgerdesnv on March 27, 2024 at 14:37
@nv-braf merged commit 965ad1b into use-llm-metrics-in-ma on Mar 27, 2024
3 checks passed
nv-braf added a commit that referenced this pull request Apr 8, 2024
* Initial changes to run genai-perf in MA

* Gating call to get LLM records

* Fixing captilization issue

* Removing debug

* Adding TODO

---------

Co-authored-by: root <[email protected]>
nv-braf added a commit that referenced this pull request Apr 8, 2024
* New Records for LLM metrics (#839)

* Adding new LLM metrics

* Adding base class for perf, inter_token, and time_to_first latency records

* Add --llm-mode option (#842)

* Adding CLI hook for LLM

* Changing to use --model-type

* Capture LLM metrics from genai-perf in MA (#844)

* Successfully reading from LLM CSV

* General cleanup

* All unit tests passing

* Fixing metric table typos

* Fixing typos

* Update constraints for LLMs (#845)

* Adding LLM values to list of possible constraints

* Fixing typo

* Adding new output fields for LLM (#846)

* Profiling model using genai-perf (#849)

* Initial changes to run genai-perf in MA

* Gating call to get LLM records

* Fixing captilization issue

* Removing debug

* Adding TODO

---------

Co-authored-by: root <[email protected]>

* Add genai_perf CLI options to MA (#854)

* Added support for genai_perf CLI

* Remove dead code

* Removing genai_perf collateral

* Fixing codeQL issue

* Adding streaming to genai_perf_config

---------

Co-authored-by: root <[email protected]>
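
The commit list above introduces both the --model-type option (#842) and a gate around collecting the LLM records (this PR's "Gating call to get LLM records"). Below is a minimal sketch of that gating idea, assuming hypothetical names (collect_metric_records, llm_csv_metrics) that are not taken from the actual Model Analyzer code:

```python
# Hypothetical sketch of the gating idea from the commits above: only runs
# profiled with --model-type LLM pick up the extra genai-perf metrics.
# Function and variable names are illustrative, not Model Analyzer's real API.
from typing import Dict, List


def collect_metric_records(model_type: str, llm_csv_metrics: Dict[str, float]) -> List[dict]:
    """Assemble one measurement's records, adding LLM metrics only when gated in."""
    records: List[dict] = []
    # The usual Perf Analyzer metrics (throughput, p99 latency, ...) would be appended here.
    if model_type == "LLM":
        # Gate: time-to-first-token / inter-token-latency style records exist only for LLM runs.
        records.extend(
            {"metric": name, "value": value} for name, value in llm_csv_metrics.items()
        )
    return records


# Example with values shaped like the genai-perf metrics in the table above.
print(collect_metric_records("LLM", {"Time To First Token (ms) p99": 47704.6}))
```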