
Add boilerplate for parsing, testing, and wrapping PA #471

Merged: 2 commits merged into feature-genai-pa on Feb 27, 2024

Conversation

rmccorm4
Contributor

@rmccorm4 rmccorm4 commented Feb 27, 2024

Install

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ pip install .
Defaulting to user installation because normal site-packages is not writeable
Processing /home/rmccormick/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: rich in /home/rmccormick/.local/lib/python3.10/site-packages (from genai-pa==0.0.1) (13.5.2)
Requirement already satisfied: numpy in /home/rmccormick/.local/lib/python3.10/site-packages (from genai-pa==0.0.1) (1.25.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/rmccormick/.local/lib/python3.10/site-packages (from rich->genai-pa==0.0.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/rmccormick/.local/lib/python3.10/site-packages (from rich->genai-pa==0.0.1) (2.15.1)
Requirement already satisfied: mdurl~=0.1 in /home/rmccormick/.local/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->genai-pa==0.0.1) (0.1.2)
Building wheels for collected packages: genai-pa
  Building wheel for genai-pa (pyproject.toml) ... done
  Created wheel for genai-pa: filename=genai_pa-0.0.1-py3-none-any.whl size=8115 sha256=93809179dd9eb3f0855e31212083508145c3f567eb31c436ce8e1fa4142a704e
  Stored in directory: /home/rmccormick/.cache/pip/wheels/5e/4a/ca/f5376d51152c70651175ac6aa90bf03668bd352abf44a51476
Successfully built genai-pa
Installing collected packages: genai-pa
  Attempting uninstall: genai-pa
    Found existing installation: genai-pa 0.0.1
    Uninstalling genai-pa-0.0.1:
      Successfully uninstalled genai-pa-0.0.1
Successfully installed genai-pa-0.0.1
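
The install log is consistent with a minimal pyproject.toml along these lines. This is a sketch: only the package name, version, and the rich/numpy dependencies are taken from the log above; the build backend and the entry-point module path are assumptions.

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "genai-pa"
version = "0.0.1"
dependencies = [
    "numpy",
    "rich",
]

[project.scripts]
# A console-script entry must exist for `genai-pa -h` to work after
# install; the module path "genai_pa.main:main" is a guess.
genai-pa = "genai_pa.main:main"
```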

Example help

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ genai-pa -h
usage: genai-pa [-h] -m MODEL [-b BATCH_SIZE] [--input-length INPUT_LENGTH] [--output-length OUTPUT_LENGTH] [--url URL] [--provider {triton,openai}] [--dataset {OpenOrca,cnn_dailymail}] [--tokenizer {auto}]

CLI to profile LLMs and Generative AI models with PA

options:
  -h, --help            show this help message and exit

Model:
  -m MODEL, --model MODEL
                        The name of the model to benchmark.

Profiling:
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        The batch size / concurrency to benchmark. (Default: 1)
  --input-length INPUT_LENGTH
                        The input length (tokens) to use for benchmarking LLMs. (Default: 128)
  --output-length OUTPUT_LENGTH
                        The output length (tokens) to use for benchmarking LLMs. (Default: 128)

Endpoint:
  --url URL             URL of the endpoint to target for benchmarking.
  --provider {triton,openai}
                        Provider format/schema to use for benchmarking.

Dataset:
  --dataset {OpenOrca,cnn_dailymail}
                        HuggingFace dataset to use for the benchmark.
  --tokenizer {auto}    The HuggingFace tokenizer to use to interpret token metrics from final text results
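
The help output above could be produced by an argparse setup along these lines. This is a sketch, not the PR's actual implementation: the flag names, choices, defaults, and help strings are taken from the output, while the argument-group layout and function structure are inferred.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="genai-pa",
        description="CLI to profile LLMs and Generative AI models with PA",
    )

    model = parser.add_argument_group("Model")
    model.add_argument("-m", "--model", required=True,
                       help="The name of the model to benchmark.")

    profiling = parser.add_argument_group("Profiling")
    profiling.add_argument("-b", "--batch-size", type=int, default=1,
                           help="The batch size / concurrency to benchmark. (Default: 1)")
    profiling.add_argument("--input-length", type=int, default=128,
                           help="The input length (tokens) to use for benchmarking LLMs. (Default: 128)")
    profiling.add_argument("--output-length", type=int, default=128,
                           help="The output length (tokens) to use for benchmarking LLMs. (Default: 128)")

    endpoint = parser.add_argument_group("Endpoint")
    endpoint.add_argument("--url",
                          help="URL of the endpoint to target for benchmarking.")
    endpoint.add_argument("--provider", choices=["triton", "openai"],
                          help="Provider format/schema to use for benchmarking.")

    dataset = parser.add_argument_group("Dataset")
    dataset.add_argument("--dataset", choices=["OpenOrca", "cnn_dailymail"],
                         help="HuggingFace dataset to use for the benchmark.")
    dataset.add_argument("--tokenizer", choices=["auto"], default="auto",
                         help="The HuggingFace tokenizer to use to interpret "
                              "token metrics from final text results")
    return parser
```

With this parser, `build_parser().parse_args(["-m", "opt125m"])` yields a namespace with the defaults shown in the help text (batch size 1, input/output length 128).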

Run PA wrapper

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ genai-pa -m opt125m
genai-pa - INFO - Running Perf Analyzer : '['perf_analyzer', '-i', 'grpc', '--streaming', '-m', 'opt125m', '--input-data', '/tmp/input_data.json']'
 Successfully read data for 1 stream/streams with 1 step/steps.
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Detected decoupled model, using the first response for measuring latency
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 288
    Throughput: 15.9975 infer/sec
    Response Throughput: 15.9975 infer/sec
    Avg latency: 62332 usec (standard deviation 10467 usec)
    p50 latency: 66954 usec
    p90 latency: 70977 usec
    p95 latency: 72917 usec
    p99 latency: 75793 usec
    
  Server: 
    Inference count: 289
    Execution count: 289
    Successful request count: 289
    Avg request latency: 347 usec (overhead 4 usec + queue 44 usec + compute input 34 usec + compute infer 257 usec + compute output 6 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 15.9975 infer/sec, latency 62332 usec
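
The wrapper's job here can be sketched as building the exact command line shown in the INFO log and handing it to a subprocess. Only the command itself is taken from the log; the function names and logging setup below are assumptions.

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger("genai-pa")


def build_perf_analyzer_cmd(model: str,
                            input_data: str = "/tmp/input_data.json") -> List[str]:
    # Reproduce the command from the log: gRPC protocol, streaming mode,
    # the model name, and a JSON input-data file.
    return ["perf_analyzer", "-i", "grpc", "--streaming",
            "-m", model, "--input-data", input_data]


def run_perf_analyzer(model: str) -> int:
    cmd = build_perf_analyzer_cmd(model)
    logger.info("Running Perf Analyzer : '%s'", cmd)
    # Let perf_analyzer stream its measurement output directly to the
    # terminal, and surface its exit code to the caller.
    return subprocess.run(cmd).returncode
```

Keeping command construction separate from execution makes the command line easy to unit-test without a running Triton server.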

Run tests

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ pytest tests/
======================================== test session starts ========================================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.2.0
rootdir: /home/rmccormick/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa
plugins: anyio-3.7.1, cov-4.1.0
collected 3 items                                                                                   

tests/test_cli.py ..                                                                          [ 66%]
tests/test_library.py .                                                                       [100%]

========================================= 3 passed in 0.01s =========================================
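
Tests like the two in tests/test_cli.py might look roughly like this. It is a self-contained sketch: the real tests import the package's own parser, which is stood in for here by a hypothetical local `build_parser`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser under test.
    parser = argparse.ArgumentParser(prog="genai-pa")
    parser.add_argument("-m", "--model", required=True)
    parser.add_argument("-b", "--batch-size", type=int, default=1)
    return parser


def test_model_is_required():
    # argparse raises SystemExit when a required argument is missing.
    try:
        build_parser().parse_args([])
    except SystemExit:
        pass
    else:
        raise AssertionError("expected SystemExit for missing --model")


def test_batch_size_default():
    args = build_parser().parse_args(["-m", "opt125m"])
    assert args.batch_size == 1
```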

@rmccorm4 rmccorm4 changed the base branch from main to feature-genai-pa February 27, 2024 20:06
@dyastremsky
Contributor

Nice structure! I think this should be good to merge and work off of.

@dyastremsky dyastremsky merged commit 9263895 into feature-genai-pa Feb 27, 2024
3 checks passed
@dyastremsky dyastremsky deleted the feature-genai-pa-rmccormick-cli branch February 27, 2024 20:55
matthewkotila pushed a commit that referenced this pull request Feb 27, 2024
* Add boilerplate code with placeholder for running PA

* Add return value for codeql
nv-braf pushed a commit that referenced this pull request Feb 29, 2024
debermudez pushed a commit that referenced this pull request Mar 12, 2024
debermudez pushed a commit that referenced this pull request Mar 13, 2024
mc-nv pushed a commit that referenced this pull request Mar 13, 2024