
Add boilerplate for parsing, testing, and wrapping PA #471

Merged: 2 commits merged into feature-genai-pa on Feb 27, 2024

Conversation

rmccorm4
Contributor

@rmccorm4 rmccorm4 commented Feb 27, 2024

Install

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ pip install .
Defaulting to user installation because normal site-packages is not writeable
Processing /home/rmccormick/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: rich in /home/rmccormick/.local/lib/python3.10/site-packages (from genai-pa==0.0.1) (13.5.2)
Requirement already satisfied: numpy in /home/rmccormick/.local/lib/python3.10/site-packages (from genai-pa==0.0.1) (1.25.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/rmccormick/.local/lib/python3.10/site-packages (from rich->genai-pa==0.0.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/rmccormick/.local/lib/python3.10/site-packages (from rich->genai-pa==0.0.1) (2.15.1)
Requirement already satisfied: mdurl~=0.1 in /home/rmccormick/.local/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->genai-pa==0.0.1) (0.1.2)
Building wheels for collected packages: genai-pa
  Building wheel for genai-pa (pyproject.toml) ... done
  Created wheel for genai-pa: filename=genai_pa-0.0.1-py3-none-any.whl size=8115 sha256=93809179dd9eb3f0855e31212083508145c3f567eb31c436ce8e1fa4142a704e
  Stored in directory: /home/rmccormick/.cache/pip/wheels/5e/4a/ca/f5376d51152c70651175ac6aa90bf03668bd352abf44a51476
Successfully built genai-pa
Installing collected packages: genai-pa
  Attempting uninstall: genai-pa
    Found existing installation: genai-pa 0.0.1
    Uninstalling genai-pa-0.0.1:
      Successfully uninstalled genai-pa-0.0.1
Successfully installed genai-pa-0.0.1
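
The install log is consistent with a minimal pyproject.toml along these lines. This is a sketch: only the package name, version, and the rich/numpy dependencies are taken from the log above; the build backend and the entry-point module path are assumptions.

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "genai-pa"
version = "0.0.1"
dependencies = [
    "numpy",
    "rich",
]

[project.scripts]
# A console-script entry must exist for `genai-pa -h` to work after
# install; the module path "genai_pa.main:main" is a guess.
genai-pa = "genai_pa.main:main"
```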

Example help

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ genai-pa -h
usage: genai-pa [-h] -m MODEL [-b BATCH_SIZE] [--input-length INPUT_LENGTH] [--output-length OUTPUT_LENGTH] [--url URL] [--provider {triton,openai}] [--dataset {OpenOrca,cnn_dailymail}] [--tokenizer {auto}]

CLI to profile LLMs and Generative AI models with PA

options:
  -h, --help            show this help message and exit

Model:
  -m MODEL, --model MODEL
                        The name of the model to benchmark.

Profiling:
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        The batch size / concurrency to benchmark. (Default: 1)
  --input-length INPUT_LENGTH
                        The input length (tokens) to use for benchmarking LLMs. (Default: 128)
  --output-length OUTPUT_LENGTH
                        The output length (tokens) to use for benchmarking LLMs. (Default: 128)

Endpoint:
  --url URL             URL of the endpoint to target for benchmarking.
  --provider {triton,openai}
                        Provider format/schema to use for benchmarking.

Dataset:
  --dataset {OpenOrca,cnn_dailymail}
                        HuggingFace dataset to use for the benchmark.
  --tokenizer {auto}    The HuggingFace tokenizer to use to interpret token metrics from final text results
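
The help output above could be produced by an argparse setup along these lines. This is a sketch, not the PR's actual implementation: the flag names, choices, defaults, and help strings are taken from the output, while the argument-group layout and function structure are inferred.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="genai-pa",
        description="CLI to profile LLMs and Generative AI models with PA",
    )

    model = parser.add_argument_group("Model")
    model.add_argument("-m", "--model", required=True,
                       help="The name of the model to benchmark.")

    profiling = parser.add_argument_group("Profiling")
    profiling.add_argument("-b", "--batch-size", type=int, default=1,
                           help="The batch size / concurrency to benchmark. (Default: 1)")
    profiling.add_argument("--input-length", type=int, default=128,
                           help="The input length (tokens) to use for benchmarking LLMs. (Default: 128)")
    profiling.add_argument("--output-length", type=int, default=128,
                           help="The output length (tokens) to use for benchmarking LLMs. (Default: 128)")

    endpoint = parser.add_argument_group("Endpoint")
    endpoint.add_argument("--url",
                          help="URL of the endpoint to target for benchmarking.")
    endpoint.add_argument("--provider", choices=["triton", "openai"],
                          help="Provider format/schema to use for benchmarking.")

    dataset = parser.add_argument_group("Dataset")
    dataset.add_argument("--dataset", choices=["OpenOrca", "cnn_dailymail"],
                         help="HuggingFace dataset to use for the benchmark.")
    dataset.add_argument("--tokenizer", choices=["auto"], default="auto",
                         help="The HuggingFace tokenizer to use to interpret "
                              "token metrics from final text results")
    return parser
```

With this parser, `build_parser().parse_args(["-m", "opt125m"])` yields a namespace with the defaults shown in the help text (batch size 1, input/output length 128).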

Run PA wrapper

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ genai-pa -m opt125m
genai-pa - INFO - Running Perf Analyzer : '['perf_analyzer', '-i', 'grpc', '--streaming', '-m', 'opt125m', '--input-data', '/tmp/input_data.json']'
 Successfully read data for 1 stream/streams with 1 step/steps.
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Detected decoupled model, using the first response for measuring latency
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 288
    Throughput: 15.9975 infer/sec
    Response Throughput: 15.9975 infer/sec
    Avg latency: 62332 usec (standard deviation 10467 usec)
    p50 latency: 66954 usec
    p90 latency: 70977 usec
    p95 latency: 72917 usec
    p99 latency: 75793 usec
    
  Server: 
    Inference count: 289
    Execution count: 289
    Successful request count: 289
    Avg request latency: 347 usec (overhead 4 usec + queue 44 usec + compute input 34 usec + compute infer 257 usec + compute output 6 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 15.9975 infer/sec, latency 62332 usec
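
The wrapper's job here can be sketched as building the exact command line shown in the INFO log and handing it to a subprocess. Only the command itself is taken from the log; the function names and logging setup below are assumptions.

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger("genai-pa")


def build_perf_analyzer_cmd(model: str,
                            input_data: str = "/tmp/input_data.json") -> List[str]:
    # Reproduce the command from the log: gRPC protocol, streaming mode,
    # the model name, and a JSON input-data file.
    return ["perf_analyzer", "-i", "grpc", "--streaming",
            "-m", model, "--input-data", input_data]


def run_perf_analyzer(model: str) -> int:
    cmd = build_perf_analyzer_cmd(model)
    logger.info("Running Perf Analyzer : '%s'", cmd)
    # Let perf_analyzer stream its measurement output directly to the
    # terminal, and surface its exit code to the caller.
    return subprocess.run(cmd).returncode
```

Keeping command construction separate from execution makes the command line easy to unit-test without a running Triton server.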

Run tests

rmccormick@ced35d0-lcedt:~/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa$ pytest tests/
======================================== test session starts ========================================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.2.0
rootdir: /home/rmccormick/triton/jira/genaipa/client/src/c++/perf_analyzer/genai-pa
plugins: anyio-3.7.1, cov-4.1.0
collected 3 items                                                                                   

tests/test_cli.py ..                                                                          [ 66%]
tests/test_library.py .                                                                       [100%]

========================================= 3 passed in 0.01s =========================================
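
Tests like the two in tests/test_cli.py might look roughly like this. It is a self-contained sketch: the real tests import the package's own parser, which is stood in for here by a hypothetical local `build_parser`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser under test.
    parser = argparse.ArgumentParser(prog="genai-pa")
    parser.add_argument("-m", "--model", required=True)
    parser.add_argument("-b", "--batch-size", type=int, default=1)
    return parser


def test_model_is_required():
    # argparse raises SystemExit when a required argument is missing.
    try:
        build_parser().parse_args([])
    except SystemExit:
        pass
    else:
        raise AssertionError("expected SystemExit for missing --model")


def test_batch_size_default():
    args = build_parser().parse_args(["-m", "opt125m"])
    assert args.batch_size == 1
```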

@rmccorm4 rmccorm4 changed the base branch from main to feature-genai-pa February 27, 2024 20:06
@dyastremsky
Contributor

Nice structure! I think this should be good to merge and work off of.

@dyastremsky dyastremsky merged commit 9263895 into feature-genai-pa Feb 27, 2024
3 checks passed
@dyastremsky dyastremsky deleted the feature-genai-pa-rmccormick-cli branch February 27, 2024 20:55
matthewkotila pushed a commit that referenced this pull request Feb 27, 2024
* Add boilerplate code with placeholder for running PA

* Add return value for codeql
nv-braf pushed a commit that referenced this pull request Feb 29, 2024
debermudez pushed a commit that referenced this pull request Mar 12, 2024
debermudez pushed a commit that referenced this pull request Mar 13, 2024
mc-nv pushed a commit that referenced this pull request Mar 13, 2024