diff --git a/src/c++/perf_analyzer/genai-perf/docs/multi_modal.md b/src/c++/perf_analyzer/genai-perf/docs/multi_modal.md
new file mode 100644
index 000000000..bb9f33c60
--- /dev/null
+++ b/src/c++/perf_analyzer/genai-perf/docs/multi_modal.md
@@ -0,0 +1,122 @@

# Profile Vision-Language Models with GenAI-Perf

GenAI-Perf allows you to profile Vision-Language Models (VLMs) running behind an
[OpenAI Chat Completions API](https://platform.openai.com/docs/guides/chat-completions)-compatible server
by sending [multi-modal content](https://platform.openai.com/docs/guides/vision) to the server.
Currently, you can send multi-modal content with GenAI-Perf using two approaches:
1. Synthetic data generation, where GenAI-Perf generates the multi-modal data for you.
2. Bring Your Own Data (BYOD), where you provide GenAI-Perf with the data to send.

Before diving into the two approaches,
start an OpenAI-API-compatible server with a VLM using the following command:

```bash
docker run --runtime nvidia --gpus all \
  -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model llava-hf/llava-v1.6-mistral-7b-hf --dtype float16
```


## Approach 1: Synthetic Multi-Modal Data Generation

GenAI-Perf can generate synthetic multi-modal data, such as texts or images, using
the parameters provided by the user through the CLI.

```bash
genai-perf profile \
  -m llava-hf/llava-v1.6-mistral-7b-hf \
  --service-kind openai \
  --endpoint-type vision \
  --image-width-mean 512 \
  --image-width-stddev 30 \
  --image-height-mean 512 \
  --image-height-stddev 30 \
  --image-format png \
  --synthetic-input-tokens-mean 100 \
  --synthetic-input-tokens-stddev 0 \
  --streaming
```

> [!Note]
> Under the hood, GenAI-Perf generates synthetic images from a few source images
> under the `llm_inputs/source_images` directory.
> If you would like to add, remove, or edit the source images,
> you can do so directly in that directory.
> GenAI-Perf automatically picks up the images in that directory when
> generating the synthetic images.


## Approach 2: Bring Your Own Data (BYOD)

Instead of letting GenAI-Perf create the synthetic data,
you can provide GenAI-Perf with your own data using the
[`--input-file`](../README.md#--input-file-path) CLI option.
The file needs to be in JSONL format and should contain both the prompt and
the filepath to the image to send.

For instance, an input file would look something like the following:
```jsonl
// input.jsonl
{"text_input": "What is in this image?", "image": "path/to/image1.png"}
{"text_input": "What is the color of the dog?", "image": "path/to/image2.jpeg"}
{"text_input": "Describe the scene in the picture.", "image": "path/to/image3.png"}
...
```
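
For each JSONL entry, GenAI-Perf builds a Chat Completions request with multi-modal
content, so you never construct the request body yourself. The sketch below is only
an illustration of the payload shape defined by the OpenAI vision API; the
`to_data_url` helper is hypothetical, and the base64 data-URL encoding of the local
image file is an assumption made for the example:

```python
import base64
import json


def to_data_url(image_path: str, mime_type: str = "image/png") -> str:
    """Hypothetical helper: read a local image and encode it as a base64 data URL."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# One entry from input.jsonl, expanded into an OpenAI Chat Completions
# request body carrying multi-modal content (text + image).
entry = {"text_input": "What is in this image?", "image": "path/to/image1.png"}

payload = {
    "model": "llava-hf/llava-v1.6-mistral-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": entry["text_input"]},
                {
                    "type": "image_url",
                    "image_url": {"url": to_data_url(entry["image"])},
                },
            ],
        }
    ],
    "stream": True,  # corresponds to the --streaming flag used below
}
print(json.dumps(payload, indent=2))
```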

After you create the file, you can run GenAI-Perf using the following command:

```bash
genai-perf profile \
  -m llava-hf/llava-v1.6-mistral-7b-hf \
  --service-kind openai \
  --endpoint-type vision \
  --input-file input.jsonl \
  --streaming
```

Running GenAI-Perf using either approach will produce output that
looks similar to the following:

```bash
                                          LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃                Statistic ┃      avg ┃      min ┃      max ┃      p99 ┃      p90 ┃      p75 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ Time to first token (ms) │   321.05 │   291.30 │   537.07 │   497.88 │   318.46 │   317.35 │
│ Inter token latency (ms) │    12.28 │    11.44 │    12.88 │    12.87 │    12.81 │    12.53 │
│     Request latency (ms) │ 1,866.23 │ 1,044.70 │ 2,832.22 │ 2,779.63 │ 2,534.64 │ 2,054.03 │
│   Output sequence length │   126.68 │    59.00 │   204.00 │   200.58 │   177.80 │   147.50 │
│    Input sequence length │   100.00 │   100.00 │   100.00 │   100.00 │   100.00 │   100.00 │
└──────────────────────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
Output token throughput (per sec): 67.40
Request throughput (per sec): 0.53
```
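
As a quick sanity check on the summary numbers, the output token throughput is
roughly the average output sequence length multiplied by the request throughput.
A minimal check using the sample values above (a small gap is expected, since the
reported request throughput is rounded):

```python
# Values taken from the sample output above.
avg_output_sequence_length = 126.68  # tokens per request
request_throughput = 0.53            # requests per second

# output token throughput ≈ avg output sequence length × request throughput
estimated = avg_output_sequence_length * request_throughput
print(f"~{estimated:.2f} output tokens/sec")  # ~67.14, close to the reported 67.40
```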