Commit 69dea2d: address feedback
nv-hwoo committed Jul 24, 2024
1 parent 717ad03 commit 69dea2d
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions src/c++/perf_analyzer/genai-perf/README.md
@@ -29,13 +29,13 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# GenAI-Perf

GenAI-Perf is a command line tool for measuring the throughput and latency of
-generative AI models as served through an inference server. For large language
-models (LLMs), as an example, GenAI-Perf provides metrics such as
+generative AI models as served through an inference server.
+For large language models (LLMs), GenAI-Perf provides metrics such as
[output token throughput](#output_token_throughput_metric),
[time to first token](#time_to_first_token_metric),
[inter token latency](#inter_token_latency_metric), and
-[request throughput](#request_throughput_metric). For a full list of metrics
-please see the [Metrics section](#metrics).
+[request throughput](#request_throughput_metric).
+For a full list of metrics please see the [Metrics section](#metrics).

Users specify a model name, an inference server URL, the type of inputs to use
(synthetic or from dataset), and the type of load to generate (number of
@@ -49,7 +49,7 @@ running when GenAI-Perf is run.

You can use GenAI-Perf to run performance benchmarks on
- [Large Language Models](docs/tutorial.md)
-- [Multi-Modal Models](docs/multi_modal.md)
+- [Vision Language Models](docs/multi_modal.md)
- [Embedding Models](docs/embeddings.md)
- [Ranking Models](docs/rankings.md)
- [Multiple LoRA Adapters](docs/lora.md)
@@ -87,7 +87,7 @@ genai-perf --help
<summary>Alternatively, to install from source:</summary>

Since GenAI-Perf depends on Perf Analyzer,
-you'll need to install Perf Analyzer binary:
+you'll need to install the Perf Analyzer binary:

### Install Perf Analyzer (Ubuntu, Python 3.8+)

@@ -121,8 +121,8 @@ QUICK START

## Quick Start

-In this quick start, we will use GenAI-Perf to run performance benchmark on
-the GPT-2 model running on Triton Inference Server with TensorRT-LLM engine.
+In this quick start, we will use GenAI-Perf to run performance benchmarking on
+the GPT-2 model running on Triton Inference Server with a TensorRT-LLM engine.

### Serve GPT-2 TensorRT-LLM model using Triton CLI

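For context, the quick start this diff edits benchmarks the GPT-2 model served by Triton Inference Server with a TensorRT-LLM engine. A rough sketch of that flow follows; the exact subcommands, flags, and model names are assumptions based on typical Triton CLI and GenAI-Perf usage, not taken from this commit, so consult the linked tutorial docs for the authoritative steps:

```shell
# Hypothetical sketch of the quick-start flow (flags are assumptions,
# not confirmed by this diff).

# Serve GPT-2 with a TensorRT-LLM engine via Triton CLI:
triton import -m gpt2 --backend tensorrtllm
triton start

# With the server running, benchmark it with GenAI-Perf:
genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --streaming
```

These commands assume the Triton CLI and GenAI-Perf are already installed (e.g. inside the Triton SDK container) and that a GPU is available to build the TensorRT-LLM engine.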

0 comments on commit 69dea2d
