Revert "Update batching explanation in docs (#36)" #38

Merged
merged 1 commit into from
Aug 13, 2024
Merged
genai-perf/README.md: 7 changes (0 additions, 7 deletions)

@@ -335,13 +335,6 @@ You can optionally set additional model inputs with the following option:
model with a singular value, such as `stream:true` or `max_tokens:5`. This
flag can be repeated to supply multiple extra inputs.

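For illustration, a minimal sketch of repeating that flag, with a placeholder model name:

```bash
# Placeholder model name; repeat --extra-inputs once per extra input
genai-perf profile \
  -m my_model \
  --service-kind openai \
  --endpoint-type chat \
  --extra-inputs stream:true \
  --extra-inputs max_tokens:5
```
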
-For [Large Language Models](docs/tutorial.md), there is no batch size (i.e.
-batch size is always `1`). Each request includes the inputs for one individual
-inference. Other modes such as the [embeddings](docs/embeddings.md) and
-[rankings](docs/rankings.md) endpoints support client-side batching, where
-`--batch-size N` means that each request sent will include the inputs for `N`
-separate inferences, allowing them to be processed together.
-
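To make the removed distinction concrete, a sketch of the two modes described above (model names are placeholders):

```bash
# LLM endpoints: each request carries the inputs for exactly one inference
genai-perf profile -m my_llm --service-kind openai --endpoint-type chat

# Embeddings endpoint: each request packs the inputs for 4 separate inferences
genai-perf profile -m my_embedding_model --service-kind openai \
  --endpoint-type embeddings --batch-size 4
```
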
</br>

<!--
genai-perf/docs/embeddings.md: 12 changes (0 additions, 12 deletions)

@@ -68,18 +68,6 @@ genai-perf profile \
--input-file embeddings.jsonl
```

-* `-m intfloat/e5-mistral-7b-instruct` is to specify what model you want to run
-  (`intfloat/e5-mistral-7b-instruct`)
-* `--service-kind openai` is to specify that the server type is OpenAI-API
-  compatible
-* `--endpoint-type embeddings` is to specify that the sent requests should be
-  formatted to follow the [embeddings
-  API](https://platform.openai.com/docs/api-reference/embeddings/create)
-* `--batch-size 2` is to specify that each request will contain the inputs for 2
-  individual inferences, making a batch size of 2
-* `--input-file embeddings.jsonl` is to specify the input data to be used for
-  inferencing
-
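For reference, the command those bullets annotate (only its tail is visible in the hunk above), reassembled from the flags they list:

```bash
genai-perf profile \
  -m intfloat/e5-mistral-7b-instruct \
  --service-kind openai \
  --endpoint-type embeddings \
  --batch-size 2 \
  --input-file embeddings.jsonl
```
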
This will use default values for optional arguments. You can also pass in
additional arguments with the `--extra-inputs` [flag](../README.md#input-options).
For example, you could use this command:
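
(The example command itself is collapsed in this view; a plausible sketch, assuming the OpenAI embeddings API's optional `user` field as the extra input — the collapsed original may differ:)

```bash
genai-perf profile \
  -m intfloat/e5-mistral-7b-instruct \
  --service-kind openai \
  --endpoint-type embeddings \
  --extra-inputs user:sample_user  # hypothetical extra input for illustration
```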