Update generate() API docs (#20433)
Light edit of generate() docs
natke authored Apr 23, 2024
1 parent a3eceb5 commit 0ebd6b4
Showing 4 changed files with 50 additions and 62 deletions.
34 changes: 2 additions & 32 deletions docs/genai/api/python.md
@@ -41,10 +41,6 @@ onnxruntime_genai.Model(model_folder: str) -> onnxruntime_genai.Model
#### Parameters

- `model_folder`: Location of model and configuration on disk
- `device`: The device to run on. One of:
- onnxruntime_genai.CPU
- onnxruntime_genai.CUDA
If not specified, defaults to CPU.

#### Returns

@@ -57,7 +53,7 @@ onnxruntime_genai.Model.generate(params: GeneratorParams) -> numpy.ndarray[int,

#### Parameters
- `params`: (Required) Created by the `GenerateParams` method.
- `params`: (Required) Created by the `GeneratorParams` method.

#### Returns
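
Per the signature above, the generated sequences come back as an array of token ids. As a usage illustration, here is a minimal sketch tying the pieces in this file together (hedged, not the documented example: the model folder is a placeholder, `Tokenizer` construction and `encode`/`decode` are assumed from this package's common usage, and the dict-style `set_search_options` call follows the signature documented later in this file):

```python
import onnxruntime_genai as og

model = og.Model("path/to/model_folder")       # placeholder: folder with the model and its configuration
tokenizer = og.Tokenizer(model)                # assumed: tokenizer built from the model

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode("The quick brown fox")
params.set_search_options({"max_length": 64})  # dict-style call per the signature documented below

output_tokens = model.generate(params)         # array of generated token ids
print(tokenizer.decode(output_tokens[0]))
```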

@@ -191,7 +187,7 @@ onnxruntime_genai.TokenizerStream.decode(token: int32) -> str
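
For illustration, a hedged sketch of streaming decode with `TokenizerStream.decode` (assuming the stream is obtained from a tokenizer via `create_stream()`, which is elided from this excerpt; `tokenizer` and `output_tokens` are from the sketch above):

```python
# Stream generated token ids back to text one token at a time.
stream = tokenizer.create_stream()   # assumed way to obtain a TokenizerStream
for token in output_tokens[0]:
    print(stream.decode(int(token)), end="", flush=True)
```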

## GeneratorParams class

### Create a Generator Params
### Create a Generator Params object

```python
onnxruntime_genai.GeneratorParams(model: Model) -> GeneratorParams
```

@@ -209,8 +205,6 @@ onnxruntime_genai.GeneratorParams.input_ids = numpy.ndarray[numpy.int32, numpy.i

```python
onnxruntime_genai.GeneratorParams.set_search_options(options: dict[str, Any])
```
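
Building on the sketch above, a hedged example of setting these parameters directly (the option names are common search options; exact availability depends on the model configuration, and the token ids are illustrative):

```python
import numpy as np

params = og.GeneratorParams(model)
# One pre-tokenized sequence per batch row, as an int32 array per the attribute above.
params.input_ids = np.array([[40, 2883, 26188, 16176]], dtype=np.int32)
# Assumed option names; passed as a dict per the documented signature.
params.set_search_options({"do_sample": True, "top_p": 0.9, "max_length": 128})
```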


## Generator class

### Create a Generator
@@ -256,30 +250,6 @@ Using the current set of logits and the specified generator parameters, calculat

```python
onnxruntime_genai.Generator.generate_next_token()
```

### Generate next token with Top P sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top P sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_p()
```

### Generate next token with Top K sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top K sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_k()
```

### Generate next token with Top K and Top P sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using both Top K then Top P sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_k_top_p()
```
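
Putting the `Generator` methods together, a decoding loop might look like the following (a hedged sketch: the `Generator(model, params)` constructor form, `compute_logits()`, and `is_done()` belong to this API but are elided from this excerpt):

```python
generator = og.Generator(model, params)    # assumed constructor form
while not generator.is_done():             # assumed termination check (elided from this excerpt)
    generator.compute_logits()             # assumed: compute logits for the next position
    generator.generate_next_token()        # select the next token using the configured search options
    print(generator.get_next_tokens()[0])  # see "Get next tokens" below
```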

### Get next tokens

72 changes: 43 additions & 29 deletions docs/genai/howto/build-from-source.md
@@ -70,68 +70,82 @@ cp build/linux-x64/native/libonnxruntime*.so* <ORT_HOME>/lib
### Option 3: Build from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
```

Create include and lib folders in the `ORT_HOME` directory
#### Build ONNX Runtime for DirectML on Windows

```bash
mkdir <ORT_HOME>/include
mkdir <ORT_HOME>/lib
build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
```

Build from source and copy the include and libraries into `ORT_HOME`
#### Build ONNX Runtime for CPU on Windows

On Windows

```cmd
build.bat --build_shared_lib --skip_tests --parallel [--use_dml | --use_cuda] --config Release
copy include\onnxruntime\core\session\onnxruntime_c_api.h <ORT_HOME>\include
copy build\Windows\Release\Release\*.dll <ORT_HOME>\lib
copy build\Windows\Release\Release\onnxruntime.lib <ORT_HOME>\lib
```

```bash
build.bat --build_shared_lib --skip_tests --parallel --config Release
```

If building for DirectML

```cmd
copy include\onnxruntime\core\providers\dml\dml_provider_factory.h <ORT_HOME>\include
```

#### Build ONNX Runtime for CUDA on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
```

On Linux
#### Build ONNX Runtime on Linux

```bash
./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
cp include/onnxruntime/core/session/onnxruntime_c_api.h <ORT_HOME>/include
cp build/Linux/Release/libonnxruntime*.so* <ORT_HOME>/lib
```

On Mac
You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
```

Adjust these values to match your CUDA version and installation locations.

#### Build ONNX Runtime on Mac

```bash
./build.sh --build_shared_lib --skip_tests --parallel --config Release
cp include/onnxruntime/core/session/onnxruntime_c_api.h <ORT_HOME>/include
cp build/MacOS/Release/libonnxruntime*.dylib* <ORT_HOME>/lib
```

## Build the generate() API

## Build onnxruntime-genai
### Build on Windows

### Build for CPU
If building for DirectML

```bash
cd ..
python build.py [--ort_home <ORT_HOME>]
copy ..\onnxruntime\include\onnxruntime\core\providers\dml\dml_provider_factory.h ort\include
```

```bash
copy ..\onnxruntime\include\onnxruntime\core\session\onnxruntime_c_api.h ort\include
copy ..\onnxruntime\build\Windows\Release\Release\*.dll ort\lib
copy ..\onnxruntime\build\Windows\Release\Release\onnxruntime.lib ort\lib
python build.py [--use_dml | --use_cuda]
```

### Build for CUDA
### Build on Linux

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/Linux/Release/libonnxruntime*.so* ort/lib
python build.py [--use_cuda]
```

These instructions assume you already have CUDA installed.
### Build on Mac

```bash
cd ..
python build.py --cuda_home <path to cuda home> [--ort_home <ORT_HOME>]
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/MacOS/Release/libonnxruntime*.dylib* ort/lib
python build.py
```

### Build for DirectML
2 changes: 1 addition & 1 deletion docs/genai/index.md
@@ -9,7 +9,7 @@ nav_order: 6

_Note: this API is in preview and is subject to change._

Run generative AI models with ONNX Runtime. Source code: https://github.com/microsoft/onnxruntime-genai
Run generative AI models with ONNX Runtime. Source code: (https://github.com/microsoft/onnxruntime-genai)

This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.

4 changes: 4 additions & 0 deletions docs/genai/reference/config.md
@@ -102,6 +102,10 @@ These are the options that are passed to ONNX Runtime, which runs the model on e

* **_provider_options_**: a prioritized list of execution targets on which to run the model. If running on CPU, this option is not present. A list of execution provider specific configurations can be specified inside the provider item.

Supported provider options (see the sketch after this list):
* `cuda`
* `dml`

* **_log_id_**: a prefix to output when logging.
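
For illustration, these options might appear in a model's `genai_config.json` roughly as follows (a hedged sketch of the shape described above; the surrounding `session_options` key name is an assumption, and this is not a complete configuration):

```json
"session_options": {
  "log_id": "onnxruntime-genai",
  "provider_options": [
    { "cuda": {} }
  ]
}
```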


