Update generate() API docs (#20433)
Light edit of generate() docs
natke authored Apr 23, 2024
1 parent a3eceb5 commit 0ebd6b4
Showing 4 changed files with 50 additions and 62 deletions.
34 changes: 2 additions & 32 deletions docs/genai/api/python.md
@@ -41,10 +41,6 @@ onnxruntime_genai.Model(model_folder: str) -> onnxruntime_genai.Model
#### Parameters

- `model_folder`: Location of model and configuration on disk
- `device`: The device to run on. One of:
- onnxruntime_genai.CPU
- onnxruntime_genai.CUDA
If not specified, defaults to CPU.

#### Returns

@@ -57,7 +53,7 @@ onnxruntime_genai.Model.generate(params: GeneratorParams) -> numpy.ndarray[int,

#### Parameters
- `params`: (Required) Created by the `GenerateParams` method.
- `params`: (Required) Created by the `GeneratorParams` method.

#### Returns
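
Per the signature above, the generated sequences come back as an array of token ids. As a usage illustration, here is a minimal sketch tying the pieces in this file together (hedged, not the documented example: the model folder is a placeholder, `Tokenizer` construction and `encode`/`decode` are assumed from this package's common usage, and the dict-style `set_search_options` call follows the signature documented later in this file):

```python
import onnxruntime_genai as og

model = og.Model("path/to/model_folder")       # placeholder: folder with the model and its configuration
tokenizer = og.Tokenizer(model)                # assumed: tokenizer built from the model

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode("The quick brown fox")
params.set_search_options({"max_length": 64})  # dict-style call per the signature documented below

output_tokens = model.generate(params)         # array of generated token ids
print(tokenizer.decode(output_tokens[0]))
```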

@@ -191,7 +187,7 @@ onnxruntime_genai.TokenizerStream.decode(token: int32) -> str
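
For illustration, a hedged sketch of streaming decode with `TokenizerStream.decode` (assuming the stream is obtained from a tokenizer via `create_stream()`, which is elided from this excerpt; `tokenizer` and `output_tokens` are from the sketch above):

```python
# Stream generated token ids back to text one token at a time.
stream = tokenizer.create_stream()   # assumed way to obtain a TokenizerStream
for token in output_tokens[0]:
    print(stream.decode(int(token)), end="", flush=True)
```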

## GeneratorParams class

### Create a Generator Params
### Create a Generator Params object

```python
onnxruntime_genai.GeneratorParams(model: Model) -> GeneratorParams
```

@@ -209,8 +205,6 @@ onnxruntime_genai.GeneratorParams.input_ids = numpy.ndarray[numpy.int32, numpy.i

```python
onnxruntime_genai.GeneratorParams.set_search_options(options: dict[str, Any])
```
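
Building on the sketch above, a hedged example of setting these parameters directly (the option names are common search options; exact availability depends on the model configuration, and the token ids are illustrative):

```python
import numpy as np

params = og.GeneratorParams(model)
# One pre-tokenized sequence per batch row, as an int32 array per the attribute above.
params.input_ids = np.array([[40, 2883, 26188, 16176]], dtype=np.int32)
# Assumed option names; passed as a dict per the documented signature.
params.set_search_options({"do_sample": True, "top_p": 0.9, "max_length": 128})
```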


## Generator class

### Create a Generator
@@ -256,30 +250,6 @@ Using the current set of logits and the specified generator parameters, calculat

```python
onnxruntime_genai.Generator.generate_next_token()
```

### Generate next token with Top P sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top P sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_p()
```

### Generate next token with Top K sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top K sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_k()
```

### Generate next token with Top K and Top P sampling

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using both Top K then Top P sampling.

```python
onnxruntime_genai.Generator.generate_next_token_top_k_top_p()
```
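
Putting the `Generator` methods together, a decoding loop might look like the following (a hedged sketch: the `Generator(model, params)` constructor form, `compute_logits()`, and `is_done()` belong to this API but are elided from this excerpt):

```python
generator = og.Generator(model, params)    # assumed constructor form
while not generator.is_done():             # assumed termination check (elided from this excerpt)
    generator.compute_logits()             # assumed: compute logits for the next position
    generator.generate_next_token()        # select the next token using the configured search options
    print(generator.get_next_tokens()[0])  # see "Get next tokens" below
```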

### Get next tokens

72 changes: 43 additions & 29 deletions docs/genai/howto/build-from-source.md
@@ -70,68 +70,82 @@ cp build/linux-x64/native/libonnxruntime*.so* <ORT_HOME>/lib
### Option 3: Build from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
```

Create include and lib folders in the `ORT_HOME` directory
#### Build ONNX Runtime for DirectML on Windows

```bash
mkdir <ORT_HOME>/include
mkdir <ORT_HOME>/lib
build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
```

Build from source and copy the include and libraries into `ORT_HOME`
#### Build ONNX Runtime for CPU on Windows

On Windows

```cmd
build.bat --build_shared_lib --skip_tests --parallel [--use_dml | --use_cuda] --config Release
copy include\onnxruntime\core\session\onnxruntime_c_api.h <ORT_HOME>\include
copy build\Windows\Release\Release\*.dll <ORT_HOME>\lib
copy build\Windows\Release\Release\onnxruntime.lib <ORT_HOME>\lib
```

```bash
build.bat --build_shared_lib --skip_tests --parallel --config Release
```

If building for DirectML

```cmd
copy include\onnxruntime\core\providers\dml\dml_provider_factory.h <ORT_HOME>\include
```

#### Build ONNX Runtime for CUDA on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
```

On Linux
#### Build ONNX Runtime on Linux

```bash
./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
cp include/onnxruntime/core/session/onnxruntime_c_api.h <ORT_HOME>/include
cp build/Linux/Release/libonnxruntime*.so* <ORT_HOME>/lib
```

On Mac
You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
```

Adjust these values to match your CUDA version and installation locations.

#### Build ONNX Runtime on Mac

```bash
./build.sh --build_shared_lib --skip_tests --parallel --config Release
cp include/onnxruntime/core/session/onnxruntime_c_api.h <ORT_HOME>/include
cp build/MacOS/Release/libonnxruntime*.dylib* <ORT_HOME>/lib
```

## Build the generate() API

## Build onnxruntime-genai
### Build on Windows

### Build for CPU
If building for DirectML

```bash
cd ..
python build.py [--ort_home <ORT_HOME>]
copy ..\onnxruntime\include\onnxruntime\core\providers\dml\dml_provider_factory.h ort\include
```

```bash
copy ..\onnxruntime\include\onnxruntime\core\session\onnxruntime_c_api.h ort\include
copy ..\onnxruntime\build\Windows\Release\Release\*.dll ort\lib
copy ..\onnxruntime\build\Windows\Release\Release\onnxruntime.lib ort\lib
python build.py [--use_dml | --use_cuda]
```

### Build for CUDA
### Build on Linux

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/Linux/Release/libonnxruntime*.so* ort/lib
python build.py [--use_cuda]
```

These instructions assume you already have CUDA installed.
### Build on Mac

```bash
cd ..
python build.py --cuda_home <path to cuda home> [--ort_home <ORT_HOME>]
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/MacOS/Release/libonnxruntime*.dylib* ort/lib
python build.py
```

### Build for DirectML
2 changes: 1 addition & 1 deletion docs/genai/index.md
@@ -9,7 +9,7 @@ nav_order: 6

_Note: this API is in preview and is subject to change._

Run generative AI models with ONNX Runtime. Source code: https://github.com/microsoft/onnxruntime-genai
Run generative AI models with ONNX Runtime. Source code: (https://github.com/microsoft/onnxruntime-genai)

This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.

4 changes: 4 additions & 0 deletions docs/genai/reference/config.md
@@ -102,6 +102,10 @@ These are the options that are passed to ONNX Runtime, which runs the model on e

* **_provider_options_**: a prioritized list of execution targets on which to run the model. If running on CPU, this option is not present. A list of execution provider specific configurations can be specified inside the provider item.

Supported provider options (see the sketch after this list):
* `cuda`
* `dml`

* **_log_id_**: a prefix to output when logging.
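
For illustration, these options might appear in a model's `genai_config.json` roughly as follows (a hedged sketch of the shape described above; the surrounding `session_options` key name is an assumption, and this is not a complete configuration):

```json
"session_options": {
  "log_id": "onnxruntime-genai",
  "provider_options": [
    { "cuda": {} }
  ]
}
```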


