Update API docs
natke committed May 20, 2024
1 parent e34a040 commit 876d8c4
Showing 3 changed files with 116 additions and 43 deletions.
82 changes: 55 additions & 27 deletions docs/genai/api/c.md
@@ -23,7 +23,7 @@ _Note: this API is in preview and is subject to change._

### Create model

Creates a model from the given configuration directory and device type.
Creates a model from the given directory. The directory should contain a file called `genai_config.json`, which corresponds to the [configuration specification](../reference/config.md).

#### Parameters
* Input: config_path. The path to the model configuration directory. The path is expected to be encoded in UTF-8.
@@ -224,6 +224,23 @@ Set a search option where the option is a bool.
OGA_EXPORT OgaResult* OGA_API_CALL OgaGeneratorParamsSetSearchBool(OgaGeneratorParams* generator_params, const char* name, bool value);
```
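
A minimal usage sketch, assuming a `generator_params` object has already been created; `"do_sample"` is used here only as an illustrative option name, and `OgaDestroyResult` is assumed from the rest of the C API.

```c
// Sketch: turn sampling on for an existing OgaGeneratorParams instance.
// A non-NULL OgaResult indicates failure; its message can be read as described
// under "Get error message" below, and the result must then be released.
OgaResult* result = OgaGeneratorParamsSetSearchBool(generator_params, "do_sample", true);
if (result != NULL) {
  OgaDestroyResult(result);  // assumed cleanup call for OgaResult objects
}
```
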
### Try graph capture with max batch size

Graph capture fixes the dynamic elements of the computation graph to constant values. It can provide more efficient execution in some environments. To execute in graph capture mode, the maximum batch size needs to be known ahead of time. This function can fail if there is not enough memory to allocate the specified maximum batch size.

#### Parameters

* generator_params: The generator params object to set the parameter on
* max_batch_size: The maximum batch size to allocate

#### Returns

`OgaResult` containing the error message if graph capture mode could not be configured with the specified batch size

```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGeneratorParamsTryGraphCaptureWithMaxBatchSize(OgaGeneratorParams* generator_params, int32_t max_batch_size);
```
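
As an illustration only, a hedged sketch of requesting graph capture before the generator is created; the error-handling helpers are assumed from the rest of the C API.

```c
// Sketch: ask for graph capture with room for batches of up to 16 sequences.
// If the request fails (for example, not enough memory), the call returns a
// non-NULL OgaResult and generation can continue without graph capture.
OgaResult* result = OgaGeneratorParamsTryGraphCaptureWithMaxBatchSize(generator_params, 16);
if (result != NULL) {
  OgaDestroyResult(result);  // assumed cleanup call; see "Get error message"
}
```
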

### Set inputs

Sets the input ids for the generator params. The input ids are used to seed the generation.
@@ -255,12 +272,30 @@ Sets the input id sequences for the generator params. The input id sequences are
#### Returns
OgaResult containing the error message if the setting of the input id sequences failed.
```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGeneratorParamsSetInputSequences(OgaGeneratorParams* generator_params, const OgaSequences* sequences);
```
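
A hedged sketch of seeding generation from a prompt; `OgaCreateTokenizer`, `OgaCreateSequences` and `OgaTokenizerEncode` are assumed from the tokenizer API, which is not part of this change, and error checks on those calls are omitted for brevity.

```c
// Sketch: tokenize a prompt and attach it to the generator params.
// model and generator_params are assumed to have been created earlier.
OgaTokenizer* tokenizer = NULL;
OgaSequences* sequences = NULL;
OgaCreateTokenizer(model, &tokenizer);   // assumed tokenizer API
OgaCreateSequences(&sequences);          // assumed sequences API
OgaTokenizerEncode(tokenizer, "Hello, world", sequences);

OgaResult* result = OgaGeneratorParamsSetInputSequences(generator_params, sequences);
if (result != NULL) {
  OgaDestroyResult(result);  // setting the input sequences failed
}
```
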

### Set model input

Set an additional model input, aside from the input_ids. For example, additional inputs for LoRA adapters.

#### Parameters

* generator_params: The generator params to set the input on
* name: the name of the parameter to set
* tensor: the value of the parameter

#### Returns

OgaResult containing the error message if the setting of the input failed.

```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGeneratorParamsSetWhisperInputFeatures(OgaGeneratorParams*, OgaTensor* tensor);
```
## Generator API
@@ -330,7 +365,7 @@ OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_ComputeLogits(OgaGenerator* gene
### Generate next token
Generates the next token based on the computed logits using the greedy search.
Generates the next token based on the computed logits using the configured generation parameters.
#### Parameters
@@ -341,32 +376,13 @@ Generates the next token based on the computed logits using the greedy search.
OgaResult containing the error message if the generation of the next token failed.
```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken_Top(OgaGenerator* generator);
```

### Generate next token with Top K sampling

#### Parameters

#### Returns

```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken_TopK(OgaGenerator* generator, int k, float t);
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken(OgaGenerator* generator);
```

### Generate next token with Top P sampling
#### Parameters
#### Returns
```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken_TopP(OgaGenerator* generator, float p, float t);
```

### Get number of tokens

Returns the number of tokens in the sequence at the given index.

#### Parameters

@@ -378,12 +394,12 @@ OGA_EXPORT OgaResult* OGA_API_CALL OgaGenerator_GenerateNextToken_TopP(OgaGenera
The number of tokens in the sequence at the given index.

```c
OGA_EXPORT size_t OGA_API_CALL OgaGenerator_GetSequenceLength(const OgaGenerator* generator, size_t index);
OGA_EXPORT size_t OGA_API_CALL OgaGenerator_GetSequenceCount(const OgaGenerator* generator, size_t index);
```
### Get sequence
Returns a pointer to the sequence data at the given index. The number of tokens in the sequence is given by OgaGenerator_GetSequenceLength.
Returns a pointer to the sequence data at the given index. The number of tokens in the sequence is given by `OgaGenerator_GetSequenceCount`.
#### Parameters
@@ -395,7 +411,7 @@ Returns a pointer to the sequence data at the given index. The number of token
A pointer to the token sequence
```c
OGA_EXPORT const int32_t* OGA_API_CALL OgaGenerator_GetSequence(const OgaGenerator* generator, size_t index);
OGA_EXPORT const int32_t* OGA_API_CALL OgaGenerator_GetSequenceData(const OgaGenerator* generator, size_t index);
```
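
Putting the calls in this section together, a hedged end-to-end sketch of the decode loop; `OgaCreateGenerator`, `OgaGenerator_IsDone` and `OgaDestroyGenerator` are assumed from the parts of the generator API not shown in this change, and error handling is omitted for brevity.

```c
// Sketch: run generation to completion, then read back the first sequence.
// model and generator_params are assumed to be fully configured.
OgaGenerator* generator = NULL;
OgaCreateGenerator(model, generator_params, &generator);  // assumed creation call

while (!OgaGenerator_IsDone(generator)) {                  // assumed completion check
  OgaGenerator_ComputeLogits(generator);                   // run one model iteration
  OgaGenerator_GenerateNextToken(generator);               // pick the next token
}

size_t length = OgaGenerator_GetSequenceCount(generator, 0);
const int32_t* tokens = OgaGenerator_GetSequenceData(generator, 0);
// tokens now points at length generated token ids for batch entry 0; decode or
// copy them before the generator is destroyed.

OgaDestroyGenerator(generator);                            // assumed cleanup call
```
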

## Enums and structs
@@ -419,6 +435,18 @@ typedef struct OgaBuffer OgaBuffer;

## Utility functions

### Set the GPU device ID

```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaSetCurrentGpuDeviceId(int device_id);
```

### Get the GPU device ID

```c
OGA_EXPORT OgaResult* OGA_API_CALL OgaGetCurrentGpuDeviceId(int* device_id);
```
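
A small sketch of the device id helpers; both return an `OgaResult*` that is non-NULL on failure, and `OgaDestroyResult` is assumed from the rest of the C API.

```c
// Sketch: pin execution to GPU 0, then read the setting back.
int device_id = -1;
OgaResult* result = OgaSetCurrentGpuDeviceId(0);
if (result != NULL) {
  OgaDestroyResult(result);  // e.g. no GPU is available; handle as appropriate
}
result = OgaGetCurrentGpuDeviceId(&device_id);
if (result != NULL) {
  OgaDestroyResult(result);
} else {
  // device_id now holds the GPU currently used for execution.
}
```
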

### Get error message

#### Parameters
18 changes: 16 additions & 2 deletions docs/genai/api/csharp.md
@@ -98,6 +98,12 @@ public void SetSearchOption(string searchOption, double value)
public void SetSearchOption(string searchOption, bool value)
```

### Try graph capture with max batch size

```csharp
public void TryGraphCaptureWithMaxBatchSize(int maxBatchSize)
```

### Set input ids method

```csharp
@@ -110,8 +116,11 @@ public void SetInputIDs(ReadOnlySpan<int> inputIDs, ulong sequenceLength, ulong
public void SetInputSequences(Sequences sequences)
```

### Set model inputs


```csharp
public void SetModelInput(string name, Tensor value)
```


## Generator class
@@ -137,9 +146,14 @@ public void ComputeLogits()
### Generate next token method

```csharp
public void GenerateNextTokenTop()
public void GenerateNextToken()
```

### Get sequence

```csharp
public ReadOnlySpan<int> GetSequence(ulong index)
```

## Sequences class

59 changes: 45 additions & 14 deletions docs/genai/api/python.md
@@ -30,7 +30,7 @@ import onnxruntime_genai

## Model class

### Load the model
### Load a model

Loads the ONNX model(s) and configuration from a folder on disk.

@@ -59,22 +59,14 @@ onnxruntime_genai.Model.generate(params: GeneratorParams) -> numpy.ndarray[int,

`numpy.ndarray[int, int]`: a two dimensional numpy array with dimensions equal to the size of the batch passed in and the maximum length of the sequence of tokens.

### Device type

## GeneratorParams class

### Create GeneratorParams object
Return the device type that the model has been configured to run on.

```python
onnxruntime_genai.GeneratorParams(model: onnxruntime_genai.Model) -> onnxruntime_genai.GeneratorParams
onnxruntime_genai.Model.device_type
```

#### Parameters

- `model`: (required) The model that was loaded by onnxruntime_genai.Model()

#### Returns

`onnxruntime_genai.GeneratorParams`: The GeneratorParams object

## Tokenizer class

@@ -193,18 +185,49 @@ onnxruntime_genai.TokenizerStream.decode(token: int32) -> str
onnxruntime_genai.GeneratorParams(model: Model) -> GeneratorParams
```

### Input_ids member
### Pad token id member

```python
onnxruntime_genai.GeneratorParams.input_ids = numpy.ndarray[numpy.int32, numpy.int32]
onnxruntime_genai.GeneratorParams.pad_token_id
```

### EOS token id member

```python
onnxruntime_genai.GeneratorParams.eos_token_id
```

### vocab size member

```python
onnxruntime_genai.GeneratorParams.vocab_size
```

### input_ids member

```python
onnxruntime_genai.GeneratorParams.input_ids: numpy.ndarray[numpy.int32, numpy.int32]
```

### Set model input

```python
onnxruntime_genai.GeneratorParams.set_model_input(name: str, value: [])
```


### Set search options method

```python
onnxruntime_genai.GeneratorParams.set_search_options(options: dict[str, Any])
```

### Try graph capture with max batch size

```python
onnxruntime_genai.GeneratorParams.try_graph_capture_with_max_batch_size(max_batch_size: int)
```

## Generator class

### Create a Generator
@@ -242,6 +265,14 @@ Runs the model through one iteration.
onnxruntime_genai.Generator.compute_logits()
```

### Get output

Returns the output logits of the model.

```python
onnxruntime_genai.Generator.get_output()
```

### Generate next token

Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top P sampling.
