Phi3 Android App tutorial #21446

Open
wants to merge 21 commits into
base: gh-pages
175 changes: 160 additions & 15 deletions docs/genai/api/java.md
@@ -206,6 +206,97 @@ The GeneratorParams instance.
GeneratorParams params = generator.createGeneratorParams("What's 6 times 7?");
```

## Multimodal Processor Class
The MultiModalProcessor class converts text and optional images into a NamedTensors object that can be passed to a Generator class instance.

### processImages Method

Processes text and image (optional) into a NamedTensors object.

```java
public NamedTensors processImages(String prompt, Images images) throws GenAIException
```

#### Parameters

- `prompt`: text input formatted according to model specifications.
- `images`: optional image input. Pass `null` if no image input.

#### Throws

`GenAIException` - if the call to the GenAI native API fails.

#### Returns

A NamedTensors object.

#### Example
Example using the [Phi-3 Vision](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu/tree/main/cpu-int4-rtn-block-32-acc-level-4) model.
```java
Images images = null;
if (inputImage != null) {
images = inputImage.getImages();
}

NamedTensors inputTensors = multiModalProcessor.processImages(promptQuestion_formatted, images);
```

### Decode Method

Decodes a sequence of token ids into text.

```java
public String decode(int[] sequence) throws GenAIException
```

#### Parameters

- `sequence`: collection of token ids to decode to text.

#### Throws

`GenAIException` - if the call to the GenAI native API fails.

#### Returns

The text representation of the sequence.

#### Example

```java
String result = multiModalProcessor.decode(output_ids);
```

### createStream Method

Creates a TokenizerStream object for streaming tokenization. It is used with the Generator class to decode each token as it is generated.

```java
public TokenizerStream createStream() throws GenAIException
```

#### Throws

`GenAIException` - if the call to the GenAI native API fails.

#### Returns

The new TokenizerStream instance.

## Images Class
The Images class loads images from files to be used in the MultiModalProcessor class.
### Constructor

```java
public Images(String imagesPath) throws GenAIException
```

#### Parameters
- `imagesPath`: path to the input image.

#### Throws
`GenAIException` - if the call to the GenAI native API fails.

## Tokenizer class

### Encode Method
@@ -244,7 +335,7 @@ public String decode(int[] sequence) throws GenAIException

#### Parameters

- `sequence`: collection of token ids to decode to text
- `sequence`: collection of token ids to decode to text.

#### Throws

@@ -344,7 +435,7 @@ public String decode(int token) throws GenAIException

#### Throws

`GenAIException`
`GenAIException` - if the call to the GenAI native API fails.

## Tensor Class

@@ -362,7 +453,7 @@ public Tensor(ByteBuffer data, long[] shape, ElementType elementType) throws Gen

#### Throws

`GenAIException`
`GenAIException` - if the call to the GenAI native API fails.

#### Example

@@ -377,6 +468,18 @@ floatBuffer.put(new float[] {1.0f, 2.0f, 3.0f, 4.0f});
Tensor tensor = new Tensor(data, shape, Tensor.ElementType.float32);
```
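The buffer preparation that precedes the `Tensor` constructor call can be sketched with the standard library alone. This is a minimal sketch, assuming the tensor data should be a direct `ByteBuffer` in native byte order whose element count matches the shape; `TensorBuffers`, `elementCount`, and `packFloats` are hypothetical names for illustration and are not part of the GenAI API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical stdlib-only helper; not part of the GenAI API. */
final class TensorBuffers {
    private TensorBuffers() {}

    /** Number of elements implied by a shape, e.g. {2, 2} gives 4. */
    static long elementCount(long[] shape) {
        long count = 1;
        for (long dim : shape) {
            count *= dim;
        }
        return count;
    }

    /** Copies values into a direct, native-order buffer after checking them against the shape. */
    static ByteBuffer packFloats(float[] values, long[] shape) {
        if (elementCount(shape) != values.length) {
            throw new IllegalArgumentException(
                "shape implies " + elementCount(shape) + " elements, got " + values.length);
        }
        // Assumption: the native layer expects a direct buffer in native byte order.
        ByteBuffer data = ByteBuffer.allocateDirect(values.length * Float.BYTES)
                                    .order(ByteOrder.nativeOrder());
        // The float view shares storage with data, so data's own position stays at 0.
        FloatBuffer floatView = data.asFloatBuffer();
        floatView.put(values);
        return data;
    }

    public static void main(String[] args) {
        ByteBuffer data = packFloats(new float[] {1.0f, 2.0f, 3.0f, 4.0f}, new long[] {2, 2});
        System.out.println(data.capacity() + " bytes, first value " + data.getFloat(0));
    }
}
```

The returned buffer and the same shape array would then be passed to the `Tensor` constructor together with `Tensor.ElementType.float32`.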

## NamedTensors Class
The NamedTensors class holds input from the MultiModalProcessor class.
### Constructor

```java
public NamedTensors(long handle)
```

#### Parameters

- `handle`: handle of NamedTensors.

## GeneratorParams class

The `GeneratorParams` class represents the parameters used for generating sequences with a model. Set the prompt using setInput, and any other search options using setSearchOption.
@@ -387,15 +490,15 @@ The `GeneratorParams` class represents the parameters used for generating sequen
GeneratorParams params = new GeneratorParams(model);
```

### setSearchOption Method
### setSearchOption Method (double)

```java
public void setSearchOption(String optionName, double value) throws GenAIException
```

#### Throws

`GenAIException`
`GenAIException` - if the call to the GenAI native API fails.

#### Example

@@ -405,15 +508,15 @@ Set search option to limit the model generation length.
generatorParams.setSearchOption("max_length", 10);
```

### setSearchOption Method
### setSearchOption Method (boolean)

```java
public void setSearchOption(String optionName, boolean value) throws GenAIException
```

#### Throws

`GenAIException`
`GenAIException` - if the call to the GenAI native API fails.

#### Example

@@ -440,18 +543,17 @@ public void setInput(Sequences sequences) throws GenAIException
generatorParams.setInput(encodedPrompt);
```

### setInput Method
### setInput Method (Token IDs)

Sets the prompt/s token ids for model execution. The `tokenIds` are the encoded parameters.

```java
public void setInput(int[] tokenIds, int sequenceLength, int batchSize)
throws GenAIException
public void setInput(int[] tokenIds, int sequenceLength, int batchSize) throws GenAIException
```

#### Parameters

- `tokenIds`: the token ids of the encoded prompt/s
- `tokenIds`: the token ids of the encoded prompt/s.
- `sequenceLength`: the length of each sequence.
- `batchSize`: size of the batch.

@@ -467,21 +569,64 @@ NOTE: all sequences in the batch must be the same length.
generatorParams.setInput(tokenIds, sequenceLength, batchSize);
```
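Because all sequences in the batch must be the same length, it can help to build the `tokenIds` array with a small helper that enforces this. The following is a minimal sketch, assuming the token ids are laid out one sequence after another in a single flat array; that layout and the `TokenBatch` helper are assumptions for illustration, not something the API reference states.

```java
/** Hypothetical helper that flattens equal-length token sequences into one array. */
final class TokenBatch {
    private TokenBatch() {}

    /** Returns the sequences concatenated in order; rejects batches with mixed lengths. */
    static int[] flatten(int[][] sequences) {
        if (sequences.length == 0) {
            throw new IllegalArgumentException("batch must contain at least one sequence");
        }
        int sequenceLength = sequences[0].length;
        int[] flat = new int[sequences.length * sequenceLength];
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].length != sequenceLength) {
                throw new IllegalArgumentException(
                    "sequence " + i + " has length " + sequences[i].length
                        + ", expected " + sequenceLength);
            }
            // Assumed layout: sequence i occupies the i-th block of sequenceLength entries.
            System.arraycopy(sequences[i], 0, flat, i * sequenceLength, sequenceLength);
        }
        return flat;
    }

    public static void main(String[] args) {
        int[][] batch = {{1, 2, 3}, {4, 5, 6}};
        int[] tokenIds = flatten(batch);
        System.out.println(tokenIds.length + " token ids for a batch of " + batch.length);
    }
}
```

With such a helper, `sequenceLength` is the length of any one row and `batchSize` is the number of rows.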

### setInput Method (Tensor)
Add a Tensor as a model input.

```java
public void setInput(String name, Tensor tensor) throws GenAIException
```
#### Parameters

- `name`: name of the model input the tensor will provide.
- `tensor`: tensor to add.

#### Throws

`GenAIException` - if the call to the GenAI native API fails.

#### Example

```java
generatorParams.setInput(name, tensor);
```

### setInput Method (NamedTensors)
Add a NamedTensors as a model input.

```java
public void setInput(NamedTensors namedTensors) throws GenAIException
```
#### Parameters

- `namedTensors`: NamedTensors to add.

#### Throws

`GenAIException` - if the call to the GenAI native API fails.

#### Example

```java
NamedTensors inputTensors = multiModalProcessor.processImages(promptQuestion_formatted, images);
generatorParams.setInput(inputTensors);
```

## Generator class

The Generator class generates output using a model and generator parameters.
The expected usage is to loop until isDone returns false. Within the loop, call computeLogits followed by generateNextToken.

The newly generated token can be retrieved with getLastTokenInSequence and decoded with TokenizerStream.Decode.
The expected usage is to loop until isDone() returns true. Within the loop, call computeLogits() followed by generateNextToken().

The newly generated token can be retrieved with getLastTokenInSequence() and decoded with TokenizerStream.decode().

After the generation process is done, GetSequence can be used to retrieve the complete generated sequence if needed.
After the generation process is done, getSequence() can be used to retrieve the complete generated sequence if needed.
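
The control flow of this loop can be sketched without the native library. The example below is a minimal sketch that keeps running while `isDone()` reports false and stops once it returns true. `TokenGenerator` and `ScriptedGenerator` are hypothetical stand-ins that only mimic the method names above; they are not the real `Generator` class, and the stand-in `getLastTokenInSequence()` omits any arguments the real method takes.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the calls named above; NOT the real ai.onnxruntime.genai.Generator. */
interface TokenGenerator {
    boolean isDone();
    void computeLogits();
    void generateNextToken();
    int getLastTokenInSequence();
}

/** Toy generator that replays a fixed list of token ids, for illustration only. */
final class ScriptedGenerator implements TokenGenerator {
    private final int[] script;
    private int produced = 0;
    private boolean logitsReady = false;

    ScriptedGenerator(int... script) {
        this.script = script;
    }

    public boolean isDone() {
        return produced >= script.length;
    }

    public void computeLogits() {
        logitsReady = true;
    }

    public void generateNextToken() {
        if (!logitsReady) {
            throw new IllegalStateException("computeLogits() must be called first");
        }
        logitsReady = false;
        produced++;
    }

    public int getLastTokenInSequence() {
        return script[produced - 1];
    }
}

final class GenerationLoop {
    private GenerationLoop() {}

    /** Runs the loop described above and collects every generated token id. */
    static List<Integer> run(TokenGenerator generator) {
        List<Integer> tokens = new ArrayList<>();
        while (!generator.isDone()) {
            generator.computeLogits();
            generator.generateNextToken();
            // In a real app, each token would be decoded and shown to the user here.
            tokens.add(generator.getLastTokenInSequence());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(run(new ScriptedGenerator(11, 22, 33)));
    }
}
```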

### Create a Generator

Constructs a Generator object with the given model and generator parameters.

```java
Generator(Model model, GeneratorParams generatorParams)
public Generator(Model model, GeneratorParams generatorParams) throws GenAIException
```

#### Parameters
118 changes: 118 additions & 0 deletions docs/genai/tutorials/phi3-android.md
@@ -0,0 +1,118 @@
---
title: Phi-3 for Android
description: Develop an Android generative AI application with ONNX Runtime
has_children: false
parent: Tutorials
grand_parent: Generate API (Preview)
nav_order: 1
---

# Build an Android generative AI application
This is a [Phi-3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) Android example application built with [ONNX Runtime mobile](https://onnxruntime.ai/docs/tutorials/mobile/) and the [ONNX Runtime Generate() API](https://github.com/microsoft/onnxruntime-genai), which supports running generative AI models efficiently. This tutorial walks you through building and running the Phi-3 app on your own mobile device so you can start incorporating Phi-3 into your own mobile apps.

## Model Capabilities
[Phi-3 Mini-4k-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) is a small language model for language understanding, math, code, long context, logical reasoning, and more. It shows robust, state-of-the-art performance among models with fewer than 13 billion parameters.

## Important Features

### Java API
This app uses the GenAIException, Generator, GeneratorParams, Model, and TokenizerStream classes from the [generate() Java API](https://github.com/microsoft/onnxruntime-genai/tree/main/src/java/src/main/java/ai/onnxruntime/genai) ([documentation](https://onnxruntime.ai/docs/genai/api/java.html)). The [generate() C API](https://onnxruntime.ai/docs/genai/api/c.html), [generate() C# API](https://onnxruntime.ai/docs/genai/api/csharp.html), and [generate() Python API](https://onnxruntime.ai/docs/genai/api/python.html) are also available.

### Model Download
This app downloads the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model from Hugging Face. To use a different model, such as the [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/tree/main), change the path links to refer to your chosen model. If using a model with imaging capabilities, use the [MultiModalProcessor class]() in place of the [Tokenizer class]() and update the prompt template accordingly.
```java
final String baseUrl = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/resolve/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/";
List<String> files = Arrays.asList(
"added_tokens.json",
"config.json",
"configuration_phi3.py",
"genai_config.json",
"phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
"phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data",
"special_tokens_map.json",
"tokenizer.json",
"tokenizer.model",
"tokenizer_config.json");
```
The model files only need to be downloaded once. When you edit the app and run new versions, the download step is skipped because all files already exist.
```java
if (urlFilePairs.isEmpty()) {
// Display a message using Toast
Toast.makeText(this, "All files already exist. Skipping download.", Toast.LENGTH_SHORT).show();
Log.d(TAG, "All files already exist. Skipping download.");
model = new Model(getFilesDir().getPath());
tokenizer = model.createTokenizer();
return;
}
```
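One way the list of pending downloads could be computed is to check each expected file name against the app's files directory and keep only the ones that are missing. This is a minimal sketch using only `java.io.File`; `PendingDownloads` and `missingFiles` are hypothetical names, and the sample app may build `urlFilePairs` differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that finds which model files still need to be downloaded. */
final class PendingDownloads {
    private PendingDownloads() {}

    /** Returns the names from fileNames that do not yet exist in targetDir, in order. */
    static List<String> missingFiles(File targetDir, List<String> fileNames) {
        List<String> missing = new ArrayList<>();
        for (String name : fileNames) {
            if (!new File(targetDir, name).exists()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The temp directory stands in for the directory returned by getFilesDir() on Android.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        List<String> missing = missingFiles(dir, Arrays.asList("config.json", "tokenizer.json"));
        System.out.println("Files still to download: " + missing);
    }
}
```

An empty result corresponds to the "All files already exist" branch shown above.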
### Status while model is downloading
Downloading the model files to your mobile device takes roughly 10 to 15 minutes, depending on which device you are using. The progress bar indicates what percentage of the downloads is complete.
```java
public void onProgress(long lastBytesRead, long bytesRead, long bytesTotal) {
long lastPctDone = 100 * lastBytesRead / bytesTotal;
long pctDone = 100 * bytesRead / bytesTotal;
if (pctDone > lastPctDone) {
Log.d(TAG, "Downloading files: " + pctDone + "%");
runOnUiThread(() -> {
progressText.setText("Downloading: " + pctDone + "%");
});
}
}
```
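The callback above relies on integer division, so the UI is only updated when the whole-number percentage increases. The arithmetic can be isolated as follows. This is a minimal sketch; `DownloadProgress` is a hypothetical helper, and the guard for an unknown total size is an addition that the snippet above does not include.

```java
/** Hypothetical helper that isolates the progress arithmetic used in onProgress. */
final class DownloadProgress {
    private DownloadProgress() {}

    /** Whole-number percentage, rounded down; 0 when the total size is unknown. */
    static long percentDone(long bytesRead, long bytesTotal) {
        if (bytesTotal <= 0) {
            return 0; // avoids dividing by zero when no total size was reported
        }
        return 100 * bytesRead / bytesTotal;
    }

    /** True only when the new byte count crosses into a higher whole percentage. */
    static boolean shouldReport(long lastBytesRead, long bytesRead, long bytesTotal) {
        return percentDone(bytesRead, bytesTotal) > percentDone(lastBytesRead, bytesTotal);
    }

    public static void main(String[] args) {
        System.out.println(percentDone(50, 200) + "% done");
        System.out.println("report update: " + shouldReport(50, 52, 200));
    }
}
```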
Because the app UI is initialized when the downloads start, prompts sent before the downloads are complete are rejected with a message instead of crashing the app.
```java
if (model == null) {
// If the model has not finished loading, display a toast message and ignore the prompt.
Toast.makeText(MainActivity.this, "Model not loaded yet, please wait...", Toast.LENGTH_SHORT).show();
return;
}
```

### Prompt Template
On its own, this model's answers can be very long. To format the AI assistant's answers, you can adjust the prompt template.
```java
String promptQuestion = userMsgEdt.getText().toString();
String promptQuestion_formatted = "<|system|>You are a helpful AI assistant. Answer in two paragraphs or less<|end|><|user|>"+promptQuestion+"<|end|>\n<|assistant|>";
Log.i("GenAI: prompt question", promptQuestion_formatted);
```
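To keep the template in one place, the string concatenation can be wrapped in a small helper. This is a minimal sketch, assuming the Phi-3 chat markers `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>`; `Phi3Prompt` and `build` are hypothetical names and are not part of the sample app.

```java
/** Hypothetical helper that assembles a Phi-3 style chat prompt. */
final class Phi3Prompt {
    private Phi3Prompt() {}

    /** Wraps the system and user messages in the chat markers the model is assumed to expect. */
    static String build(String systemMessage, String userMessage) {
        return "<|system|>" + systemMessage + "<|end|>"
            + "<|user|>" + userMessage + "<|end|>\n"
            + "<|assistant|>";
    }

    public static void main(String[] args) {
        String prompt = build(
            "You are a helpful AI assistant. Answer in two paragraphs or less",
            "What's 6 times 7?");
        System.out.println(prompt);
    }
}
```

Changing the system message is then a one-line edit, for example to ask for shorter answers.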
You can also set search options such as max_length or length_penalty to limit the answer to your liking.
```java
generatorParams.setSearchOption("length_penalty", 1000);
generatorParams.setSearchOption("max_length", 500);
```
NOTE: Setting max_length cuts off the assistant's answer once the maximum number of tokens is reached; it does not make the model produce a shorter, complete response.

## Run the App

### Download Android Studio
You will be using [Android Studio](https://developer.android.com/studio) to run the app.

### Download the App
Clone the [ONNX Runtime Inference Examples](https://github.com/microsoft/onnxruntime-inference-examples/tree/c29d8edd6d010a2649d69f84f54539f1062d776d) repository.

### Enable Developer Mode on Mobile
On your Android mobile device, go to "Settings > About Phone > Software information" and tap the "Build Number" tile repeatedly until you see the message “You are now in developer mode”. In "Developer Options", turn on Wireless or USB debugging.

### Open Project in Android Studio
Open the Phi-3 mobile app in Android Studio (onnxruntime-inference-examples/mobile/examples/phi-3/android/app).

### Connect Device
To run the app on a device, follow the instructions from the Running Devices tab on the right side panel. You can connect through Wi-Fi or USB.
![Running Devices Instructions](../../../images/phi3_MobileTutorial_RunDevice.png)
#### Pair over Wi-Fi
![WiFi Instructions](../../../images/phi3_MobileTutorial_WiFi.png)

### Manage Devices
You can manage/change devices and device model through the Device Manager tab on the right side panel.
![Device Manager](../../../images/phi3_MobileTutorial_DeviceManager.png)

### Downloading the App
Once your device is connected, run the app by using the play button on the top panel. Downloading all packages takes roughly 10 to 15 minutes. If you submit a prompt before the downloads are complete, you will see an error message. Once the downloads are complete, Logcat (the cat tab on the bottom left panel) displays an "All downloads complete" message.

![Error Message](../../../images/phi3_MobileTutorial_Error.png)

### Ask questions
Now that the app is downloaded, you can start asking questions!
![Example Prompt 1](../../../images/phi3_MobileTutorial_ex1.png)
![Example Prompt 2](../../../images/phi3_MobileTutorial_ex2.png)
![Example Prompt 3](../../../images/phi3_MobileTutorial_ex3.png)
Binary file added images/phi3_MobileTutorial_DeviceManager.png
Binary file added images/phi3_MobileTutorial_Error.png
Binary file added images/phi3_MobileTutorial_RunDevice.png
Binary file added images/phi3_MobileTutorial_WiFi.png
Binary file added images/phi3_MobileTutorial_ex1.png
Binary file added images/phi3_MobileTutorial_ex2.png
Binary file added images/phi3_MobileTutorial_ex3.png