using gemma2 2B model for js llm inference #477

Open · wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion — examples/llm_inference/js/README.md
@@ -13,7 +13,7 @@ This web sample demonstrates how to use the LLM Inference API to run common text

Follow these instructions to run the sample on your device:
1. Make a folder for the task, named `llm_task`, and copy the [index.html](https://github.com/googlesamples/mediapipe/blob/main/examples/llm_inference/js/index.html) and [index.js](https://github.com/googlesamples/mediapipe/blob/main/examples/llm_inference/js/index.js) files into your `llm_task` folder.
- 2. Download [Gemma 2B](https://www.kaggle.com/models/google/gemma/frameworks/tfLite/variations/gemma-2b-it-gpu-int4) (TensorFlow Lite 2b-it-gpu-int4 or 2b-it-gpu-int8) or convert an external LLM (Phi-2, Falcon, or StableLM) following the [guide](https://developers.google.com/mediapipe/solutions/genai/llm_inference/web_js#convert-model) (only the GPU backend is currently supported), into the `llm_task` folder.
+ 2. Download [Gemma2 2B](https://www.kaggle.com/models/google/gemma-2/tfLite/gemma2-2b-it-gpu-int8) (TensorFlow Lite 2b-it-gpu-int8 or 2b-it-cpu-int8) or convert an external LLM (Phi-2, Falcon, or StableLM) following the [guide](https://developers.google.com/mediapipe/solutions/genai/llm_inference/web_js#convert-model) (only the GPU backend is currently supported), into the `llm_task` folder.
Collaborator (on the added line): We don't support CPU models on Web.

Collaborator: (We might also leave both models up until we have an int4 model of Gemma 2)

3. In your `index.js` file, update [`modelFileName`](https://github.com/googlesamples/mediapipe/blob/main/examples/llm_inference/js/index.js#L23) with your model file's name.
4. Run `python3 -m http.server 8000` in the `llm_task` folder to host the three files (or `python -m SimpleHTTPServer 8000` for Python 2).
5. Open `localhost:8000` in Chrome. The button on the webpage will be enabled once the task is ready (~10 seconds).
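
For context, the steps above drive MediaPipe's `LlmInference` task in the browser. Below is a minimal sketch of what the sample's `index.js` does with the downloaded model, based on the documented `@mediapipe/tasks-genai` API; the CDN path, prompt, and `runDemo` wrapper are illustrative assumptions, not code from this PR:

```js
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// File name from step 2; must match the model copied into llm_task/.
const modelFileName = 'gemma2-2b-it-gpu-int8.bin';

async function runDemo() {
  // Resolve the GenAI WASM assets (CDN path is an assumption; the real
  // sample pins its own version).
  const genaiFileset = await FilesetResolver.forGenAiTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@latest/wasm');

  // Create the LLM Inference task from the locally hosted model file
  // (step 4 serves it from the same origin on port 8000).
  const llmInference = await LlmInference.createFromOptions(genaiFileset, {
    baseOptions: {modelAssetPath: modelFileName},
    maxTokens: 1000,
  });

  // Stream partial results; `done` is true on the final callback.
  llmInference.generateResponse(
      'Tell me about MediaPipe.',
      (partialResult, done) => console.log(partialResult, done ? '(done)' : ''));
}

runDemo();
```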
2 changes: 1 addition & 1 deletion — examples/llm_inference/js/index.js
@@ -20,7 +20,7 @@ const input = document.getElementById('input');
const output = document.getElementById('output');
const submit = document.getElementById('submit');

- const modelFileName = 'gemma-2b-it-gpu-int4.bin'; /* Update the file name */
+ const modelFileName = 'gemma2-2b-it-gpu-int8.bin'; /* Update the file name */

/**
* Display newly generated partial results to the output text box.
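
The hunk above is truncated at the header of the sample's partial-results callback. A hedged sketch of what such a callback plausibly looks like (the `output` and `submit` elements come from the constants above; the exact body is not shown in this diff):

```js
/**
 * Display newly generated partial results to the output text box.
 * Sketch only: the real implementation lives in the sample's index.js.
 */
function displayPartialResults(partialResults, complete) {
  output.textContent += partialResults;

  if (complete) {
    if (!output.textContent) {
      output.textContent = 'Result is empty';
    }
    submit.disabled = false;  // re-enable the button for the next prompt
  }
}
```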