Commit: improve the segment-anything example (#385)

* improve the example
* remove some text
* remove provider selection from ui
* point to sd-turbo
* make wasm work on int8
Showing 7 changed files with 534 additions and 371 deletions.
# Segment-Anything: Browser-Based Image Segmentation with WebGPU and ONNX Runtime Web

This repository contains an example of running [Segment-Anything](https://github.com/facebookresearch/segment-anything), an encoder/decoder model for image segmentation, in a browser using [ONNX Runtime Web](https://github.com/microsoft/onnxruntime) with WebGPU.
You can try out the live demo [here](https://guschmue.github.io/ort-webgpu/segment-anything/index.html).

## Model Overview
Segment-Anything creates embeddings for an image using an encoder. These embeddings are then used by the decoder to create and update the segmentation mask. The decoder can run in ONNX Runtime Web using WebAssembly with latencies of ~200ms.

The encoder is much more compute-intensive, taking ~45sec in WebAssembly, which is not practical. However, WebGPU speeds up the encoder roughly 50x, making it feasible to run inside the browser, even on an integrated GPU.
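Whether the WebGPU path is available depends on the browser. ONNX Runtime Web accepts an ordered list of execution providers when creating a session, so a common pattern is to prefer `webgpu` and fall back to `wasm`. A minimal sketch of that pattern (the helper below is illustrative, not part of this example's code):

```javascript
// Choose an ordered execution-provider list for onnxruntime-web.
// In the browser you would pass `!!navigator.gpu`; taking it as a
// parameter keeps the logic testable outside a browser.
function chooseExecutionProviders(hasWebGpu) {
  // Prefer WebGPU for the compute-heavy encoder, keep wasm as a fallback.
  return hasWebGpu ? ['webgpu', 'wasm'] : ['wasm'];
}

// Session creation would then look roughly like:
//   const session = await ort.InferenceSession.create(modelUrl, {
//     executionProviders: chooseExecutionProviders(!!navigator.gpu),
//   });
```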
## Getting Started

### Prerequisites

Ensure that you have [Node.js](https://nodejs.org/) installed on your machine.

### Installation

First, install the required dependencies by running the following command in your terminal:

```sh
npm install
```
### Building the Project

Next, bundle the code using webpack by running:

```sh
npm run build
```

This command generates the bundle file `./dist/index.js`.
### The ONNX Model

The model used in this project is hosted on [Hugging Face](https://huggingface.co/schmuell/sam-b-fp16). It was created with [samexporter](https://github.com/vietanhdev/samexporter) from the smallest Segment-Anything flavor (vit_b).

To export the model yourself, first install samexporter:

```sh
pip install git+https://github.com/vietanhdev/samexporter
```

Download the PyTorch checkpoint from [Segment-Anything](https://github.com/facebookresearch/segment-anything):

```sh
curl -o models/sam_vit_b_01ec64.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```

Then export both encoder and decoder to ONNX:

```sh
python -m samexporter.export_encoder --checkpoint models/sam_vit_b_01ec64.pth \
    --output models/sam_vit_b_01ec64.encoder.onnx \
    --model-type vit_b

python -m samexporter.export_decoder --checkpoint models/sam_vit_b_01ec64.pth \
    --output models/sam_vit_b_01ec64.decoder.onnx \
    --model-type vit_b \
    --return-single-mask
```
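The exported decoder consumes click prompts in the coordinate frame of the preprocessed image and produces mask logits. As an illustrative sketch only (these helpers are hypothetical, not this example's code, and assume SAM's standard preprocessing: the longest image side is scaled to 1024, and mask logits are thresholded at 0):

```javascript
// Map a click on the original image into SAM's resized coordinate
// frame, assuming the longest side is scaled to 1024 (standard SAM
// preprocessing; verify against the exported encoder's input shape).
function clickToModelCoords(x, y, width, height, longSide = 1024) {
  const scale = longSide / Math.max(width, height);
  return [x * scale, y * scale];
}

// Turn decoder mask logits into a binary mask (SAM thresholds at 0).
function logitsToMask(logits, threshold = 0.0) {
  return Uint8Array.from(logits, (v) => (v > threshold ? 1 : 0));
}
```

In the decoder's point inputs, each coordinate also carries a label: 1 for a foreground click, 0 for a background click.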
### Running the Project

Start a web server to serve the current folder at http://localhost:8888/. To start the server, run:

```sh
npm run dev
```

Any static web server also works, for example `npx light-server -s . -p 8888`. Once the server is running, open your browser and navigate to http://localhost:8888/ to run Segment-Anything in the browser.