Revert "Add YOLOPv2 & Face-Parsing model" #4

Status: Closed · wants to merge 1 commit
120 changes: 66 additions & 54 deletions README.md
@@ -1,50 +1,49 @@
# usls

A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vision** and **Vision-Language** models including [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others.

A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vision** and **Vision-Language** models including [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others. Many execution providers are supported, such as `CUDA`, `TensorRT` and `CoreML`.

## Supported Models

| Model | Task / Type | Example | CUDA<br />f32 | CUDA<br />f16 | TensorRT<br />f32 | TensorRT<br />f16 |
| :---------------------------------------------------------------: | :------------------------------------------------------------------------: | :----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
| [YOLOv8-detection](https://github.com/ultralytics/ultralytics) | Object Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-pose](https://github.com/ultralytics/ultralytics) | Keypoint Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-classification](https://github.com/ultralytics/ultralytics) | Classification | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-segmentation](https://github.com/ultralytics/ultralytics) | Instance Segmentation | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv9](https://github.com/WongKinYiu/yolov9) | Object Detection | [demo](examples/yolov9) | ✅ | ✅ | ✅ | ✅ |
| [RT-DETR](https://arxiv.org/abs/2304.08069) | Object Detection | [demo](examples/rtdetr) | ✅ | ✅ | ✅ | ✅ |
| [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | Instance Segmentation | [demo](examples/fastsam) | ✅ | ✅ | ✅ | ✅ |
| [YOLO-World](https://github.com/AILab-CVC/YOLO-World) | Object Detection | [demo](examples/yolo-world) | ✅ | ✅ | ✅ | ✅ |
| [DINOv2](https://github.com/facebookresearch/dinov2) | Vision-Self-Supervised | [demo](examples/dinov2) | ✅ | ✅ | ✅ | ✅ |
| [CLIP](https://github.com/openai/CLIP) | Vision-Language | [demo](examples/clip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [BLIP](https://github.com/salesforce/BLIP) | Vision-Language | [demo](examples/blip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [DB](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](examples/db) | ✅ | | ✅ | ✅ |
| [SVTR](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](examples/svtr) | ✅ | | ✅ | ✅ |
| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | [demo](examples/rtmo) | ✅ | | | |
| [YOLOPv2](https://arxiv.org/abs/2208.11434) | Panoptic Driving Perception | [demo](examples/yolop) | ✅ | ✅ | ✅ | ✅ |
| Model | Task / Type | Example | CUDA<br />f32 | CUDA<br />f16 | TensorRT<br />f32 | TensorRT<br />f16 |
| :---------------------------------------------------------------: | :----------------------: |:----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
| **[YOLOv8-detection](https://github.com/ultralytics/ultralytics)** | Object Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| **[YOLOv8-pose](https://github.com/ultralytics/ultralytics)** | Keypoint Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| **[YOLOv8-classification](https://github.com/ultralytics/ultralytics)** | Classification | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| **[YOLOv8-segmentation](https://github.com/ultralytics/ultralytics)** | Instance Segmentation | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| **[YOLOv9](https://github.com/WongKinYiu/yolov9)** | Object Detection | [demo](examples/yolov9) | ✅ | ✅ | ✅ | ✅ |
| **[RT-DETR](https://arxiv.org/abs/2304.08069)** | Object Detection | [demo](examples/rtdetr) | ✅ | ✅ | ✅ | ✅ |
| **[FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM)** | Instance Segmentation | [demo](examples/fastsam) | ✅ | ✅ | ✅ | ✅ |
| **[YOLO-World](https://github.com/AILab-CVC/YOLO-World)** | Object Detection | [demo](examples/yolo-world) | ✅ | ✅ | ✅ | ✅ |
| **[DINOv2](https://github.com/facebookresearch/dinov2)** | Vision-Self-Supervised | [demo](examples/dinov2) | ✅ | ✅ | ✅ | ✅ |
| **[CLIP](https://github.com/openai/CLIP)** | Vision-Language | [demo](examples/clip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| **[BLIP](https://github.com/salesforce/BLIP)** | Vision-Language | [demo](examples/blip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [**DB**](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](examples/db) | ✅ | | ✅ | ✅ |
| [**SVTR**](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](examples/svtr) | ✅ | | ✅ | ✅ |
| [**RTMO**](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | [demo](examples/rtmo) | ✅ | ✅ | | |


## Solution Models

This repo also provides some ready-to-use solution models.

| Model | Example | Result |
| :------------------------------------------------------------: | :------------------------------: | :------------------------------: |
| Lane Line Segmentation<br />Drivable Area Segmentation<br />Car Detection | [demo](examples/yolop) |<img src='examples/yolop/demo.png' width="220px" height="140px">|
| Face Parsing | [demo](examples/face-parsing) |<img src='examples/face-parsing/demo.png' width="220px" height="200px"> |
| Text Detection<br />(PPOCR-det v3, v4) | [demo](examples/db) |<img src='examples/db/demo.jpg' width="250px" height="200px">|
| Text Recognition (Chinese & English)<br />(PPOCR-rec v3, v4) | [demo](examples/svtr) ||
| Face-Landmark Detection | [demo](examples/yolov8-face) |<img src='examples/yolov8-face/demo.jpg' width="220px" height="180px">|
| Head Detection | [demo](examples/yolov8-head) |<img src='examples/yolov8-head/demo.jpg' width="220px" height="180px">|
| Fall Detection | [demo](examples/yolov8-falldown) | <img src='examples/yolov8-falldown/demo.jpg' width="220px" height="180px">|
| Trash Detection | [demo](examples/yolov8-plastic-bag) |<img src='examples/yolov8-trash/demo.jpg' width="250px" height="180px">|
| Model | Example |
| :--------------------------------------------------------------------------------: | :------------------------------: |
| **Text Detection<br />(PPOCR-det v3, v4)** | [demo](examples/db) |
| **Text Recognition (Chinese & English)<br />(PPOCR-rec v3, v4)** | [demo](examples/svtr) |
| **Face-Landmark Detection** | [demo](examples/yolov8-face) |
| **Head Detection** | [demo](examples/yolov8-head) |
| **Fall Detection** | [demo](examples/yolov8-falldown) |
| **Trash Detection** | [demo](examples/yolov8-plastic-bag) |

## Demo

```
cargo run -r --example yolov8 # yolov9, blip, clip, dinov2, svtr, db, yolo-world...
cargo run -r --example yolov8 # fastsam, yolov9, blip, clip, dinov2, yolo-world...
```

## Installation
## Integrate into your own project

#### 1. Install [ort](https://github.com/pykeio/ort)

Check the **[ort guide](https://ort.pyke.io/setup/linking)** for linking instructions.

@@ -59,16 +58,13 @@ check **[ort guide](https://ort.pyke.io/setup/linking)**



## Integrate into your own project

#### 1. Add `usls` as a dependency to your project's `Cargo.toml`
#### 2. Add `usls` as a dependency to your project's `Cargo.toml`

```shell
cargo add --git https://github.com/jamjamjon/usls
```

#### 2. Set `Options` and build model
#### 3. Set `Options` and build model

```Rust
let options = Options::default()
    // ...
let mut model = YOLO::new(&options)?;
```

- If you want to run your model with TensorRT or CoreML
```Rust
let options = Options::default()
.with_trt(0) // using cuda by default
// .with_coreml(0)
```


- If your model has dynamic shapes
```Rust
let options = Options::default()
.with_i00((1, 2, 4).into()) // dynamic batch
.with_i02((416, 640, 800).into()) // dynamic height
.with_i03((416, 640, 800).into()) // dynamic width
```

- If you want to set a confidence level for each category
```Rust
let options = Options::default()
.with_confs(&[0.4, 0.15]) // person: 0.4, others: 0.15
```

- See [Options](src/options.rs) for more model options; a combined sketch is shown below.
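
Putting these together, a minimal sketch might look like the following. It only combines the builder calls already shown above; the model path and option values are placeholders, and the exact set of available methods should be checked against [Options](src/options.rs):

```Rust
// Rough sketch only: the model path and the option values are placeholders.
let options = Options::default()
    .with_model("yolov8m-dyn-f16.onnx")   // placeholder model path
    .with_trt(0)                          // or .with_coreml(0)
    .with_i00((1, 2, 4).into())           // dynamic batch
    .with_i02((416, 640, 800).into())     // dynamic height
    .with_i03((416, 640, 800).into())     // dynamic width
    .with_confs(&[0.4, 0.15]);            // per-class confidence thresholds
let mut model = YOLO::new(&options)?;
```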

#### 3. Prepare inputs, and then you're ready to go


#### 4. Prepare inputs, and then you're ready to go

- Build `DataLoader` to load images

```Rust
// read an image, then run the model
let x = vec![DataLoader::try_read("./assets/bus.jpg")?];
let y = model.run(&x)?;
```

#### 4. Annotate and save results

#### 5. Annotate and save results
```Rust
let annotator = Annotator::default().with_saveout("YOLOv8");
annotator.annotate(&x, &y);
```
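
For reference, steps 3 through 5 can be strung together into one small program. This is only a sketch that mirrors the snippets above; the import paths, model file name, and save-out name are assumptions and should be adjusted against the examples in this repo:

```Rust
// Sketch only: import paths and the model path are assumptions, not verified API.
use usls::{models::YOLO, Annotator, DataLoader, Options};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // set Options and build the model
    let options = Options::default().with_model("yolov8m-dyn.onnx"); // placeholder path
    let mut model = YOLO::new(&options)?;

    // load an image and run the model
    let x = vec![DataLoader::try_read("./assets/bus.jpg")?];
    let y = model.run(&x)?;

    // annotate and save the results
    let annotator = Annotator::default().with_saveout("YOLOv8");
    annotator.annotate(&x, &y);
    Ok(())
}
```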


## Script: convert an ONNX model from `float32` to `float16`

```python
import onnx
from pathlib import Path
from onnxconverter_common import float16

# load the float32 model and convert its weights to float16
model_f32 = "onnx_model.onnx"
model_f16 = float16.convert_float_to_float16(onnx.load(model_f32))

# save next to the original with an "-f16.onnx" suffix
saveout = Path(model_f32).with_name(Path(model_f32).stem + "-f16.onnx")
onnx.save(model_f16, saveout)
```
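
The script needs `onnx` and `onnxconverter-common` installed (`pip install onnx onnxconverter-common`) and writes the converted model next to the input file with an `-f16.onnx` suffix.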
Binary file removed assets/car.jpg
Binary file removed assets/nini.png
8 changes: 0 additions & 8 deletions convert2f16.py

This file was deleted.

32 changes: 29 additions & 3 deletions examples/blip/README.md
@@ -1,15 +1,41 @@
This demo shows how to use [BLIP](https://arxiv.org/abs/2201.12086) to do conditional or unconditional image captioning.


## Quick Start

```shell
cargo run -r --example blip
```

## BLIP ONNX Model
## Or you can set it up manually


### 1. Download BLIP ONNX Model

- [blip-visual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-visual-base.onnx)
- [blip-textual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-textual-base.onnx)


### 2. Specify the ONNX model path in `main.rs`

```Rust
// visual
let options_visual = Options::default()
.with_model("VISUAL_MODEL") // <= modify this
.with_profile(false);

// textual
let options_textual = Options::default()
.with_model("TEXTUAL_MODEL") // <= modify this
.with_profile(false);

```

### 3. Then, run

```bash
cargo run -r --example blip
```


## Results
40 changes: 36 additions & 4 deletions examples/clip/README.md
@@ -6,10 +6,37 @@ This demo showcases how to use [CLIP](https://github.com/openai/CLIP) to compute ...

```shell
cargo run -r --example clip
```

## CLIP ONNX Model
## Or you can set it up manually


### 1. Download CLIP ONNX Model

- [clip-b32-visual](https://github.com/jamjamjon/assets/releases/download/v0.0.1/clip-b32-visual.onnx)
- [clip-b32-textual](https://github.com/jamjamjon/assets/releases/download/v0.0.1/clip-b32-textual.onnx)


### 2. Specify the ONNX model path in `main.rs`

```Rust
// visual
let options_visual = Options::default()
.with_model("VISUAL_MODEL") // <= modify this
.with_i00((1, 1, 4).into())
.with_profile(false);

// textual
let options_textual = Options::default()
.with_model("TEXTUAL_MODEL") // <= modify this
.with_i00((1, 1, 4).into())
.with_profile(false);
```

### 3. Then, run

```bash
cargo run -r --example clip
```



## Results
```
(86.59852%) ./examples/clip/images/doll.jpg => There is a doll with red hair and a clock on a table
[0.07032883, 0.00053773675, 0.0006372929, 0.06066096, 0.0007378078, 0.8659852, 0.0011121632]
```
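
The percentages above are standard CLIP-style scores: each candidate text embedding is compared with the image embedding and the resulting similarities are normalized with a softmax. A minimal, generic sketch of that post-processing step (not the usls API; the raw scores below are made up) looks like this:

```Rust
/// Softmax over raw image-text similarity scores.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // hypothetical raw similarities of one image against 7 candidate texts
    let scores = [20.1_f32, 15.2, 15.4, 19.9, 15.5, 22.6, 15.9];
    println!("{:?}", softmax(&scores)); // the highest entry picks the caption
}
```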


## TODO

* [ ] TensorRT support for textual model
21 changes: 18 additions & 3 deletions examples/db/README.md
@@ -4,10 +4,25 @@
cargo run -r --example db
```

## ONNX Model
## Or you can set it up manually

### 1. Download ONNX Model

- [ppocr-v3-db-dyn](https://github.com/jamjamjon/assets/releases/download/v0.0.1/ppocr-v3-db-dyn.onnx)
- [ppocr-v4-db-dyn](https://github.com/jamjamjon/assets/releases/download/v0.0.1/ppocr-v4-db-dyn.onnx)

### 2. Specify the ONNX model path in `main.rs`

```Rust
let options = Options::default()
.with_model("ONNX_PATH") // <= modify this
```

### 3. Run

```bash
cargo run -r --example db
```

### Speed test

2 changes: 1 addition & 1 deletion examples/db/main.rs
@@ -22,9 +22,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {

// annotate
let annotator = Annotator::default()
.with_polygon_color([255u8, 0u8, 0u8])
.without_name(true)
.without_polygons(false)
.with_mask_alpha(0)
.without_bboxes(false)
.with_saveout("DB-Text-Detection");
annotator.annotate(&x, &y);