Add YOLOv8-OBB and some bug fixes (#9)

* Add YOLOv8-Obb & Refactor outputs
* Update README.md
jamjamjon · Apr 21, 2024 · beda8ef
1 parent 91049fc
commit beda8ef

Showing 109 changed files with 2,532 additions and 1,930 deletions.
diff --git a/Cargo.toml b/Cargo.toml
@@ -40,3 +40,4 @@ indicatif = "0.17.8"
 image = "0.25.1"
 imageproc = { version = "0.24" }
 ab_glyph = "0.2.23"
+geo = "0.28.0"
diff --git a/README.md b/README.md
@@ -1,42 +1,65 @@
 # usls
 
-A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vison** and **Vision-Language** models including [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others.
+A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vison** and **Vision-Language** models including [YOLOv5](https://github.com/ultralytics/yolov5), [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others.
+
+## Recently Updated
+
+|        YOLOP-v2          |             Face-Parsing             |               Text-Detection           |  
+| :----------------------------: | :------------------------------: |  :------------------------------: |
+|<img src='examples/yolop/demo.png'  height="240px">| <img src='examples/face-parsing/demo.png'  height="240px"> | <img src='examples/db/demo.png'  height="240px"> |
+
+
+|        YOLOv8-Obb         |
+| :----------------------------: |
+|<img src='examples/yolov8/demo-obb-2.png'   width="800px">|
+
+
+
+
+
 
 
 ## Supported Models
 
-|                               Model                               |                                Task / Type                                |         Example         | CUDA<br />f32 | CUDA<br />f16 |     TensorRT<br />f32     |     TensorRT<br />f16     |
-| :---------------------------------------------------------------: | :------------------------------------------------------------------------: | :----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
-|    [YOLOv8-detection](https://github.com/ultralytics/ultralytics)    |                              Object Detection                              |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|      [YOLOv8-pose](https://github.com/ultralytics/ultralytics)      |                             Keypoint Detection                             |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
-| [YOLOv8-classification](https://github.com/ultralytics/ultralytics) |                               Classification                               |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|  [YOLOv8-segmentation](https://github.com/ultralytics/ultralytics)  |                           Instance Segmentation                           |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|            [YOLOv9](https://github.com/WongKinYiu/yolov9)            |                              Object Detection                              |   [demo](examples/yolov9)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|             [RT-DETR](https://arxiv.org/abs/2304.08069)             |                              Object Detection                              |   [demo](examples/rtdetr)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|         [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM)         |                           Instance Segmentation                           |  [demo](examples/fastsam)  |      ✅      |      ✅      |             ✅             |            ✅            |
-|        [YOLO-World](https://github.com/AILab-CVC/YOLO-World)        |                              Object Detection                              | [demo](examples/yolo-world) |      ✅      |      ✅      |             ✅             |            ✅            |
-|         [DINOv2](https://github.com/facebookresearch/dinov2)         |                           Vision-Self-Supervised                           |   [demo](examples/dinov2)   |      ✅      |      ✅      |             ✅             |            ✅            |
-|                [CLIP](https://github.com/openai/CLIP)                |                              Vision-Language                              |    [demo](examples/clip)    |      ✅      |      ✅      | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
-|              [BLIP](https://github.com/salesforce/BLIP)              |                              Vision-Language                              |    [demo](examples/blip)    |      ✅      |      ✅      | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
-|                [DB](https://arxiv.org/abs/1911.08947)                |                               Text Detection                               |     [demo](examples/db)     |      ✅      |      ✅      |             ✅             |            ✅            |
-|               [SVTR](https://arxiv.org/abs/2205.00159)               |                              Text Recognition                              |    [demo](examples/svtr)    |      ✅      |      ✅      |             ✅             |            ✅            |
-| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) |                             Keypoint Detection                             |    [demo](examples/rtmo)    |      ✅      |      ✅      |             ❌             |            ❌            |
-|              [YOLOPv2](https://arxiv.org/abs/2208.11434)              | Panoptic driving Perception |   [demo](examples/yolop)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|                               Model                               |         Task / Type         |         Example         | CUDA<br />f32 | CUDA<br />f16 |     TensorRT<br />f32     |     TensorRT<br />f16     |
+| :---------------------------------------------------------------: | :-------------------------: | :----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
+|       [YOLOv8-obb](https://github.com/ultralytics/ultralytics)       |  Oriented Object Detection  |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|    [YOLOv8-detection](https://github.com/ultralytics/ultralytics)    |      Object Detection      |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|      [YOLOv8-pose](https://github.com/ultralytics/ultralytics)      |     Keypoint Detection     |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
+| [YOLOv8-classification](https://github.com/ultralytics/ultralytics) |       Classification       |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|  [YOLOv8-segmentation](https://github.com/ultralytics/ultralytics)  |    Instance Segmentation    |   [demo](examples/yolov8)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|            [YOLOv9](https://github.com/WongKinYiu/yolov9)            |      Object Detection      |   [demo](examples/yolov9)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|             [RT-DETR](https://arxiv.org/abs/2304.08069)             |      Object Detection      |   [demo](examples/rtdetr)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|         [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM)         |    Instance Segmentation    |  [demo](examples/fastsam)  |      ✅      |      ✅      |             ✅             |            ✅            |
+|        [YOLO-World](https://github.com/AILab-CVC/YOLO-World)        |      Object Detection      | [demo](examples/yolo-world) |      ✅      |      ✅      |             ✅             |            ✅            |
+|         [DINOv2](https://github.com/facebookresearch/dinov2)         |   Vision-Self-Supervised   |   [demo](examples/dinov2)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|                [CLIP](https://github.com/openai/CLIP)                |       Vision-Language       |    [demo](examples/clip)    |      ✅      |      ✅      | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
+|              [BLIP](https://github.com/salesforce/BLIP)              |       Vision-Language       |    [demo](examples/blip)    |      ✅      |      ✅      | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
+|                [DB](https://arxiv.org/abs/1911.08947)                |       Text Detection       |     [demo](examples/db)     |      ✅      |      ✅      |             ✅             |            ✅            |
+|               [SVTR](https://arxiv.org/abs/2205.00159)               |      Text Recognition      |    [demo](examples/svtr)    |      ✅      |      ✅      |             ✅             |            ✅            |
+| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) |     Keypoint Detection     |    [demo](examples/rtmo)    |      ✅      |      ✅      |             ❌             |            ❌            |
+|             [YOLOPv2](https://arxiv.org/abs/2208.11434)             | Panoptic driving Perception |   [demo](examples/yolop)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|    [YOLOv5-classification](https://github.com/ultralytics/yolov5)    |      Object Detection      |   [demo](examples/yolov5)   |      ✅      |      ✅      |             ✅             |            ✅            |
+|     [YOLOv5-segmentation](https://github.com/ultralytics/yolov5)     |    Instance Segmentation    |   [demo](examples/yolov5)   |      ✅      |      ✅      |             ✅             |            ✅            |
 
 ## Solution Models
 
-Additionally, this repo also provides some solution models.
 
-|                             Model                             |             Example             |             Result             |
-| :------------------------------------------------------------: | :------------------------------: | :------------------------------: |
-|                Lane Line Segmentation<br /> Drivable Area Segmentation<br />Car Detection<br />车道线-可行驶区域-车辆检测                | [demo](examples/yolov8-plastic-bag) |<img src='examples/yolop/demo.png'  width="220px" height="140px">|
-|                  Face Parsing<br />  人脸解析                  |    [demo](examples/face-parsing)    |<img src='examples/face-parsing/demo.png' width="220px" height="200px"> |
-|    Text Detection<br />(PPOCR-det v3, v4)<br />通用文本检测    |         [demo](examples/db)         |<img src='examples/db/demo.jpg'  width="250px" height="200px">|
-| Text Recognition<br />(PPOCR-rec v3, v4)<br />中英文-文本识别 |        [demo](examples/svtr)        ||
-|         Face-Landmark Detection<br />人脸 & 关键点检测         |    [demo](examples/yolov8-face)    |<img src='examples/yolov8-face/demo.jpg'  width="220px" height="180px">|
-|                 Head Detection<br />  人头检测                 |    [demo](examples/yolov8-head)    |<img src='examples/yolov8-head/demo.jpg'  width="220px" height="180px">|
-|                 Fall Detection<br />  摔倒检测                 |  [demo](examples/yolov8-falldown)  |  <img src='examples/yolov8-falldown/demo.jpg'  width="220px" height="180px">|
-|                Trash Detection<br />  垃圾检测                | [demo](examples/yolov8-plastic-bag) |<img src='examples/yolov8-trash/demo.jpg'  width="250px" height="180px">|
+<details close>
+<summary>Additionally, this repo also provides some solution models.</summary>
+
+|                                                    Model                                                    |             Example             |                                     Result                                     |
+| :---------------------------------------------------------------------------------------------------------: | :------------------------------: | :-----------------------------------------------------------------------------: |
+| Lane Line Segmentation<br /> Drivable Area Segmentation<br />Car Detection<br />车道线-可行驶区域-车辆检测 | [demo](examples/yolov8-plastic-bag) |      <img src='examples/yolop/demo.png'  width="220px" height="140px">      |
+|                                        Face Parsing<br />  人脸解析                                        |    [demo](examples/face-parsing)    |   <img src='examples/face-parsing/demo.png' width="220px" height="200px">   |
+|                          Text Detection<br />(PPOCR-det v3, v4)<br />通用文本检测                          |         [demo](examples/db)         |       <img src='examples/db/demo.png'  width="250px" height="200px">       |
+|                       Text Recognition<br />(PPOCR-rec v3, v4)<br />中英文-文本识别                       |        [demo](examples/svtr)        |                                                                                |
+|                               Face-Landmark Detection<br />人脸 & 关键点检测                               |    [demo](examples/yolov8-face)    |   <img src='examples/yolov8-face/demo.png'  width="220px" height="180px">   |
+|                                       Head Detection<br />  人头检测                                       |    [demo](examples/yolov8-head)    |   <img src='examples/yolov8-head/demo.png'  width="220px" height="180px">   |
+|                                       Fall Detection<br />  摔倒检测                                       |  [demo](examples/yolov8-falldown)  | <img src='examples/yolov8-falldown/demo.png'  width="220px" height="180px"> |
+|                                       Trash Detection<br />  垃圾检测                                       | [demo](examples/yolov8-plastic-bag) |  <img src='examples/yolov8-trash/demo.png'  width="250px" height="180px">  |
+
+</details>
 
 ## Demo
 
@@ -59,8 +82,9 @@ check **[ort guide](https://ort.pyke.io/setup/linking)**
 
 </details>
 
-
 ## Integrate into your own project
+<details close>
+<summary>Check Here</summary>
 
 #### 1. Add `usls` as a dependency to your project's `Cargo.toml`
 
@@ -126,3 +150,4 @@ let y = model.run(&x)?;
 let annotator = Annotator::default().with_saveout("YOLOv8");
 annotator.annotate(&x, &y);
 ```
+</details>
diff --git a/assets/2.jpg b/assets/2.jpg
diff --git a/assets/dota.png b/assets/dota.png
diff --git a/examples/blip/README.md b/examples/blip/README.md
@@ -17,10 +17,12 @@ cargo run -r --example blip
 ```shell
 [Unconditional image captioning]: a group of people walking around a bus
 [Conditional image captioning]: three man walking in front of a bus
+Some(["three man walking in front of a bus"])
 ```
 
 ## TODO
 
+* [ ] Multi-batch inference for image caption
 * [ ] VQA
 * [ ] Retrival
 * [ ] TensorRT support for textual model
diff --git a/examples/blip/main.rs b/examples/blip/main.rs
@@ -1,4 +1,4 @@
-use usls::{models::Blip, Options};
+use usls::{models::Blip, DataLoader, Options};
 
 fn main() -> Result<(), Box<dyn std::error::Error>> {
     // visual
@@ -22,9 +22,11 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     // build model
     let mut model = Blip::new(options_visual, options_textual)?;
 
-    // image caption
-    model.caption("./assets/bus.jpg", None)?; // unconditional
-    model.caption("./assets/bus.jpg", Some("three man"))?; // conditional
+    // image caption (this demo use batch_size=1)
+    let x = vec![DataLoader::try_read("./assets/bus.jpg")?];
+    let _y = model.caption(&x, None, true)?; // unconditional
+    let y = model.caption(&x, Some("three man"), true)?; // conditional
+    println!("{:?}", y[0].texts());
 
     Ok(())
 }
diff --git a/examples/clip/main.rs b/examples/clip/main.rs
@@ -1,4 +1,4 @@
-use usls::{models::Clip, ops, DataLoader, Options};
+use usls::{models::Clip, DataLoader, Options};
 
 fn main() -> Result<(), Box<dyn std::error::Error>> {
     // visual
@@ -39,7 +39,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         let feats_image = model.encode_images(&images).unwrap();
 
         // use image to query texts
-        let matrix = ops::dot2(&feats_image, &feats_text)?; // [m, n]
+        let matrix = feats_image.dot2(&feats_text)?;
 
         // summary
         for i in 0..paths.len() {
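
The removed comment notes that the result is an `[m, n]` matrix built from the image and text features. As a rough illustration of what such a pairwise dot product computes, and not the crate's actual `dot2` implementation, here is a minimal sketch over plain `Vec<f32>` rows. The name `pairwise_dot` and the nested-`Vec` layout are assumptions made for this example only.

```rust
/// Pairwise dot products between two sets of row vectors.
///
/// `a` holds m rows and `b` holds n rows, all of the same length.
/// The result is an m x n matrix where entry (i, j) is a[i] . b[j].
/// NOTE: hypothetical helper for illustration; not part of `usls`.
fn pairwise_dot(a: &[Vec<f32>], b: &[Vec<f32>]) -> Result<Vec<Vec<f32>>, String> {
    // Use the first row of `a` as the reference dimension.
    let dim = a.first().map_or(0, |row| row.len());

    // Every row in both inputs must share that dimension.
    if a.iter().chain(b.iter()).any(|row| row.len() != dim) {
        return Err(format!("all rows must have length {dim}"));
    }

    Ok(a.iter()
        .map(|x| {
            b.iter()
                .map(|y| x.iter().zip(y.iter()).map(|(p, q)| p * q).sum::<f32>())
                .collect()
        })
        .collect())
}
```

If the embeddings are L2-normalized beforehand, each entry is a cosine similarity, so row `i` ranks every text against image `i`.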

diff --git a/examples/db/README.md b/examples/db/README.md
@@ -20,4 +20,4 @@ cargo run -r --example db
 
 ## Results
 
-![](./demo.jpg)
+![](./demo.png)
diff --git a/examples/db/demo.jpg b/examples/db/demo.jpg
diff --git a/examples/db/demo.png b/examples/db/demo.png
diff --git a/examples/db/main.rs b/examples/db/main.rs
@@ -15,18 +15,21 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut model = DB::new(&options)?;
 
     // load image
-    let x = vec![DataLoader::try_read("./assets/db.png")?];
+    let x = vec![
+        DataLoader::try_read("./assets/db.png")?,
+        // DataLoader::try_read("./assets/2.jpg")?,
+    ];
 
     // run
     let y = model.run(&x)?;
 
     // annotate
     let annotator = Annotator::default()
-        .without_name(true)
-        .without_polygons(false)
-        .with_mask_alpha(0)
-        .without_bboxes(false)
-        .with_saveout("DB-Text-Detection");
+        .without_bboxes(true)
+        .with_masks_alpha(60)
+        .with_polygon_color([255, 105, 180, 255])
+        .without_mbrs(true)
+        .with_saveout("DB");
     annotator.annotate(&x, &y);
 
     Ok(())

diff --git a/examples/face-parsing/demo.png b/examples/face-parsing/demo.png
diff --git a/examples/face-parsing/main.rs b/examples/face-parsing/main.rs
@@ -9,7 +9,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         .with_i03((416, 640, 800).into())
         // .with_trt(0)
         // .with_fp16(true)
-        // .with_dry_run(10)
         .with_confs(&[0.5]);
     let mut model = YOLO::new(&options)?;
 
@@ -21,10 +20,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
 
     // annotate
     let annotator = Annotator::default()
-        .without_conf(true)
-        .without_name(true)
-        .without_polygons(false)
         .without_bboxes(true)
+        .without_bboxes_conf(true)
+        .without_bboxes_name(true)
+        .without_polygons(false)
         .with_masks_name(false)
         .with_saveout("Face-Parsing");
     annotator.annotate(&x, &y);

diff --git a/examples/fastsam/README.md b/examples/fastsam/README.md
@@ -20,4 +20,4 @@ cargo run -r --example fastsam
 
 ## Results
 
-![](./demo.jpg)
+![](./demo.png)
diff --git a/examples/fastsam/demo.jpg b/examples/fastsam/demo.jpg
diff --git a/examples/fastsam/demo.png b/examples/fastsam/demo.png
diff --git a/examples/rtdetr/README.md b/examples/rtdetr/README.md
@@ -18,4 +18,4 @@ cargo run -r --example rtdetr
 
 ## Results
 
-![](./demo.jpg)
+![](./demo.png)
diff --git a/examples/rtdetr/demo.jpg b/examples/rtdetr/demo.jpg
diff --git a/examples/rtdetr/demo.png b/examples/rtdetr/demo.png
diff --git a/examples/rtdetr/main.rs b/examples/rtdetr/main.rs
@@ -1,11 +1,11 @@
-use usls::{models::RTDETR, Annotator, DataLoader, Options, COCO_NAMES_80};
+use usls::{coco, models::RTDETR, Annotator, DataLoader, Options};
 
 fn main() -> Result<(), Box<dyn std::error::Error>> {
     // build model
     let options = Options::default()
         .with_model("../models/rtdetr-l-f16.onnx")
         .with_confs(&[0.4, 0.15]) // person: 0.4, others: 0.15
-        .with_names(&COCO_NAMES_80);
+        .with_names(&coco::NAMES_80);
     let mut model = RTDETR::new(&options)?;
 
     // load image
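
The inline comment on `with_confs(&[0.4, 0.15])` suggests per-class thresholds in which the last value covers every remaining class. One plausible reading of that behaviour, sketched with a hypothetical helper `conf_for_class` (an assumption for illustration, not a `usls` function), is an index-or-last lookup:

```rust
/// Return the confidence threshold for `class_id`.
///
/// Assumed rule: use the entry at `class_id` if it exists, otherwise
/// fall back to the last entry. Returns `None` when no thresholds are given.
/// NOTE: hypothetical helper; the real library may resolve thresholds differently.
fn conf_for_class(confs: &[f32], class_id: usize) -> Option<f32> {
    confs.get(class_id).or_else(|| confs.last()).copied()
}

/// Keep only the (class_id, score) pairs that meet their class threshold.
fn filter_by_conf(dets: &[(usize, f32)], confs: &[f32]) -> Vec<(usize, f32)> {
    dets.iter()
        .copied()
        .filter(|&(class_id, score)| {
            // With no thresholds configured, nothing is filtered out.
            conf_for_class(confs, class_id).map_or(true, |t| score >= t)
        })
        .collect()
}
```

Under this reading, a class-0 detection scoring 0.3 would be dropped, while a detection of any other class scoring 0.3 would be kept.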

diff --git a/examples/rtmo/README.md b/examples/rtmo/README.md
@@ -15,4 +15,4 @@ cargo run -r --example rtmo
 
 ## Results
 
-![](./demo.jpg)
+![](./demo.png)
diff --git a/examples/rtmo/demo.jpg b/examples/rtmo/demo.jpg
diff --git a/examples/rtmo/demo.png b/examples/rtmo/demo.png