A new dataset to facilitate object detection in hospital indoor environments. The proposed hospital indoor object detection (HIOD) dataset contains 4,417 annotated images covering 56 object categories, with 51,869 annotated object instances. The dataset is densely annotated, with an average of 11.7 objects and 6.8 object categories per image. In addition, a benchmark of eight state-of-the-art object detectors is provided on the dataset. The benchmark shows that networks trained on the HIOD dataset can accurately detect and classify objects in hospitals. We believe the dataset and benchmark provide valuable resources for researchers and practitioners developing computer-vision-based applications in hospitals.
For further description, please refer to this paper: Object detection in hospital facilities: A comprehensive dataset and performance evaluation
The preparation of the HIOD dataset consists of four steps: object category selection, image collection, image selection, and image annotation.
The image data cover hospital indoor environments such as “intensive care unit,” “operating room,” “hospital consulting room,” “hospital tour,” “hospital waiting room,” and more.
To request access to the data folder, please fill out this form: https://docs.google.com/forms/d/e/1FAIpQLSfI3UKkkIjvH1RGrN4BbCXCHLyRrtKt-jkJkMduw4K7ZXDNuA/viewform?usp=sf_link We do not accept any commercial use, so please leave your organizational email :).
The image folders contain the original JPEG files, and the label folders contain the XML files with the object annotations. Download both and load them together in a labeling tool to view or edit the annotations.
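If you prefer to read the annotations in code rather than in a labeling tool, the minimal sketch below parses a single XML file. It assumes the labels follow the usual Pascal VOC layout (`filename`, `object`, `name`, `bndbox` tags); the file name in the usage comment is a placeholder.

```python
# Minimal sketch: reading one XML annotation, assuming Pascal VOC style tags.
import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    """Return (image filename, [(label, xmin, ymin, xmax, ymax), ...]) from one XML file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return filename, boxes

# Example usage (path is a placeholder):
# name, objects = read_voc_annotation("labels/000001.xml")
# print(name, len(objects), "objects")
```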
Label | Number of objects | Number of images |
---|---|---|
sofa | 1306 | 687 |
chair | 4777 | 1812 |
foot board | 606 | 490 |
overbed table | 554 | 457 |
hospital bed | 1581 | 1218 |
staff | 1852 | 965 |
door handle | 1659 | 1024 |
table | 1273 | 796 |
bedside monitor | 2539 | 1353 |
iv pole | 1912 | 1166 |
surgical light | 1138 | 641 |
breathing tube | 448 | 406 |
wheel chair | 60 | 56 |
patient | 570 | 527 |
drawer | 303 | 258 |
mouse | 543 | 462 |
computer | 2160 | 1165 |
bedrail | 1787 | 813 |
curtain | 1309 | 778 |
keyboard | 730 | 616 |
infusion pump | 377 | 281 |
ventilator | 327 | 294 |
utility cart | 877 | 598 |
panda baby warmer | 94 | 91 |
visitor | 196 | 152 |
dispenser | 2819 | 1474 |
medical drawer | 814 | 599 |
handle | 5951 | 1382 |
countertop | 1463 | 1040 |
cabinet | 910 | 716 |
waste_bin | 1233 | 884 |
faucet | 717 | 567 |
TV | 747 | 597 |
telephone | 526 | 471 |
syringe pump | 680 | 164 |
light switch | 300 | 265 |
elevator panel | 73 | 59 |
counter | 933 | 737 |
medical waste container | 218 | 186 |
push latch | 330 | 237 |
operating bed | 549 | 523 |
electrosurgical unit | 182 | 158 |
sink | 630 | 564 |
restroom assist bar | 561 | 211 |
incubator | 101 | 95 |
exam table | 221 | 204 |
bedside table | 290 | 249 |
hallway assist bar | 1614 | 550 |
sequential compression | 71 | 70 |
toilet | 191 | 180 |
toilet handle | 114 | 110 |
person | 398 | 216 |
press to open | 53 | 48 |
surgical instrument | 87 | 58 |
xray machine | 58 | 57 |
xray bed | 57 | 51 |
The figure shows the statistics and a comparison with COCO, VOC, and OpenImages. HIOD is denser and more diverse than COCO, VOC, and OpenImages in terms of the number of objects and categories per image. Quantitatively, the HIOD dataset has an average of 11.7 and a median of 10 objects per image, whereas the average numbers of objects per image for COCO, VOC, and OpenImages are 7.3, 8.2, and 2.7, and the medians are 4, 2, and 4, respectively. The HIOD dataset also contains an average of 6.8 and a median of 6 object categories per image, both significantly greater than the other benchmarks. The dense instances and diverse categories in the HIOD dataset lay a solid foundation for building robust object detectors in hospitals.
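These per-image statistics can be recomputed directly from the XML annotations. The sketch below counts objects and distinct categories per image, again assuming Pascal-VOC-style `object`/`name` tags; the label directory is a placeholder.

```python
# Sketch: per-image object and category counts from the XML annotations.
import glob
import statistics
import xml.etree.ElementTree as ET

def density_stats(label_dir):
    objects_per_image, categories_per_image = [], []
    for xml_path in glob.glob(f"{label_dir}/*.xml"):
        names = [o.findtext("name") for o in ET.parse(xml_path).getroot().iter("object")]
        objects_per_image.append(len(names))
        categories_per_image.append(len(set(names)))
    return {
        "mean_objects": statistics.mean(objects_per_image),
        "median_objects": statistics.median(objects_per_image),
        "mean_categories": statistics.mean(categories_per_image),
        "median_categories": statistics.median(categories_per_image),
    }

# print(density_stats("labels"))  # expected to be close to 11.7 / 10 and 6.8 / 6 for HIOD
```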
Eight state-of-the-art object detectors are trained on this dataset, which is randomly partitioned into a training set (70%), a validation set (10%), and a testing set (20%); a minimal sketch of such a split is given after the results table. The performance of these algorithms on the testing set is summarized in the table below.
Algorithm | mAP | mAP@0.5 | mAP@0.75 | mAP (small) | mAP (medium) | mAP (large) |
---|---|---|---|---|---|---|
One-stage | ||||||
YOLOv5-L | 0.473 | 0.696 | 0.501 | 0.193 | 0.402 | 0.520 |
YOLOX-L | 0.484 | 0.708 | 0.520 | 0.178 | 0.407 | 0.554 |
YOLOv6-L | 0.517 | 0.737 | 0.553 | 0.211 | 0.418 | 0.597 |
YOLOv7 | 0.506 | 0.738 | 0.545 | 0.201 | 0.432 | 0.546 |
Two-stage | ||||||
Faster R-CNN | 0.403 | 0.646 | 0.438 | 0.141 | 0.327 | 0.466 |
Deformable DETR | 0.490 | 0.741 | 0.529 | 0.209 | 0.406 | 0.565 |
VFNet | 0.495 | 0.711 | 0.538 | 0.200 | 0.420 | 0.567 |
DyHead | 0.430 | 0.645 | 0.468 | 0.145 | 0.345 | 0.507 |
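The exact split files used for the benchmark are not redistributed here. As a rough reproduction aid, the sketch below performs a random 70/10/20 split at the image level; the paths and seed are placeholders, so the resulting partition will not match the original exactly.

```python
# Sketch: random 70/10/20 split of the images (placeholder paths and seed).
import random
from pathlib import Path

def split_dataset(image_dir, seed=0, ratios=(0.7, 0.1, 0.2)):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(ratios[0] * len(images))
    n_val = int(ratios[1] * len(images))
    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

# splits = split_dataset("images")
# print({name: len(files) for name, files in splits.items()})
```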
The confusion matrices for YOLOv6-L and VFNet are shown in the figure below. The diagonal values represent the recall for the corresponding object category.
Some examples of detection results from the YOLOv6-L network.
Install
git clone https://github.com/Wangmmstar/Hospital_Scene_Data.git
cd Hospital_Scene_Data
pip install -r requirements.txt
Finetune on custom data
Single GPU
# P5 models
python tools/train.py --batch 32 --conf configs/yolov6s_finetune.py --data data/hiod.yaml --fuse_ab --device 0
# P6 models
python tools/train.py --batch 32 --conf configs/yolov6s6_finetune.py --data data/hiod.yaml --img 1280 --device 0
Multi GPUs (DDP mode recommended)
# P5 models
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --batch 256 --conf configs/yolov6s_finetune.py --data data/hiod.yaml --fuse_ab --device 0,1,2,3,4,5,6,7
# P6 models
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --batch 128 --conf configs/yolov6s6_finetune.py --data data/hiod.yaml --img 1280 --device 0,1,2,3,4,5,6,7
- fuse_ab: adds an anchor-based auxiliary branch and uses Anchor-Aided Training mode (not currently supported for P6 models)
- conf: selects the config file that specifies the network, optimizer, and hyperparameters. We recommend using yolov6n/s/m/l_finetune.py when training on your custom dataset.
- data: prepare the dataset and specify the dataset paths in hiod.yaml (a label-conversion sketch and a minimal hiod.yaml example are given after the directory tree below)
- make sure your dataset is structured as follows:
├── dataset
│   ├── hiod
│   │   ├── images
│   │   │   ├── train
│   │   │   └── val
│   │   ├── labels
│   │   │   ├── train
│   │   │   └── val
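Note that the distributed labels are XML, while tools/train.py reads plain-text YOLO labels from labels/train and labels/val. The sketch below converts VOC-style XML boxes into that format and shows what a minimal data/hiod.yaml might contain. The directory names, the truncated class list, and the yaml keys are assumptions modeled on the stock YOLOv6 data configs; adjust them to your local layout.

```python
# Sketch: convert VOC-style XML labels to YOLO txt labels (class cx cy w h, normalized).
import glob
import os
import xml.etree.ElementTree as ET

CLASSES = ["sofa", "chair", "foot board"]  # placeholder: list all 56 category names in order

def voc_to_yolo(xml_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for xml_path in glob.glob(os.path.join(xml_dir, "*.xml")):
        root = ET.parse(xml_path).getroot()
        img_w = float(root.findtext("size/width"))
        img_h = float(root.findtext("size/height"))
        lines = []
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name not in CLASSES:  # skip anything not in the (placeholder) class list
                continue
            bb = obj.find("bndbox")
            xmin, ymin = float(bb.findtext("xmin")), float(bb.findtext("ymin"))
            xmax, ymax = float(bb.findtext("xmax")), float(bb.findtext("ymax"))
            cx, cy = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
            bw, bh = (xmax - xmin) / img_w, (ymax - ymin) / img_h
            lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
        base = os.path.splitext(os.path.basename(xml_path))[0]
        with open(os.path.join(out_dir, base + ".txt"), "w") as f:
            f.write("\n".join(lines))

# Placeholder paths: XML annotations in, YOLO txt labels out.
# voc_to_yolo("dataset/hiod/annotations_xml/train", "dataset/hiod/labels/train")
# voc_to_yolo("dataset/hiod/annotations_xml/val", "dataset/hiod/labels/val")

# A minimal data/hiod.yaml could then look like (keys assumed from stock YOLOv6 configs):
#   train: dataset/hiod/images/train
#   val: dataset/hiod/images/val
#   is_coco: False
#   nc: 56
#   names: ["sofa", "chair", ...]
```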
YOLOv6 supports different input resolution modes. For details, see How to Set the Input Size.
Resume training
If your training process is interrupted, you can resume training with:
# single GPU training.
python tools/train.py --resume
# multi GPU training.
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --resume
The command above will automatically find the latest checkpoint in the YOLOv6 directory and resume training from it.
You can also pass a specific checkpoint path to the --resume parameter:
# replace /path/to/your/checkpoint/path with the checkpoint you want to resume training from.
--resume /path/to/your/checkpoint/path
This will resume from the specific checkpoint you provide.
Evaluation
Reproduce mAP on the HIOD dataset with 640×640 or 1280×1280 resolution
# P5 models
python tools/eval.py --data data/hiod.yaml --batch 32 --weights yolov6s.pt --task val --reproduce_640_eval
# P6 models
python tools/eval.py --data data/hiod.yaml --batch 32 --weights yolov6s6.pt --task val --reproduce_640_eval --img 1280
- verbose: set True to print the mAP of each class.
- do_coco_metric: set True / False to enable / disable the pycocotools evaluation method.
- do_pr_metric: set True / False to enable / disable printing the precision and recall metrics.
- config-file: specify a config file to define all the evaluation parameters, for example: yolov6n_with_eval_params.py
Inference
First, download a pretrained model from the YOLOv6 release or use your trained model to do inference.
Second, run inference with tools/infer.py
# P5 models
python tools/infer.py --weights yolov6s.pt --source img.jpg / imgdir / video.mp4
# P6 models
python tools/infer.py --weights yolov6s6.pt --img 1280 1280 --source img.jpg / imgdir / video.mp4
If you want to run inference on a local camera or a web camera, you can run:
# P5 models
python tools/infer.py --weights yolov6s.pt --webcam --webcam-addr 0
# P6 models
python tools/infer.py --weights yolov6s6.pt --img 1280 1280 --webcam --webcam-addr 0
- webcam-addr: can be a local camera number id or an rtsp address.
- Tutorial: How to train YOLOv6 on a custom dataset
- YouTube Tutorial: How to train YOLOv6 on a custom dataset
- Blog post: YOLOv6 Object Detection – Paper Explanation and Inference