A series of ablation experiments was performed on YOLOv5 to make it lighter (smaller FLOPs, lower memory use, fewer parameters) and faster (channel shuffle and a reduced-channel YOLOv5 head; it infers at 10+ FPS on a Raspberry Pi 4B with 320×320 input), while also being easier to deploy (the Focus layer and its four slice operations are removed, and the accuracy drop from model quantization stays within an acceptable range). The first two changes are sketched below.
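The two structural changes called out above are easy to picture in code. The following is a minimal PyTorch sketch, not the repository's actual modules: a plain stride-2 convolution standing in for the removed Focus layer (so the four slice operations disappear), and a ShuffleNetV2-style channel shuffle. Class names, channel counts, and kernel sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FocusReplacement(nn.Module):
    # Illustrative sketch (not the repo's code): a plain stride-2 conv that
    # downsamples the input like Focus does, but without the four tensor-slice
    # ops that complicate quantization and deployment.
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=6, stride=2, padding=2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

def channel_shuffle(x, groups=2):
    # Standard ShuffleNetV2-style channel shuffle: reshape, transpose, flatten.
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

if __name__ == "__main__":
    img = torch.randn(1, 3, 320, 320)      # 320x320 input as in the benchmarks
    feat = FocusReplacement()(img)          # -> (1, 32, 160, 160)
    print(channel_shuffle(feat).shape)
```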
ID | Model | Input_size | FLOPs | Params | Size (MB) | [email protected] | [email protected]:0.95 |
---|---|---|---|---|---|---|---|
001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
002 | nanodet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
003 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
004 | YOLOv5-Lite-s (ours) | 320×320 | 1.43G | 1.62M | 3.3 | 36.2 | 20.8 |
005 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
006 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
007 | YOLOv5-Lite-s (ours) | 416×416 | 2.56G | 1.62M | 3.3 | 41.3 | 24.4 |
008 | YOLOv5-Lite-c (ours) | 640×640 | 8.6G | 4.37M | 9.2 | 52.5 | 33.0 |
009 | YOLOv5s | 640×640 | 17.0G | 7.3M | 14.2 | 55.8 | 35.9 |
010 | YOLOv5-Lite-g (ours) | 640×640 | 15.7G | 5.3M | 10.9 | 56.9 | 38.1 |
Equipment | Computing backend | System | Input | Framework | v5Lite-s | v5Lite-c | v5Lite-g | YOLOv5s |
---|---|---|---|---|---|---|---|---|
Intel | @i5-10210U | Windows (x86) | 640×640 | openvino | - | 46ms | - | 131ms |
Nvidia | @RTX 2080Ti | Linux(x86) | 640×640 | torch | - | - | 15ms | 14ms |
Redmi K30 | @Snapdragon 730G | Android(arm64) | 320×320 | ncnn | 36ms | - | - | 263ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | ncnn | 97ms | - | - | 371ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | mnn | 88ms | - | - | 356ms |
- The benchmarks above were measured with 4 threads.
- The Raspberry Pi 4B results were measured with bf16s optimization enabled, on a 64-bit Raspberry Pi OS.
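The 320×320 latencies above are what back the 10+ FPS Raspberry Pi 4B claim in the introduction: a single-image latency under 100 ms corresponds to more than 10 frames per second. A quick sanity check:

```python
# Latencies taken from the benchmark table above (single image, 320x320, 4 threads).
for name, ms in [("v5Lite-s / ncnn", 97), ("v5Lite-s / mnn", 88), ("YOLOv5s / ncnn", 371)]:
    print(f"{name}: {1000 / ms:.1f} FPS")
# v5Lite-s / ncnn: 10.3 FPS
# v5Lite-s / mnn:  11.4 FPS
# YOLOv5s  / ncnn:  2.7 FPS
```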
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-s.pt | 3.3m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | Arm-cpu |
v5Lite-s.bin / v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s-int8.bin / v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
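The file sizes in this table roughly track the 1.62 M parameter count of v5Lite-s (from the comparison table above) times the bytes stored per weight, which is why the int8 and int4 exports shrink the way they do. A back-of-the-envelope check, ignoring file headers and per-layer quantization metadata:

```python
params = 1.62e6  # v5Lite-s parameter count from the comparison table
for fmt, bytes_per_weight in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{fmt}: ~{params * bytes_per_weight / 1e6:.1f} MB")
# fp16: ~3.2 MB   (close to the ~3.3m .pt / .bin files)
# int8: ~1.6 MB   (close to the ~1.7m ncnn int8 export)
# int4: ~0.8 MB   (close to the 987k mnn int4 export)
```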
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5Litec-head | Pytorch | x86-cpu / x86-vpu |
v5Lite-c.bin / v5Lite-c.xml | 8.7m | PPLcnet | v5Litec-head | openvino | x86-cpu / x86-vpu |
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
v5Lite-g-int8.engine | 8.5m | Repvgg | v5Liteg-head | TensorRT | x86-gpu / arm-gpu / arm-npu |
Install
Python>=3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch>=1.7:
$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
Inference with detect.py
detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5-Lite release and saving results to runs/detect.
$ python detect.py --source 0 # webcam
file.jpg # image
file.mp4 # video
path/ # directory
path/*.jpg # glob
'https://youtu.be/NUsoVlDFqZg' # YouTube
'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream
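For quick experiments in Python, the released weights can also be called directly instead of going through detect.py. The sketch below assumes it is run from inside the cloned repository and that the fork keeps YOLOv5's models.experimental.attempt_load helper; both are assumptions, not confirmed by this README.

```python
# Minimal sketch: load the released v5lite-s.pt and run a raw forward pass.
# Assumes the script runs from the YOLOv5-Lite repo root and that the fork
# keeps YOLOv5's models.experimental.attempt_load helper (an assumption).
import torch
from models.experimental import attempt_load

model = attempt_load('v5lite-s.pt', map_location='cpu')
model.eval()

img = torch.zeros(1, 3, 320, 320)   # dummy 320x320 RGB input, values in [0, 1]
with torch.no_grad():
    pred = model(img)[0]             # raw predictions before NMS
print(pred.shape)                    # (1, num_boxes, 5 + num_classes)
```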
Training
$ python train.py --data coco.yaml --cfg v5lite-s.yaml --weights v5lite-s.pt --batch-size 128
                                         v5lite-c.yaml           v5lite-c.pt              96
                                         v5lite-g.yaml           v5lite-g.pt              64
If you use multiple GPUs, training is several times faster:
$ python -m torch.distributed.launch --nproc_per_node 2 train.py
DataSet
Training and validation set layout (the paths point to the directories containing the xx.jpg images):
train: ../coco/images/train2017/
val: ../coco/images/val2017/
├── images # xx.jpg example
│ ├── train2017
│ │ ├── 000001.jpg
│ │ ├── 000002.jpg
│ │ └── 000003.jpg
│ └── val2017
│ ├── 100001.jpg
│ ├── 100002.jpg
│ └── 100003.jpg
└── labels # xx.txt example
├── train2017
│ ├── 000001.txt
│ ├── 000002.txt
│ └── 000003.txt
└── val2017
├── 100001.txt
├── 100002.txt
└── 100003.txt
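Each label .txt above holds one object per line in the standard YOLO format: a class index followed by the normalized box center and size. A small parsing sketch (the path is just the example name from the tree above):

```python
# Parse a YOLO-format label file: each line is
#   class_id x_center y_center width height   (box values normalized to 0-1)
def read_yolo_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, cx, cy, w, h = line.split()
            boxes.append((int(cls), float(cx), float(cy), float(w), float(h)))
    return boxes

# e.g. read_yolo_labels('../coco/labels/train2017/000001.txt')
```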
Model hub
The original YOLOv5 components and the re-implemented YOLOv5-Lite components are organized and stored in the model hub:
Updating ...
- ncnn for arm-cpu
- mnn for arm-cpu
- openvino for x86-cpu or x86-vpu
- tensorrt for x86-gpu, arm-gpu, or arm-npu
Android for arm-cpu
The demo below runs on a Redmi phone with a Snapdragon 730G processor, using YOLOv5-Lite for detection. Performance is as follows:
link: https://github.com/ppogg/YOLOv5-Lite/tree/master/ncnn_Android
Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing
Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing
Detailed model write-ups (in Chinese):
[1] https://zhuanlan.zhihu.com/p/400545131
[2] https://zhuanlan.zhihu.com/p/410874403
[3] https://blog.csdn.net/weixin_45829462/article/details/119787840
[4] https://zhuanlan.zhihu.com/p/420737659
Reference: https://github.com/ultralytics/yolov5