
YOLOv5-Lite: Lighter, faster and easier to deploy

[Paper illustration]

YOLOv5-Lite performs a series of ablation experiments on YOLOv5 to make it lighter (smaller FLOPs, lower memory usage, fewer parameters), faster (channel shuffle and a reduced-channel YOLOv5 head; it runs at 10+ FPS on a Raspberry Pi 4B with 320×320 input), and easier to deploy (the Focus layer and its four slice operations are removed, and the accuracy drop after model quantization is kept within an acceptable range).
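The Focus removal is easy to illustrate. Below is a minimal PyTorch sketch (not the repo's exact module code) contrasting the original Focus stem, whose four slice operations are awkward for many edge-side quantizers and converters, with a plain stride-2 convolution stem that produces the same 2× downsample without any slicing:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Original YOLOv5 stem: slice the image into 4 patches, concat on channels, then conv."""
    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c1 * 4, c2, k, 1, k // 2, bias=False)

    def forward(self, x):
        # four slice operations -> hard to fuse/quantize on many edge toolchains
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

class ConvStem(nn.Module):
    """Lite-style replacement: one stride-2 conv gives the same 2x downsample, no slicing."""
    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, 2, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 320, 320)
print(Focus(3, 32)(x).shape)     # torch.Size([1, 32, 160, 160])
print(ConvStem(3, 32)(x).shape)  # torch.Size([1, 32, 160, 160])
```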


Comparison of ablation experiment results

| ID | Model | Input size | FLOPs | Params | Size (M) | mAP@0.5 | mAP@0.5:0.95 |
|:---:|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| 001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
| 002 | YOLOv5-Lite<sub>e</sub><sup>ours</sup> | 320×320 | 0.73G | 0.78M | 1.7 | 35.1 | - |
| 003 | NanoDet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
| 004 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
| 005 | YOLOX<sub>Nano</sub> | 416×416 | 1.08G | 0.91M | 7.3 (fp32) | - | 25.8 |
| 006 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
| 007 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
| 008 | YOLOv5-Lite<sub>s</sub><sup>ours</sup> | 416×416 | 1.66G | 1.64M | 3.4 | 42.0 | 25.2 |
| 009 | YOLOv5-Lite<sub>c</sub><sup>ours</sup> | 512×512 | 5.92G | 4.57M | 9.2 | 50.9 | 32.5 |
| 010 | NanoDet-EfficientLite2 | 512×512 | 7.12G | 4.71M | 18.3 | - | 32.6 |
| 011 | YOLOv5s (6.0) | 640×640 | 16.5G | 7.23M | 14.0 | 56.0 | 37.2 |
| 012 | YOLOv5-Lite<sub>g</sub><sup>ours</sup> | 640×640 | 15.6G | 5.39M | 10.9 | 57.6 | 39.1 |

See the wiki: https://github.com/ppogg/YOLOv5-Lite/wiki/Test-the-map-of-models-about-coco

Comparison on different platforms

| Equipment | Computing backend | System | Input | Framework | v5lite-e | v5lite-s | v5lite-c | v5lite-g | YOLOv5s |
|:---|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Intel | @i5-10210U | Windows (x86) | 640×640 | openvino | - | - | 46ms | - | 131ms |
| Nvidia | @RTX 2080Ti | Linux (x86) | 640×640 | torch | - | - | - | 15ms | 14ms |
| Redmi K30 | @Snapdragon 730G | Android (armv8) | 320×320 | ncnn | 27ms | 38ms | - | - | 163ms |
| Xiaomi 10 | @Snapdragon 865 | Android (armv8) | 320×320 | ncnn | 10ms | 14ms | - | - | 163ms |
| Raspberry Pi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | ncnn | - | 84ms | - | - | 371ms |
| Raspberry Pi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | mnn | - | 71ms | - | - | 356ms |
| AXera-Pi | Cortex A7@CPU<br />3.6TOPs @NPU | Linux (arm64) | 640×640 | axpi | - | - | - | 22ms | 22ms |

Tutorial for reaching 15 FPS on the Raspberry Pi 4B:

https://zhuanlan.zhihu.com/p/672633849

QQ discussion group: 993965802

Answer to join the group: pruning, distillation, quantization, or low-rank decomposition (any one of them works)

Model Zoo

@v5lite-e:

| Model | Size | Backbone | Head | Framework | Design for |
|:---|:---:|:---|:---|:---|:---|
| v5Lite-e.pt | 1.7m | shufflenetv2 (Megvii) | v5Litee-head | Pytorch | arm-cpu |
| v5Lite-e.bin<br />v5Lite-e.param | 1.7m | shufflenetv2 | v5Litee-head | ncnn | arm-cpu |
| v5Lite-e-int8.bin<br />v5Lite-e-int8.param | 0.9m | shufflenetv2 | v5Litee-head | ncnn | arm-cpu |
| v5Lite-e-fp32.mnn | 3.0m | shufflenetv2 | v5Litee-head | mnn | arm-cpu |
| v5Lite-e-fp32.tnnmodel<br />v5Lite-e-fp32.tnnproto | 2.9m | shufflenetv2 | v5Litee-head | tnn | arm-cpu |
| v5Lite-e-320.onnx | 3.1m | shufflenetv2 | v5Litee-head | onnxruntime | x86-cpu |

@v5lite-s:

| Model | Size | Backbone | Head | Framework | Design for |
|:---|:---:|:---|:---|:---|:---|
| v5Lite-s.pt | 3.4m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | arm-cpu |
| v5Lite-s.bin<br />v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | arm-cpu |
| v5Lite-s-int8.bin<br />v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | arm-cpu |
| v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | arm-cpu |
| v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | arm-cpu |
| v5Lite-s-fp16.bin<br />v5Lite-s-fp16.xml | 3.4m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp32.bin<br />v5Lite-s-fp32.xml | 6.8m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp16.tflite | 3.3m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-fp32.tflite | 6.7m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-int8.tflite | 1.8m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-416.onnx | 6.4m | shufflenetv2 | v5Lites-head | onnxruntime | x86-cpu |

@v5lite-c:

| Model | Size | Backbone | Head | Framework | Design for |
|:---|:---:|:---|:---|:---|:---|
| v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5s-head | Pytorch | x86-cpu / x86-vpu |
| v5Lite-c.bin<br />v5Lite-c.xml | 8.7m | PPLcnet | v5s-head | openvino | x86-cpu / x86-vpu |
| v5Lite-c-512.onnx | 18m | PPLcnet | v5s-head | onnxruntime | x86-cpu |

@v5lite-g:

| Model | Size | Backbone | Head | Framework | Design for |
|:---|:---:|:---|:---|:---|:---|
| v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
| v5Lite-g-int8.engine | 8.5m | Repvgg-yolov5 | v5Liteg-head | Tensorrt | x86-gpu / arm-gpu / arm-npu |
| v5lite-g-int8.tmfile | 8.7m | Repvgg-yolov5 | v5Liteg-head | Tengine | arm-npu |
| v5Lite-g-640.onnx | 21m | Repvgg-yolov5 | yolov5-head | onnxruntime | x86-cpu |
| v5Lite-g-640.joint | 7.1m | Repvgg-yolov5 | yolov5-head | axpi | arm-npu |

Download Link:

- ncnn-fp16: Baidu Drive | Google Drive
- ncnn-int8: Baidu Drive | Google Drive
- mnn-e_bf16: Google Drive
- mnn-d_bf16: Google Drive
- onnx-fp32: Baidu Drive | Google Drive

- ncnn-fp16: Baidu Drive | Google Drive
- ncnn-int8: Baidu Drive | Google Drive
- tengine-fp32: Baidu Drive | Google Drive

- openvino-fp16: Baidu Drive | Google Drive

- axpi-int8: Google Drive

Baidu Drive Password: pogg

v5lite-s model: TFLite Float32, Float16, INT8, dynamic-range quantization, ONNX, TFJS, TensorRT, OpenVINO IR FP32/FP16, Myriad Inference Engine Blob, CoreML

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite

Thanks to PINTO0309 for providing these conversions.

<div>How to use</div>

<details open> <summary>Install</summary>

Python>=3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch>=1.7:

<!-- $ sudo apt update && apt install -y libgl1-mesa-glx libsm6 libxext6 libxrender-dev -->
$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
</details> <details> <summary>Inference with detect.py</summary>

detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5-Lite release and saving results to runs/detect.

$ python detect.py --source 0  # webcam
                            file.jpg  # image 
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            'https://youtu.be/NUsoVlDFqZg'  # YouTube
                            'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
</details> <details open> <summary>Training</summary>
$ python train.py --data coco.yaml --cfg v5lite-e.yaml --weights v5lite-e.pt --batch-size 128
                                         v5lite-s.yaml           v5lite-s.pt              128
                                         v5lite-c.yaml           v5lite-c.pt               96
                                         v5lite-g.yaml           v5lite-g.pt               64

If you use multiple GPUs, training is several times faster:

$ python -m torch.distributed.launch --nproc_per_node 2 train.py
</details> <details open> <summary>DataSet</summary>

Training and validation set layout (image paths end in xx.jpg; a label-format example follows the directory tree):

train: ../coco/images/train2017/
val: ../coco/images/val2017/
├── images            # xx.jpg example
│   ├── train2017        
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017         
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels             # xx.txt example      
    ├── train2017       
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017         
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
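
Each image needs a matching .txt file under labels/ in the standard YOLO format: one object per line, class index followed by the normalized box center and size. A hypothetical 000001.txt (values made up for illustration) would look like:

```
# class  x_center  y_center  width  height   (all values normalized to 0-1)
0 0.481250 0.633333 0.237500 0.416667
2 0.704688 0.370833 0.156250 0.283333
```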
</details> <details open> <summary>Auto LabelImg</summary>

Link: https://github.com/ppogg/AutoLabelImg

You can use this LabelImg-based tool together with YOLOv5-5.0 and YOLOv5-Lite to auto-annotate images 🚀 🚀 🚀 <img src="https://user-images.githubusercontent.com/82716366/177030174-dc3a5827-2821-4d8c-8d78-babe83c42fbf.JPG" width="950"/><br/>

</details> <details open> <summary>Model Hub</summary>

The original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the model hub.

[Model hub figure]
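
As one example of a reproduced component, the channel-shuffle operation used by the ShuffleNetV2 backbone fits in a few lines. This is a generic sketch of the standard operation, not necessarily the exact code stored in the hub:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so information mixes between branches."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()        # swap the group and per-group axes
    return x.view(b, c, h, w)                 # flatten back to (b, c, h, w)

x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())  # [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```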

</details> <details open> <summary>Heatmap Analysis</summary>
$ python main.py --type all

[Paper illustration 2]

Updating ...

</details>

How to deploy

- ncnn for arm-cpu
- mnn for arm-cpu
- openvino for x86-cpu or x86-vpu
- tensorrt (C++) for arm-gpu, arm-npu, or x86-gpu
- tensorrt (Python) for arm-gpu, arm-npu, or x86-gpu
- Android for arm-cpu

Android_demo

The demo below runs on a Redmi phone with a Snapdragon 730G processor, using YOLOv5-Lite for detection. The performance is as follows:

link: https://github.com/ppogg/YOLOv5-Lite/tree/master/android_demo/ncnn-android-v5lite

Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing

Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing

New Android app: https://pan.baidu.com/s/1PRhW4fI1jq8VboPyishcIQ (extraction code: pogg)

<img src="https://user-images.githubusercontent.com/82716366/149959014-5f027b1c-67b6-47e2-976b-59a7c631b0f2.jpg" width="650"/><br/>

More detailed explanation

Detailed model link:

What is YOLOv5-Lite S/E model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/400545131

What is YOLOv5-Lite C model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/420737659

What is YOLOv5-Lite G model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/410874403

How to deploy on ncnn with fp16 or int8: csdn link (Chinese): https://blog.csdn.net/weixin_45829462/article/details/119787840

How to deploy on mnn with fp16 or int8: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/672633849

How to deploy on onnxruntime: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/476533259 (old version); a minimal inference sketch follows this list

How to deploy on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/478630138

How to optimize on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/463074494
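
To complement the onnxruntime article above, here is a minimal inference sketch for an exported model such as v5Lite-s-416.onnx. It assumes a single 1×3×416×416 RGB float input scaled to 0–1 and a raw YOLOv5-style output of shape (1, N, 5 + num_classes); the actual input/output layout depends on how the model was exported, so check session.get_inputs() / get_outputs() first.

```python
import cv2
import numpy as np
import onnxruntime as ort

def preprocess(img_bgr, size=416):
    # naive resize (no letterbox) -> normalized float32 NCHW batch
    img = cv2.cvtColor(cv2.resize(img_bgr, (size, size)), cv2.COLOR_BGR2RGB)
    return np.expand_dims(img.astype(np.float32).transpose(2, 0, 1) / 255.0, 0)

session = ort.InferenceSession("v5Lite-s-416.onnx", providers=["CPUExecutionProvider"])
inp_name = session.get_inputs()[0].name

img = cv2.imread("sample.jpg")
pred = session.run(None, {inp_name: preprocess(img)})[0][0]  # assumed shape: (N, 5 + nc)

# keep boxes whose objectness * best-class score clears a threshold
scores = pred[:, 4] * pred[:, 5:].max(axis=1)
keep = scores > 0.45
boxes_xywh = pred[keep, :4]              # (cx, cy, w, h) in the 416x416 input space
classes = pred[keep, 5:].argmax(axis=1)
print(f"{keep.sum()} raw detections before NMS")
# NMS (e.g. cv2.dnn.NMSBoxes) and rescaling to the original image are still required.
```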

Reference

https://github.com/ultralytics/yolov5

https://github.com/megvii-model/ShuffleNet-Series

https://github.com/Tencent/ncnn

Citing YOLOv5-Lite

If you use YOLOv5-Lite in your research, please cite our work and give a star ⭐:

@misc{yolov5lite2021,
  title = {YOLOv5-Lite: Lighter, faster and easier to deploy},
  author = {Xiangrong Chen and Ziman Gong},
  doi = {10.5281/zenodo.5241425},
  year = {2021}
}