Awesome
- 2021.8.12: Yolo-FastestV2 has been released: https://github.com/dog-qiuqiu/Yolo-FastestV2
- 2021.3.21: Minor adjustments and optimizations to the model structure; the Yolo-Fastest-1.1 model has been updated
- 2021.3.19: NCNN Camera Demo: https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/sample/ncnn
- 2021.3.16: Fixed abnormally slow grouped-convolution inference on some older-architecture GPUs
:zap:Yolo-Fastest:zap:
- Simple, fast, compact, easy to port
- A real-time object detection algorithm for all platforms
- The fastest and smallest known general-purpose object detection algorithm based on YOLO
- Design optimized for ARM mobile devices, with support for the NCNN inference framework
- Deployed with NCNN on RK3399, Raspberry Pi 4B and other embedded devices, it reaches full real-time performance at 30 fps+
- Introduction in Chinese: https://zhuanlan.zhihu.com/p/234506503
- Compared with AlexeyAB/darknet, this version of darknet fixes abnormally slow inference of grouped convolutions on some older-architecture GPUs (for example, on a 1050 Ti: 40 ms -> 4 ms, a 10x speedup); training models with this repository's framework is strongly recommended
- Darknet's CPU inference is poorly optimized; Darknet is not recommended as the CPU-side inference framework, use NCNN instead
- Based on pytorch training framework: https://github.com/dog-qiuqiu/yolov3
Evaluation metrics / Benchmark
Network | COCO mAP(0.5) | Resolution | Run Time (ncnn, 4 cores) | Run Time (ncnn, 1 core) | FLOPs | Params | Weight size |
---|---|---|---|---|---|---|---|
Yolo-Fastest-1.1 | 24.40% | 320x320 | 5.59 ms | 7.52 ms | 0.252 BFlops | 0.35M | 1.4 MB |
Yolo-Fastest-1.1-xl | 34.33% | 320x320 | 9.27 ms | 15.72 ms | 0.725 BFlops | 0.925M | 3.7 MB |
Yolov3-Tiny-Prn | 33.1% | 416x416 | - | - | 3.5 BFlops | 4.7M | 18.8 MB |
Yolov4-Tiny | 40.2% | 416x416 | 23.67 ms | 40.14 ms | 6.9 BFlops | 5.77M | 23.1 MB |
- Test platform: Mi 11, Snapdragon 888 CPU, based on NCNN
- COCO 2017 val mAP (no group label)
- Suitable for hardware with extremely tight computing resources
- This model is recommended for simple single-object detection in lightweight application scenarios
Yolo-Fastest-1.1 Multi-platform benchmark
Equipment | Computing backend | System | Framework | Run time |
---|---|---|---|---|
Mi 11 | Snapdragon 888 | Android (arm64) | ncnn | 5.59 ms |
Mate 30 | Kirin 990 | Android (arm64) | ncnn | 6.12 ms |
Meizu 16 | Snapdragon 845 | Android (arm64) | ncnn | 7.72 ms |
Development board | Snapdragon 835 (Monkey version) | Android (arm64) | ncnn | 20.52 ms |
Development board | RK3399 | Linux (arm64) | ncnn | 35.04 ms |
Raspberry Pi 3B | 4x Cortex-A53 | Linux (arm64) | ncnn | 62.31 ms |
Orange Pi Zero LTS | H2+ 4x Cortex-A7 | Linux (armv7) | ncnn | 550 ms |
Nvidia | GTX 1050 Ti | Ubuntu (x64) | darknet | 4.73 ms |
Intel | i7-8700 | Ubuntu (x64) | ncnn | 5.78 ms |
- The numbers above are multi-threaded (multi-core) benchmark results; a reproduction sketch using ncnn's benchmark tool follows this list
- On big.LITTLE CPUs, the speed benchmark is measured on the big cores
- Raspberry Pi 3B: bf16s optimization enabled, 64-bit Raspberry Pi OS
- RK3399: the CPU must be locked to its highest frequency, with ncnn bf16s optimization enabled
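These timings can be reproduced with ncnn's stock benchmark tool (see the Benchmark link in the Deploy section below). A rough sketch, assuming ncnn has been built with its benchmark target and that your ncnn checkout ships the Yolo-Fastest-1.1 benchmark params (recent ncnn versions do); exact paths depend on your build layout:

```
# run from the directory containing ncnn's benchmark .param files
# usage: benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
./benchncnn 8 4 2 -1 1    # 4 threads on the big cores (powersave=2), CPU only (gpu device=-1)
./benchncnn 8 1 2 -1 1    # single thread on a big core
```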
Pascal VOC performance comparison
Network | Model Size | mAP (VOC 2007) | FLOPs |
---|---|---|---|
Tiny YOLOv2 | 60.5 MB | 57.1% | 6.97 BFlops |
Tiny YOLOv3 | 33.4 MB | 58.4% | 5.52 BFlops |
YOLO Nano | 4.0 MB | 69.1% | 4.51 BFlops |
MobileNetv2-SSD-Lite | 13.8 MB | 68.6% | - |
MobileNetV2-YOLOv3 | 11.52 MB | 70.20% | 2.02 BFlops |
Pelee-SSD | 21.68 MB | 70.09% | 2.40 BFlops |
Yolo Fastest | 1.3 MB | 61.02% | 0.23 BFlops |
Yolo Fastest-XL | 3.5 MB | 69.43% | 0.70 BFlops |
MobileNetv2-Yolo-Lite | 8.0 MB | 73.26% | 1.80 BFlops |
- Performance figures are taken from the papers and from publicly reported results in the corresponding GitHub projects
- MobileNetv2-Yolo-Lite: https://github.com/dog-qiuqiu/MobileNet-Yolo#mobilenetv2-yolov3-litenano-darknet
Yolo-Fastest-1.1 Pedestrian detection
Equipment | System | Framework | Run time |
---|---|---|---|
Raspberry Pi 3B | Linux (arm64) | ncnn | 62 ms |
- Simple real-time pedestrian detection model based on Yolo-Fastest-1.1
- bf16s optimization enabled, 64-bit Raspberry Pi OS
Demo
Compile
How to compile on Linux
- This repo is based on the Darknet project, so the compilation instructions are the same (https://github.com/MuhammadAsadJaved/darknet#how-to-compile-on-windows-legacy-way)
Just run `make` in the Yolo-Fastest-master directory. Before running make, you can set the following options in the `Makefile`:
- `GPU=1` to build with CUDA to accelerate by using the GPU (CUDA should be in `/usr/local/cuda`)
- `CUDNN=1` to build with cuDNN v5-v7 to accelerate training by using the GPU (cuDNN should be in `/usr/local/cudnn`)
- `CUDNN_HALF=1` to build for Tensor Cores (on Titan V / Tesla V100 / DGX-2 and later), speeding up detection 3x and training 2x
- `OPENCV=1` to build with OpenCV 4.x/3.x/2.4.x - allows detection on video files and video streams from network cameras or web-cams
- Set the other options in the `Makefile` according to your needs (a minimal build example follows)
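A minimal build sketch (the clone URL and Makefile defaults are assumed from this repo's darknet layout; install the OpenCV development packages first if you enable `OPENCV=1`):

```
git clone https://github.com/dog-qiuqiu/Yolo-Fastest.git
cd Yolo-Fastest
# enable the options you need before building, e.g. OpenCV for the image/video demos
sed -i 's/^OPENCV=0/OPENCV=1/' Makefile
make -j"$(nproc)"
```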
Test/Demo
- Run Yolo-Fastest, Yolo-Fastest-xl, Yolov3 or Yolov4 on image or video inputs
Demo on image input
*Note: change the .data, .cfg, .weights and input image file in `image_yolov3.sh` for Yolo-Fastest-xl, Yolov3 and Yolov4

```
sh image_yolov3.sh
```
Demo on video input
*Note: Use any input video and place it in the `data` folder, or use `0` in `video_yolov3.sh` for webcam input

*Note: change the .data, .cfg, .weights and input video file in `video_yolov3.sh` for Yolo-Fastest-xl, Yolov3 and Yolov4

```
sh video_yolov3.sh
```
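Both scripts are thin wrappers around the darknet binary, so you can also run detection directly. A sketch with illustrative file names; substitute the .data/.cfg/.weights files shipped with this repo (video and webcam modes require a build with `OPENCV=1`):

```
# single image: the annotated result is written to predictions.jpg
./darknet detector test coco.data yolo-fastest-1.1.cfg yolo-fastest-1.1.weights data/dog.jpg

# video file (or a camera index via -c 0) with live display
./darknet detector demo coco.data yolo-fastest-1.1.cfg yolo-fastest-1.1.weights test.mp4
```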
Yolo-Fastest Test
Yolo-Fastest-xl Test
How to Train
Generate a pre-trained model for initializing the model backbone (`partial` keeps the weights of the first 109 layers and writes them to `yolo-fastest.conv.109`):

```
./darknet partial yolo-fastest.cfg yolo-fastest.weights yolo-fastest.conv.109 109
```
Train
- QQ discussion group: 1062122604
- https://github.com/AlexeyAB/darknet
```
./darknet detector train voc.data yolo-fastest.cfg yolo-fastest.conv.109
```
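For reference, `voc.data` follows the standard darknet data-config format; a minimal sketch with placeholder paths:

```
classes = 20
train   = /path/to/voc/train.txt
valid   = /path/to/voc/2007_test.txt
names   = data/voc.names
backup  = backup/
```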
Deploy
NCNN
NCNN Conversion Tutorial
- Benchmark: https://github.com/Tencent/ncnn/tree/master/benchmark
- NCNN supports direct conversion of darknet models
- darknet2ncnn: https://github.com/Tencent/ncnn/tree/master/tools/darknet
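A rough sketch of the conversion step with ncnn's darknet2ncnn tool (argument order may differ between ncnn versions, so check the tool's usage output; the trailing 1 asks it to merge the YOLO output layers where supported):

```
# usage: darknet2ncnn [darknetcfg] [darknetweights] [ncnnparam] [ncnnbin] [merge_output]
./darknet2ncnn yolo-fastest-1.1.cfg yolo-fastest-1.1.weights yolo-fastest-1.1.param yolo-fastest-1.1.bin 1
```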
NCNN Sample
- CamSample: https://github.com/dog-qiuqiu/Yolo-Fastest/tree/master/sample/ncnn
- AndroidSample: https://github.com/WZTENG/YOLOv5_NCNN
MNN & TNN
- Darknet-to-Caffe conversion tutorial (the resulting Caffe model can then be converted for MNN/TNN): https://github.com/dog-qiuqiu/MobileNet-Yolo#darknet2caffe-tutorial
- Based on MNN: https://github.com/geekzhu001/Yolo-Fastest-MNN (run on Raspberry Pi 4B 2G, input size 320x320, average inference time 0.035 s)
ONNX&TensorRT
- https://github.com/CaoWGG/TensorRT-YOLOv4
- It is not efficient on Pascal and earlier GPU architectures; deployment on such devices (e.g. Jetson Nano: 17 ms/img, TX1, TX2) is not recommended. Turing GPUs do not have this problem; for example, the Jetson Xavier NX runs it efficiently
OpenCV DNN
Thanks
Cite as
dog-qiuqiu. (2021, July 24). dog-qiuqiu/Yolo-Fastest: yolo-fastest-v1.1.0 (Version v.1.1.0). Zenodo. http://doi.org/10.5281/zenodo.5131532