<img src="CIoU.png" width="800px"/> <a href="https://apps.apple.com/app/id1452689527" target="_blank"> <img src="https://user-images.githubusercontent.com/26833433/82944393-f7644d80-9f4f-11ea-8b87-1a5b04f555f1.jpg" width="1000"></a>

This repo focuses solely on improving NMS speed, building on https://github.com/ultralytics/yolov5.

See the non_max_suppression function in utils/general.py for our Cluster-NMS implementation.
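
The central idea of Cluster-NMS can be sketched in NumPy for a single image. This is a minimal illustration, not the repo's code: it assumes boxes in (x1, y1, x2, y2) format, ignores classes, and the helper names box_iou and cluster_nms are ours. Instead of visiting boxes one at a time as greedy NMS does, it repeatedly multiplies an upper-triangular IoU matrix by the current keep mask until the mask stops changing; that fixed point keeps exactly the same boxes as greedy NMS.

```python
import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between a [N, 4] and b [M, 4], boxes in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])  # top-left corner of each intersection
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])  # bottom-right corner of each intersection
    wh = np.clip(rb - lt, 0, None)                  # zero width/height when boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def cluster_nms(boxes: np.ndarray, scores: np.ndarray, iou_thres: float = 0.5) -> np.ndarray:
    """Indices of the boxes kept by Cluster-NMS, in descending score order."""
    if len(boxes) == 0:
        return np.zeros(0, dtype=np.int64)
    order = np.argsort(-scores, kind="stable")  # highest score first
    b = boxes[order]
    # Entry (i, j) holds IoU(box i, box j) only for i < j, i.e. when box i outranks box j.
    iou = np.triu(box_iou(b, b), k=1)
    c = iou
    for _ in range(len(b)):  # provably converges within N iterations
        a = c
        # Box j survives if no higher-ranked box kept in the previous round
        # overlaps it by more than iou_thres (in the first round, all boxes count).
        keep = a.max(axis=0) <= iou_thres
        # Zero the rows of suppressed boxes: a suppressed box cannot suppress others.
        c = iou * keep[:, None]
        if np.array_equal(a, c):  # fixed point reached
            break
    return order[keep]


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
# Box 1 overlaps box 0 with IoU 81/119 (about 0.68) and is suppressed.
print(cluster_nms(boxes, scores, iou_thres=0.5))  # -> [0 2]
```

Each iteration is a handful of dense matrix operations, which is what makes the method GPU- and batch-friendly, and the loop never needs more iterations than there are boxes.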

Batch mode Cluster-NMS

Torchvision NMS is the fastest option, but it cannot run in batch mode: each image in a batch has to be processed by a separate call.

Batch mode Cluster-NMS is designed to fill this gap.

Our goal is that, when test-time augmentation (TTA) is used to improve accuracy, NMS no longer becomes a bottleneck whose cost grows with the number of candidate boxes.
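
Batch mode can be sketched by stacking every image into one [B, N, N] IoU tensor, so that a single loop updates all images at once. Again a minimal NumPy sketch under stated assumptions: images are padded to a common N with zero-score dummy slots, classes are ignored (YOLOv5's non_max_suppression separates classes by offsetting boxes per class), and max_box caps how many top-scoring boxes per image enter the matrix. Reading that cap as the max-box setting in the tables below is our assumption; batched_box_iou and batched_cluster_nms are illustrative names.

```python
import numpy as np


def batched_box_iou(boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU inside each image: boxes [B, N, 4] in (x1, y1, x2, y2) -> IoU [B, N, N]."""
    area = (boxes[..., 2] - boxes[..., 0]) * (boxes[..., 3] - boxes[..., 1])
    lt = np.maximum(boxes[:, :, None, :2], boxes[:, None, :, :2])
    rb = np.minimum(boxes[:, :, None, 2:], boxes[:, None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    union = area[:, :, None] + area[:, None, :] - inter
    return inter / np.maximum(union, 1e-9)  # guard against zero-area padding boxes


def batched_cluster_nms(boxes, scores, iou_thres=0.5, max_box=1000):
    """Cluster-NMS for a whole batch at once (hypothetical helper).

    boxes:  [B, N, 4], images padded to a common N with dummy boxes
    scores: [B, N], padding slots must have score 0 (real scores are assumed > 0)
    Returns (order, keep): order [B, K] holds box indices sorted by score and
    keep [B, K] marks survivors, where K = min(N, max_box).
    """
    k = min(boxes.shape[1], max_box)
    order = np.argsort(-scores, axis=1, kind="stable")[:, :k]  # top-k boxes per image
    b = np.take_along_axis(boxes, order[..., None], axis=1)     # [B, K, 4]
    valid = np.take_along_axis(scores, order, axis=1) > 0       # [B, K], False for padding
    iou = np.triu(batched_box_iou(b), k=1)                      # triu acts on the last two axes
    iou = iou * valid[:, :, None]                               # padding never suppresses anything
    keep = np.ones_like(valid)
    c = iou
    for _ in range(k):
        a = c
        keep = a.max(axis=1) <= iou_thres  # [B, K]: max IoU with higher-ranked survivors
        c = iou * keep[:, :, None]         # only survivors keep their suppressing rows
        if np.array_equal(a, c):           # every image in the batch has converged
            break
    return order, keep & valid
```

The survivors of image i are order[i][keep[i]]. The loop exits only once every image has converged, so one crowded image can add iterations for the whole batch; the max_box cap bounds both the N x N memory and that cost.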

Some Pretrained Weights

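| Model | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>GPU</sub> | FPS<sub>GPU</sub> | params | FLOPs |
|---|---|---|---|---|---|---|---|
| YOLOv5s | 37.0 | 37.0 | 56.2 | 2.4ms | 416 | 7.5M | 13.2B |
| YOLOv5m | 44.3 | 44.3 | 63.2 | 3.4ms | 294 | 21.8M | 39.4B |
| YOLOv5l | 47.7 | 47.7 | 66.5 | 4.4ms | 227 | 47.8M | 88.1B |
| YOLOv5x | 49.2 | 49.2 | 67.7 | 6.9ms | 145 | 89.0M | 166.4B |
| YOLOv5x + TTA | 50.8 | 50.8 | 68.9 | 25.5ms | 39 | 89.0M | 354.3B |
| YOLOv3-SPP | 45.6 | 45.5 | 65.2 | 4.5ms | 222 | 63.0M | 118.0B |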
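
The FPS column is the reciprocal of the per-image latency, FPS ≈ 1000 / Speed (ms), up to rounding of the listed values. A quick check using the numbers from the table above:

```python
# GPU latency per image (ms) and listed FPS, copied from the table above.
table = {
    "YOLOv5s": (2.4, 416),
    "YOLOv5m": (3.4, 294),
    "YOLOv5l": (4.4, 227),
    "YOLOv5x": (6.9, 145),
    "YOLOv5x + TTA": (25.5, 39),
    "YOLOv3-SPP": (4.5, 222),
}

for model, (speed_ms, fps) in table.items():
    implied_fps = 1000.0 / speed_ms  # frames per second = 1000 ms / (ms per frame)
    print(f"{model:14s} {speed_ms:5.1f} ms -> {implied_fps:6.1f} FPS (table: {fps})")
```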

For more details, please refer to https://github.com/ultralytics/yolov5.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install, run:

$ pip install -r requirements.txt
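
Both minimums can be checked from Python before running anything. A small sketch using only the standard library; version_tuple and check_requirements are illustrative helpers, and the parser reads only the leading numeric components of a version string, so local tags such as +cu101 are ignored.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def version_tuple(version: str) -> tuple:
    """Leading numeric components of a version string, e.g. '1.6.0+cu101' -> (1, 6, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at the first non-numeric component
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(min_python=(3, 8), min_torch=(1, 6)) -> list:
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    try:
        installed = metadata.version("torch")
        if version_tuple(installed) < min_torch:
            problems.append(f"torch>={min_torch[0]}.{min_torch[1]} required, found {installed}")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    return problems


print(check_requirements() or "Python and torch minimums met")
```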

Evaluation for Batch Mode Weighted Cluster-NMS

Hardware

Evaluation command: python test.py --weights yolov5s.pt --data coco.yaml --img 640 --augment --merge --batch-size 32

YOLOv5s.pt

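| NMS | TTA | max-box | weighted threshold | time (ms per image, inference / NMS) | AP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>s</sub> | AP<sub>m</sub> | AP<sub>l</sub> |
|---|---|---|---|---|---|---|---|---|---|---|
| Torchvision NMS | on | - | - | 3.2 / 17.9 | 38.0 | 56.5 | 41.2 | 20.9 | 42.6 | 51.7 |
| Merge + Torchvision NMS | on | - | 0.65 | 3.2 / 18.6 | 38.0 | 56.5 | 41.4 | 20.9 | 42.7 | 51.8 |
| Merge + Torchvision NMS | on | - | 0.8 | 3.2 / 18.9 | 38.1 | 56.5 | 41.4 | 21.0 | 42.7 | 51.8 |
| Weighted Cluster-NMS | on | 1000 | 0.8 | 3.2 / 6.6 | 38.0 | 55.7 | 41.6 | 20.5 | 42.8 | 51.9 |
| Weighted Cluster-NMS | on | 1500 | 0.65 | 3.2 / 10.2 | 38.1 | 56.1 | 41.9 | 20.9 | 42.7 | 51.8 |
| Weighted Cluster-NMS | on | 1500 | 0.8 | 3.2 / 10.2 | 38.3 | 56.2 | 41.8 | 21.1 | 43.0 | 52.0 |
| Weighted Cluster-NMS | on | 2000 | 0.8 | 3.2 / 14.5 | 38.4 | 56.4 | 41.9 | 21.3 | 43.1 | 52.1 |
| Torchvision NMS | off | - | - | 1.5 / 5.4 | 36.9 | 56.2 | 40.0 | 21.0 | 42.1 | 47.4 |
| Merge + Torchvision NMS | off | - | 0.65 | 1.3 / 6.7 | 36.9 | 56.2 | 40.2 | 20.9 | 42.1 | 47.4 |
| Merge + Torchvision NMS | off | - | 0.8 | 1.3 / 6.7 | 37.1 | 56.2 | 40.3 | 21.1 | 42.2 | 47.6 |
| Weighted Cluster-NMS | off | 1000 | 0.65 | 1.3 / 6.5 | 36.9 | 56.0 | 40.2 | 20.9 | 42.0 | 47.3 |
| Weighted Cluster-NMS | off | 1000 | 0.8 | 1.3 / 6.5 | 37.0 | 56.0 | 40.3 | 21.1 | 42.2 | 47.5 |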

YOLOv5m.pt

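| NMS | TTA | max-box | weighted threshold | time (ms per image, inference / NMS) | AP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>s</sub> | AP<sub>m</sub> | AP<sub>l</sub> |
|---|---|---|---|---|---|---|---|---|---|---|
| Torchvision NMS | on | - | - | 6.4 / 10.4 | 45.1 | 63.2 | 49.0 | 27.0 | 50.2 | 60.5 |
| Merge + Torchvision NMS | on | - | 0.65 | 6.4 / 11.5 | 45.0 | 63.2 | 49.0 | 26.9 | 50.2 | 60.3 |
| Merge + Torchvision NMS | on | - | 0.8 | 6.4 / 11.5 | 45.2 | 63.3 | 49.1 | 27.0 | 50.3 | 60.5 |
| Weighted Cluster-NMS | on | 1000 | 0.65 | 6.4 / 6.8 | 44.6 | 62.3 | 49.1 | 26.0 | 50.0 | 60.4 |
| Weighted Cluster-NMS | on | 1500 | 0.65 | 6.4 / 9.8 | 44.9 | 62.9 | 49.4 | 26.6 | 50.2 | 60.4 |
| Weighted Cluster-NMS | on | 1500 | 0.8 | 6.4 / 9.8 | 45.2 | 62.9 | 49.4 | 26.8 | 50.4 | 60.5 |
| Torchvision NMS | off | - | - | 2.7 / 4.5 | 44.3 | 63.2 | 48.2 | 27.4 | 50.0 | 56.4 |
| Merge + Torchvision NMS | off | - | 0.65 | 2.7 / 6.1 | 44.2 | 63.1 | 48.4 | 27.4 | 50.1 | 56.2 |
| Merge + Torchvision NMS | off | - | 0.8 | 2.7 / 6.1 | 44.4 | 63.2 | 48.6 | 27.6 | 50.2 | 56.4 |
| Weighted Cluster-NMS | off | 1000 | 0.65 | 2.7 / 6.1 | 44.2 | 62.9 | 48.5 | 27.3 | 50.0 | 56.3 |
| Weighted Cluster-NMS | off | 1000 | 0.8 | 2.7 / 6.1 | 44.3 | 62.9 | 48.5 | 27.4 | 50.1 | 56.4 |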

YOLOv5x.pt

Evaluation command: python test.py --weights yolov5x.pt --data coco.yaml --img 832 --augment --merge --batch-size 32

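| NMS | TTA | max-box | weighted threshold | time (ms per image, inference / NMS) | AP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>s</sub> | AP<sub>m</sub> | AP<sub>l</sub> |
|---|---|---|---|---|---|---|---|---|---|---|
| Merge + Torchvision NMS | on | - | 0.65 | 31.7 / 10.7 | 50.2 | 68.5 | 55.2 | 34.2 | 54.9 | 64.0 |
| Weighted Cluster-NMS | on | 1500 | 0.8 | 31.8 / 9.9 | 50.3 | 68.0 | 55.4 | 33.9 | 55.1 | 64.6 |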
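
YOLOv5's merge option (presumably what the "Merge + Torchvision NMS" rows use) replaces every box kept by NMS with a score-weighted average of the boxes that overlap it. A minimal NumPy sketch of that averaging step, assuming the weighted threshold column is the IoU above which neighbouring boxes are averaged in; this is our reading of the tables. Weighted Cluster-NMS also averages coordinates, but its exact weights are defined in utils/general.py and are not reproduced here. box_iou and weighted_merge are illustrative names.

```python
import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between a [N, 4] and b [M, 4], boxes in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def weighted_merge(boxes, scores, keep, merge_thres=0.65):
    """Replace each kept box by the score-weighted mean of all boxes whose IoU with it exceeds merge_thres."""
    iou = box_iou(boxes[keep], boxes)                # [K, N]
    # Weight = score of each sufficiently overlapping box; for merge_thres < 1
    # every kept box overlaps itself, so no row sums to zero.
    weights = (iou > merge_thres) * scores[None, :]  # [K, N]
    return (weights @ boxes) / weights.sum(axis=1, keepdims=True)


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.6, 0.4, 0.7])
keep = np.array([0, 2])  # e.g. the survivors of NMS at IoU 0.5
print(weighted_merge(boxes, scores, keep, merge_thres=0.65))
```

With merge_thres=0.65 the first kept box absorbs its neighbour (IoU about 0.68) and moves to a 0.6/0.4 blend of the two; with 0.8 no box crosses the threshold and the kept boxes are returned unchanged, which is why a higher threshold averages fewer boxes.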

Details:

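# Excerpt from the evaluation loop of test.py; the helpers come from the YOLOv5 code base:
from utils.general import non_max_suppression
from utils.torch_utils import time_synchronized

# Run NMS (the same call is repeated three times inside the timed region)
t = time_synchronized()  # waits for pending CUDA work (when available), then reads the clock
output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, max_box=max_box, merge=merge)
output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, max_box=max_box, merge=merge)
output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, max_box=max_box, merge=merge)
t1 += time_synchronized() - t  # accumulated NMS time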
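
The Details snippet reads a synchronized clock around the NMS calls because CUDA kernels run asynchronously. A framework-agnostic sketch of the same measurement pattern, reporting an average per call; time_per_call_ms and its warmup and repeats parameters are illustrative and not part of test.py.

```python
import time
from typing import Callable


def time_per_call_ms(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> float:
    """Average wall-clock time of fn() in milliseconds over `repeats` calls.

    CPU-only sketch: for CUDA work, call torch.cuda.synchronize() right before each
    clock read, as YOLOv5's time_synchronized() does, or queued kernels go uncounted.
    """
    for _ in range(warmup):  # untimed warm-up calls (caches, lazy initialization)
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats


print(f"{time_per_call_ms(lambda: sorted(range(100_000), reverse=True)):.2f} ms per call")
```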

Conclusion

Related issues

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Citation

This is the code for our papers:

@Inproceedings{zheng2020diou,
  author    = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
  title     = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2020},
}

@Article{zheng2021ciou,
  author    = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
  title     = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
  journal   = {IEEE Transactions on Cybernetics},
  year      = {2021},
}