Cross Stage Partial Networks

This is the implementation of "CSPNet: A New Backbone that can Enhance Learning Capability of CNN" using the Darknet framework.

To install the Darknet framework, refer to darknet (AlexeyAB).

Combined with CIoU, Scale Sensitivity, IoU Threshold, Greedy NMS, Mosaic Augmentation, ..., CSPResNeXt-50-PANet-SPP achieves the following results on the test-dev set of the MS COCO object detection task:

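| Model | Size | fps | AP | AP50 | AP75 | APS | APM | APL | cfg | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CSPResNeXt50-PANet-SPP(SAM) | 512×512 | - | 42.7 | 64.6 | 46.3 | 23.7 | 46.1 | 55.3 | - | - |
| CSPResNeXt50-PANet-SPP(SAM) | 608×608 | - | 43.2 | 65.4 | 47.1 | 26.1 | 46.7 | 53.2 | - | - |
| CSPResNeXt50-PANet-SPP(GIoU) | 512×512 | - | 42.4 | 64.4 | 45.9 | 23.3 | 45.9 | 55.0 | - | - |
| CSPResNeXt50-PANet-SPP(GIoU) | 608×608 | - | 43.1 | 65.4 | 47.0 | 26.0 | 46.9 | 52.8 | - | - |
| CSPResNeXt50-PANet-SPP | 512×512 | 44 (1080ti), 67 (GV100) | 42.4 | 64.4 | 45.9 | 23.2 | 45.5 | 55.3 | cfg | weight |
| CSPResNeXt50-PANet-SPP | 608×608 | 35 (1080ti), 44 (GV100) | 43.2 | 65.4 | 47.0 | 25.7 | 46.7 | 53.3 | cfg | weight |
| CSPDarknet53-PANet-SPP | 512×512 | 51 (1080ti) | 42.4 | 64.5 | 46.0 | 23.9 | 45.6 | 54.2 | cfg | weight |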

ImageNet

Big Models

| Model | #Parameter | BFLOPs | Top-1 | Top-5 | cfg | weight |
| --- | --- | --- | --- | --- | --- | --- |
| DarkNet-53 [1] | 41.57M | 18.57 | 77.2 | 93.8 | cfg | weight |
| CSPDarkNet-53 | 27.61M (-34%) | 13.07 (-30%) | 77.2 (=) | 93.6 (-0.2) | cfg | weight |
| CSPDarkNet-53-Elastic | - | 7.74 (-58%) | 76.1 (-1.1) | 93.3 (-0.5) | cfg | weight |
| ResNet-50 [2] | 22.73M | 9.74 | 75.8 | 92.9 | cfg | weight |
| CSPResNet-50 | 21.57M (-5%) | 8.97 (-8%) | 76.6 (+0.8) | 93.3 (+0.4) | cfg | weight |
| CSPResNet-50-Elastic | - | 9.36 (-4%) | 76.8 (+1.0) | 93.5 (+0.6) | cfg | weight |
| ResNeXt-50 [3] | 22.19M | 10.11 | 77.8 | 94.2 | cfg | weight |
| CSPResNeXt-50 | 20.50M (-8%) | 7.93 (-22%) | 77.9 (+0.1) | 94.0 (-0.2) | cfg | weight |
| CSPResNeXt-50-Elastic | - | 5.45 (-46%) | 77.2 (-0.6) | 93.8 (-0.4) | cfg | weight |
| CSPResNeXt-50+Elastic | - | 7.82 (-23%) | 78.2 (+0.4) | 94.2 (=) | - | - |
| HarDNet-138s [4] | 35.5M | 13.4 | 77.8 | - | - | - |
| DenseNet-264-32 [5] | 27.21M | 11.03 | 77.8 | 93.9 | - | - |
| ResNet-152 [2] | 60.2M | 22.6 | 77.8 | 93.6 | - | - |
| DenseNet-201+Elastic [6] | 19.48M | 8.77 | 77.9 | 94.0 | - | - |
| CSPDenseNet-201+Elastic | 20.17M (+4%) | 7.13 (-19%) | 77.9 (=) | 94.0 (=) | - | - |
| Res2NetLite-72 [7] | - | 5.19 | 74.7 | 92.1 | cfg | weight |

Small Models

| Model | #Parameter | BFLOPs | Top-1 | Top-5 | cfg | weight |
| --- | --- | --- | --- | --- | --- | --- |
| PeleeNet [8] | 2.79M | 1.017 | 70.7 | 90.0 | - | - |
| PeleeNet-swish | 2.79M | 1.017 | 71.5 | 90.7 | - | - |
| PeleeNet-swish-SE | 2.81M | 1.017 | 72.1 | 91.0 | - | - |
| CSPPeleeNet | 2.83M (+1%) | 0.888 (-13%) | 70.9 (+0.2) | 90.2 (+0.2) | - | - |
| CSPPeleeNet-swish | 2.83M (+1%) | 0.888 (-13%) | 71.7 (+0.2) | 90.8 (+0.1) | - | - |
| CSPPeleeNet-swish-SE | 2.85M (+1%) | 0.888 (-13%) | 72.4 (+0.3) | 91.0 (=) | - | - |
| SparsePeleeNet [9] | 2.39M | 0.904 | 69.6 | 89.3 | - | - |
| EfficientNet-B0* [10] | 4.81M | 0.915 | 71.3 | 90.4 | cfg | weight |
| EfficientNet-B0 (official) [10] | - | - | 70.0 | 88.9 | - | - |
| MobileNet-v2 [11] | 3.47M | 0.858 | 67.0 | 87.7 | cfg | weight |
| CSPMobileNet-v2 | 2.51M (-28%) | 0.764 (-11%) | 67.7 (+0.7) | 88.3 (+0.6) | cfg | weight |
| Darknet Ref. [12] | 7.31M | 0.96 | 61.1 | 83.0 | cfg | weight |
| CSPDenseNet Ref. | 3.48M (-52%) | 0.886 (-8%) | 65.7 (+4.6) | 86.6 (+3.6) | - | - |
| CSPPeleeNet Ref. | 4.10M (-44%) | 1.103 (+15%) | 68.9 (+7.8) | 88.7 (+5.7) | - | - |
| CSPDenseNetb Ref. | 1.38M (-81%) | 0.631 (-34%) | 64.2 (+3.1) | 85.5 (+2.5) | - | - |
| CSPPeleeNetb Ref. | 2.01M (-73%) | 0.897 (-7%) | 67.8 (+6.7) | 88.1 (+5.1) | - | - |
| ResNet-10 [2] | 5.24M | 2.273 | 63.5 | 85.0 | cfg | weight |
| CSPResNet-10 | 2.73M (-48%) | 1.905 (-16%) | 65.3 (+1.8) | 86.5 (+1.5) | - | - |
| MixNet-M-GPU | - | 1.065 | 71.5 | 90.5 | - | - |

※EfficientNet* is implemented with the Darknet framework.

※EfficientNet (official) was trained with the official code using a batch size of 256.

※Swish activation function is presented by [13].

※Squeeze-and-excitation (SE) network is presented by [14].

※MixNet-M-GPU is modified from MixNet-M [21]
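The parenthesized values in the ImageNet tables can be reproduced from the raw numbers. For #Parameter and BFLOPs they appear to be relative changes against the corresponding baseline, rounded to whole percent; for Top-1 and Top-5 they are absolute differences in percentage points. A minimal sketch, where `relative_change` is a helper name made up for this illustration:

```python
def relative_change(baseline: float, value: float) -> int:
    """Percentage change of value relative to baseline, rounded to a whole percent."""
    return round((value - baseline) / baseline * 100)


# DarkNet-53 -> CSPDarkNet-53, numbers copied from the Big Models table.
params = relative_change(41.57, 27.61)  # parameters, in millions
bflops = relative_change(18.57, 13.07)  # billions of floating-point operations

print(f"{params:+d}% parameters, {bflops:+d}% BFLOPs")  # -34% parameters, -30% BFLOPs
```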

Some tricks for improving Acc

  1. Activation function

| Model | Activation | Top-1 | Top-5 |
| --- | --- | --- | --- |
| PeleeNet | LReLU | 70.7 | 90.0 |
| PeleeNet | Swish | 71.5 (+0.8) | 90.7 (+0.7) |
| PeleeNet | Mish | 71.4 (+0.7) | 90.4 (+0.4) |
| CSPPeleeNet | LReLU | 70.9 | 90.2 |
| CSPPeleeNet | Swish | 71.7 (+0.8) | 90.8 (+0.6) |
| CSPPeleeNet | Mish | 71.2 (+0.3) | 90.3 (+0.1) |
| CSPResNeXt-50 | LReLU | 77.9 | 94.0 |
| CSPResNeXt-50 | Mish | 78.9 (+1.0) | 94.5 (+0.5) |
<!-- | **CSPResNeXt-50** | Swish | 64.5 **(-13.4)** | 86.0 **(-8.0)** | -->

※The Swish activation function is not suitable for ResNeXt-based models; details are given in the Mish paper [22].
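For reference, the three activation functions compared above can be written as scalar functions. This is a plain-Python sketch of the standard definitions, not the Darknet kernels. The leaky slope of 0.1 and Swish with beta fixed to 1 are assumptions of this sketch, and the `exp` calls are only safe for moderate input values.

```python
import math


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # slope=0.1 is an assumed default for this sketch
    return x if x > 0 else slope * x


def swish(x: float) -> float:
    # Swish with beta = 1: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def mish(x: float) -> float:
    # Mish: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))


for x in (-2.0, 0.0, 2.0):
    print(x, leaky_relu(x), round(swish(x), 4), round(mish(x), 4))
```

Unlike Leaky ReLU, both Swish and Mish are smooth and non-monotonic for negative inputs.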

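  1. Data augmentation

| Model | Augmentation | Top-1 | Top-5 |
| --- | --- | --- | --- |
| CSPResNeXt-50 | Normal | 77.9 | 94.0 |
| CSPResNeXt-50 | Mixup | 77.2 | 94.0 |
| CSPResNeXt-50 | Cutmix | 78.0 | 94.3 |
| CSPResNeXt-50 | Cutmix+Mixup | 77.7 | 94.4 |
| CSPResNeXt-50 | Mosaic | 78.1 | 94.5 |
| CSPResNeXt-50 | Blur | 77.5 | 93.8 |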

※Mixup is presented by [23] and used by [24].

※CutMix is presented by [25].

The mixup and CutMix implementations still need to be checked.
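As a reminder of what the two augmentations do, here is a small sketch of Mixup and CutMix on toy data. It is not the Darknet implementation: nested Python lists stand in for single-channel images, the CutMix box is passed in explicitly instead of being sampled, and the Beta parameter of 1.0 is an assumed value.

```python
import random


def mixup(img_a, img_b, label_a, label_b, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    mixed = [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label


def cutmix(img_a, img_b, label_a, label_b, box):
    """Paste the rectangle box=(top, left, bottom, right) of img_b into img_a.

    Labels are mixed in proportion to the pasted area.
    """
    top, left, bottom, right = box
    height, width = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]
    for r in range(top, bottom):
        mixed[r][left:right] = img_b[r][left:right]
    lam = 1 - (bottom - top) * (right - left) / (height * width)
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label


# Two 4x4 toy "images" and one-hot labels for two classes.
a = [[0.0] * 4 for _ in range(4)]
b = [[1.0] * 4 for _ in range(4)]
lam = random.betavariate(1.0, 1.0)  # mixing weight from Beta(alpha, alpha); alpha=1.0 is assumed
_, mixed_label = mixup(a, b, [1.0, 0.0], [0.0, 1.0], lam)
_, patched_label = cutmix(a, b, [1.0, 0.0], [0.0, 1.0], (0, 0, 2, 2))
print(mixed_label, patched_label)
```

Mixup blends every pixel, while CutMix replaces one region and keeps the rest of the image untouched; in both cases the label becomes a weighted mix of the two source labels.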

  1. Other

| Model | Method | Top-1 | Top-5 |
| --- | --- | --- | --- |
| CSPResNeXt-50 | Normal | 77.9 | 94.0 |
| CSPResNeXt-50 | Smooth | 78.1 | 94.4 |

※Smooth means label smoothing, which is presented by [26].
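Label smoothing replaces the one-hot training target with a mix of the one-hot vector and a uniform distribution over the classes. A minimal sketch follows; the function name and the smoothing factor `epsilon=0.1` are assumptions for illustration, not values taken from the configs used here.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with the uniform distribution over its classes."""
    num_classes = len(one_hot)
    return [(1 - epsilon) * y + epsilon / num_classes for y in one_hot]


# Four classes, true class at index 1: the target moves from 1.0 towards 0.925,
# and every other class receives 0.025 instead of 0.
print(smooth_labels([0.0, 1.0, 0.0, 0.0]))
```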

MS COCO

GPU Real-time Models

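| Model | Size | 1080ti fps | AP | AP50 | AP75 | cfg | weight |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CSPResNeXt50-PANet-SPP | 512×512 | 44 | 38.0 | 60.0 | 40.8 | cfg | weight |
| CSPDarknet53-PANet-SPP | 512×512 | 51 | 38.7 | 61.3 | 41.7 | cfg | weight |
| CSPResNet50-PANet-SPP | 512×512 | 55 | 38.0 | 60.5 | 40.7 | cfg | weight |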

※PANet is presented by [15].

※SPP is presented by [16].

CPU Real-time Models

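| Model | Size | 9900K fps | AP | AP50 | AP75 | cfg | weight |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny [1] | 416×416 | 54 | - | 33.1 | - | cfg | weight |
| YOLOv3-tiny-PRN [18] | 416×416 | 71 | - | 33.1 | - | cfg | weight |
| SNet49-ThunderNet* [19] | 320×320 | 47 | 19.1 | 33.7 | 19.6 | - | - |
| Ours | 320×320 | 102 | 15.3 | 34.2 | 12.0 | - | - |
| SNet146-ThunderNet* [19] | 320×320 | 32 | 23.6 | 40.2 | 24.5 | - | - |
| Ours | 320×320 | 52 | 19.4 | 40.0 | 17.0 | - | - |
| Pelee** [7] | 304×304 | 7 | 22.4 | 38.3 | 22.9 | - | - |
| RefineDetLite** [20] | 320×320 | 8 | 26.8 | 46.6 | 27.4 | - | - |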

※SNet49-ThunderNet* and SNet146-ThunderNet* were tested on a Xeon E5-2682v4.

※Pelee** and RefineDetLite** were tested on an i7-6700.

Some tricks for improving AP

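  1. NMS threshold

| Model | Size | Threshold | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.45 | 38.0 | 60.0 | 40.8 | 19.7 | 41.4 | 49.9 |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.50 | 38.2 | 60.2 | 41.1 | 19.8 | 41.6 | 50.1 |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.55 | 38.4 | 60.1 | 41.3 | 20.0 | 41.7 | 50.3 |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.60 | 38.5 | 60.0 | 41.7 | 20.1 | 41.9 | 50.4 |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.65 | 38.6 | 59.7 | 42.1 | 20.1 | 41.9 | 50.4 |
| CSPResNeXt50-PANet-SPP | 512×512 | 0.70 | 38.5 | 59.2 | 42.4 | 20.1 | 41.9 | 50.4 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.45 | 39.4 | 59.4 | 42.5 | 20.4 | 42.6 | 51.4 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.50 | 39.7 | 59.5 | 42.7 | 20.5 | 42.5 | 51.7 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.55 | 39.8 | 59.5 | 43.0 | 20.7 | 43.1 | 51.9 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.60 | 40.0 | 59.3 | 43.4 | 20.8 | 43.2 | 52.0 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.65 | 40.1 | 59.0 | 43.8 | 20.9 | 43.4 | 52.1 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.70 | 40.1 | 58.6 | 44.2 | 20.9 | 43.4 | 52.1 |
| CSPResNeXt50-PANet-SPP-GIoU | 512×512 | aware | 40.0 | 59.5 | 43.4 | 20.8 | 43.2 | 52.0 |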

※GIoU is presented by [17].
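The Threshold column in the NMS table is read here as the IoU threshold of greedy NMS, which is an assumption based on the "IoU Threshold, Greedy NMS" wording near the top of this page. The sketch below is a textbook greedy NMS in plain Python, not the Darknet routine; it shows how the threshold decides which overlapping detections survive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 6), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.45))  # [0, 2]: box 1 has IoU 0.6 with box 0 and is suppressed
print(greedy_nms(boxes, scores, 0.70))  # [0, 1, 2]: the same overlap is now tolerated
```

A higher threshold keeps more overlapping boxes, which matches the trend in the table: AP75 rises with the threshold while AP50 peaks early and then drops.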

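  1. Activation function

| Model | Size | Activation | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CSPPeleeNet-PRN | 416×416 | Leaky ReLU | 23.1 | 44.5 | 22.0 | 6.6 | 24.4 | 35.3 |
| CSPPeleeNet-PRN | 416×416 | Swish | 24.1 | 45.8 | 23.3 | 6.8 | 26.1 | 35.5 |

  1. Loss function

| Model | Size | Loss | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CSPResNeXt50-PANet-SPP | 512×512 | MSE | 38.0 | 60.0 | 40.8 | 19.7 | 41.4 | 49.9 |
| CSPResNeXt50-PANet-SPP | 512×512 | GIoU | 39.4 | 59.4 | 42.5 | 20.4 | 42.6 | 51.4 |
| CSPResNeXt50-PANet-SPP | 512×512 | DIoU | 39.1 | 58.8 | 42.1 | 20.1 | 42.4 | 50.7 |
| CSPResNeXt50-PANet-SPP | 512×512 | CIoU | 39.6 | 59.2 | 42.6 | 20.5 | 42.9 | 51.6 |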

※DIoU and CIoU are presented by [27].
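The three IoU-based box regression losses differ only in the penalty subtracted from the plain IoU. The sketch below follows the definitions in [17] and [27] for a single pair of axis-aligned boxes. It is written in plain Python for readability and is not the Darknet code; the function name is made up for this illustration, and boxes are assumed to have positive width and height.

```python
import math


def iou_losses(pred, target):
    """Return (giou_loss, diou_loss, ciou_loss) for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / union

    # Smallest box that encloses both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # GIoU: penalize the part of the enclosing box not covered by the union.
    giou = iou - (cw * ch - union) / (cw * ch)

    # DIoU: penalize the squared center distance, normalized by the squared
    # diagonal of the enclosing box.
    center_dist2 = (((px1 + px2) - (tx1 + tx2)) ** 2
                    + ((py1 + py2) - (ty1 + ty2)) ** 2) / 4
    diou = iou - center_dist2 / (cw ** 2 + ch ** 2)

    # CIoU: DIoU plus an aspect-ratio consistency term.
    v = 4 / math.pi ** 2 * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    ciou = diou - alpha * v

    return 1 - giou, 1 - diou, 1 - ciou


print(iou_losses((0, 0, 2, 2), (1, 1, 3, 3)))
```

For two boxes with the same aspect ratio the CIoU and DIoU losses coincide, since the aspect-ratio term `v` is zero.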

Citation

@inproceedings{wang2020cspnet,
  title={CSPNet: A new backbone that can enhance learning capability of cnn},
  author={Wang, Chien-Yao and Mark Liao, Hong-Yuan and Wu, Yueh-Hua and Chen, Ping-Yang and Hsieh, Jun-Wei and Yeh, I-Hau},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  pages={390--391},
  year={2020}
}

Reference

[1] YOLOv3: An Incremental Improvement

[2] Deep Residual Learning for Image Recognition (CVPR 2016)

[3] Aggregated Residual Transformations for Deep Neural Networks (CVPR 2017)

[4] HarDNet: A Low Memory Traffic Network (ICCV 2019)

[5] Densely Connected Convolutional Networks (CVPR 2017)

[6] ELASTIC: Improving CNNs with Dynamic Scaling Policies (CVPR 2019)

[7] RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices

[8] Pelee: A Real-Time Object Detection System on Mobile Devices (NeurIPS 2018)

[9] Sparsely Aggregated Convolutional Networks (ECCV 2018)

[10] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019)

[11] MobileNetV2: Inverted Residuals and Linear Bottlenecks (CVPR 2018)

[12] https://pjreddie.com/darknet/tiny-darknet/

[13] Searching for Activation Functions

[14] Squeeze-and-Excitation Networks (CVPR 2018)

[15] Path Aggregation Network for Instance Segmentation (CVPR 2018)

[16] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (TPAMI 2015)

[17] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression (CVPR 2019)

[18] Enriching Variety of Layer-wise Learning Information by Gradient Combination (ICCVW 2019)

[19] ThunderNet: Towards Real-time Generic Object Detection (ICCV 2019)

[20] RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices

[21] MixConv: Mixed Depthwise Convolutional Kernels

[22] Mish: A Self Regularized Non-Monotonic Neural Activation Function

[23] mixup: Beyond Empirical Risk Minimization (ICLR 2018)

[24] Bag of Freebies for Training Object Detection Neural Networks

[25] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (ICCV 2019)

[26] Rethinking the Inception Architecture for Computer Vision (CVPR 2016)

[27] Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)

Acknowledgements

https://github.com/AlexeyAB/darknet

https://github.com/ultralytics/yolov3