VarifocalNet: An IoU-aware Dense Object Detector

This repo hosts the code for implementing VarifocalNet, as presented in our CVPR 2021 oral paper, available at https://arxiv.org/abs/2008.13367:

```bibtex
@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}
```

Introduction

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. In this work, we propose to learn an IoU-aware classification score (IACS) that simultaneously represents the object presence confidence and the localization accuracy, producing a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss (VFL), for training a dense object detector to predict the IACS, and a new, efficient star-shaped bounding box feature representation (the features at nine yellow sampling points) for estimating the IACS and refining coarse bounding boxes. Combining these two new components with a bounding box refinement branch, we build a new IoU-aware dense object detector on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP across different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, reaches a single-model, single-scale AP of 55.1 on COCO test-dev, achieving state-of-the-art performance among object detectors.

<div align="center"> <img src="VFNet.png" width="600px" /> <p>Learning to Predict the IoU-aware Classification Score.</p> </div>
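The Varifocal Loss treats positive and negative samples asymmetrically: a positive is supervised toward its ground-truth IoU and weighted by that IoU, while a negative is supervised toward 0 and down-weighted focal-loss style. The sketch below is a minimal scalar illustration in plain Python, with the α=0.75, γ=2.0 values used as assumed defaults; the actual batched PyTorch implementation lives in this repo's loss module.

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Scalar sketch of the Varifocal Loss for one prediction.

    p -- predicted IoU-aware classification score (IACS), in (0, 1)
    q -- target score: the IoU between the predicted box and the ground
         truth for a positive sample, 0 for a negative sample
    """
    p = min(max(p, 1e-7), 1 - 1e-7)  # numerical safety for the logs
    # Binary cross-entropy between the predicted score and the target q.
    bce = -(q * math.log(p) + (1 - q) * math.log(1 - p))
    # Asymmetric weighting: positives are scaled by q itself (high-IoU
    # examples matter more), negatives are down-weighted by alpha * p**gamma.
    weight = q if q > 0 else alpha * p ** gamma
    return weight * bce
```

Note the asymmetry: unlike focal loss, positives are never down-weighted by their prediction error, only scaled up by their IoU target.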

Updates

Installation

A Quick Demo

Once the installation is done, you can follow the steps below to run a quick demo.

Usage of MMDetection

Please see exist_data_model.md for the basic usage of MMDetection. The MMDetection team also provides a Colab tutorial for beginners.

For troubleshooting, please refer to faq.md.

Results and Models

For your convenience, we provide the following trained models. All models were trained with a total batch size of 16 images on 8 NVIDIA V100 GPUs (2 images per GPU).

| Backbone | Style | DCN | MS train | Lr schd | Inf time (fps) | box AP (val) | box AP (test-dev) | Download |
|:--------:|:-----:|:---:|:--------:|:-------:|:--------------:|:------------:|:-----------------:|:--------:|
| R-50 | pytorch | N | N | 1x | 19.4 | 41.6 | 41.6 | model \| log |
| R-50 | pytorch | N | Y | 2x | 19.3 | 44.5 | 44.8 | model \| log |
| R-50 | pytorch | Y | Y | 2x | 16.3 | 47.8 | 48.0 | model \| log |
| R-101 | pytorch | N | N | 1x | 15.5 | 43.0 | 43.6 | model \| log |
| R-101 | pytorch | N | N | 2x | 15.6 | 43.5 | 43.9 | model \| log |
| R-101 | pytorch | N | Y | 2x | 15.6 | 46.2 | 46.7 | model \| log |
| R-101 | pytorch | Y | Y | 2x | 12.6 | 49.0 | 49.2 | model \| log |
| X-101-32x4d | pytorch | N | Y | 2x | 13.1 | 47.4 | 47.6 | model \| log |
| X-101-32x4d | pytorch | Y | Y | 2x | 10.1 | 49.7 | 50.0 | model \| log |
| X-101-64x4d | pytorch | N | Y | 2x | 9.2 | 48.2 | 48.5 | model \| log |
| X-101-64x4d | pytorch | Y | Y | 2x | 6.7 | 50.4 | 50.8 | model \| log |
| R2-101 | pytorch | N | Y | 2x | 13.0 | 49.2 | 49.3 | model \| log |
| R2-101 | pytorch | Y | Y | 2x | 10.3 | 51.1 | 51.3 | model \| log |

Notes:

We also provide models of RetinaNet, FoveaBox, RepPoints, and ATSS trained with the Focal Loss (FL) and with our Varifocal Loss (VFL).

| Method | Backbone | MS train | Lr schd | box AP (val) | Download |
|:------:|:--------:|:--------:|:-------:|:------------:|:--------:|
| RetinaNet + FL | R-50 | N | 1x | 36.5 | model \| log |
| RetinaNet + VFL | R-50 | N | 1x | 37.4 | model \| log |
| FoveaBox + FL | R-50 | N | 1x | 36.3 | model \| log |
| FoveaBox + VFL | R-50 | N | 1x | 37.2 | model \| log |
| RepPoints + FL | R-50 | N | 1x | 38.3 | model \| log |
| RepPoints + VFL | R-50 | N | 1x | 39.7 | model \| log |
| ATSS + FL | R-50 | N | 1x | 39.3 | model \| log |
| ATSS + VFL | R-50 | N | 1x | 40.2 | model \| log |

Notes:

VFNet-X

| Backbone | DCN | MS train | Training | Inf scale | Inf time (fps) | box AP (val) | box AP (test-dev) | Download |
|:--------:|:---:|:--------:|:--------:|:---------:|:--------------:|:------------:|:-----------------:|:--------:|
| R2-101 | Y | Y | 41e + SWA 18e | 1333x800 | 8.0 | 53.4 | 53.7 | model \| config |
| R2-101 | Y | Y | 41e + SWA 18e | 1800x1200 | 4.2 | 54.5 | 55.1 | |

Notes:

We implement several improvements over the original VFNet; we call the resulting version VFNet-X. These improvements include:

For more detailed information, please see the VFNet-X config file.

Inference

Assuming you have put the COCO dataset into data/coco/ and downloaded the models into checkpoints/, you can now evaluate the models on the COCO val2017 split:

```shell
./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --eval bbox
```
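The command above runs the full MMDetection test pipeline. To illustrate what IoU-aware ranking buys at test time: detections are sorted and suppressed by the predicted IACS rather than by a plain classification score, so better-localized boxes survive NMS. The following is a toy, pure-Python sketch of score-ranked NMS, not the repo's implementation (in practice MMDetection's batched NMS is used):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_by_iacs(dets, iou_thr=0.6):
    """dets: list of (box, iacs) pairs. Greedily keep the highest-scoring
    box and drop any remaining box that overlaps a kept one above iou_thr."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```

Because the IACS correlates with localization quality, the box that survives each overlapping cluster tends to be the most accurately localized one, which is exactly the ranking effect VFNet is trained for.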

Notes:

Training

The following command trains vfnet_r50_fpn_1x_coco on 8 GPUs:

```shell
./tools/dist_train.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py 8
```
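If you train with a different number of GPUs or images per GPU, the learning rate is conventionally adjusted with the linear scaling rule (LR proportional to total batch size). A small sketch, assuming a base LR of 0.01 for the 16-image batch used above; check the actual config file for the real value before relying on it:

```python
def scaled_lr(gpus, imgs_per_gpu=2, base_lr=0.01, base_batch=16):
    """Linear scaling rule: keep LR proportional to the total batch size."""
    total_batch = gpus * imgs_per_gpu
    return base_lr * total_batch / base_batch
```

For example, training on 4 GPUs with 2 images each (total batch 8) would halve the LR to 0.005 under this rule.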

Notes:

Contributing

Pull requests and issues are welcome.

Citation

If this project helps your research, please consider citing our paper. The BibTeX reference is:

```bibtex
@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}
```

Acknowledgment

We would like to thank the MMDetection team for producing this great object detection toolbox!

License

This project is released under the Apache 2.0 license.