SWA Object Detection

This project hosts the scripts for training SWA object detectors, as presented in our paper:

@article{zhang2020swa,
  title={SWA Object Detection},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  journal={arXiv preprint arXiv:2012.12645},
  year={2020}
}

The full paper is available at: https://arxiv.org/abs/2012.12645.

Introduction

Do you want to improve your object detector by ~1.0 AP without any inference cost and without any change to the detector? Let us share a recipe with you. It is surprisingly simple: train your detector for an extra 12 epochs using cyclical learning rates and then average these 12 checkpoints to obtain your final detection model. This potent recipe is inspired by Stochastic Weight Averaging (SWA), which was proposed in [1] for improving generalization in deep neural networks, and we found it very effective in object detection as well.

In this work, we systematically investigate the effects of applying SWA to object detection as well as instance segmentation. Through extensive experiments, we identify a good policy for performing SWA in object detection, and we consistently achieve ~1.0 AP improvement over various popular detectors on the challenging COCO benchmark. We hope this work helps more researchers in object detection learn about this technique and train better object detectors.
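The core of the recipe above is plain weight averaging. As a minimal sketch (using toy dicts of floats in place of real model checkpoints, and a hypothetical helper name), averaging the 12 saved checkpoints looks like this:

```python
# Sketch of the SWA recipe: element-wise average of the parameter values
# stored in several checkpoints. Checkpoints are represented here as plain
# dicts mapping parameter names to lists of floats; a real implementation
# would average tensors from saved state dicts.

def average_checkpoints(checkpoints):
    """Return the element-wise mean of each parameter across checkpoints."""
    n = len(checkpoints)
    avg = {}
    for name in checkpoints[0]:
        params = [ckpt[name] for ckpt in checkpoints]
        avg[name] = [sum(vals) / n for vals in zip(*params)]
    return avg

# Example: three toy "checkpoints" with a single parameter.
ckpts = [
    {"conv.weight": [1.0, 2.0]},
    {"conv.weight": [3.0, 4.0]},
    {"conv.weight": [5.0, 6.0]},
]
swa_model = average_checkpoints(ckpts)
print(swa_model["conv.weight"])  # → [3.0, 4.0]
```

Because the averaged weights form a single ordinary model, inference cost is unchanged.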

<div align="center"> <img src="swa.png" width="600px" /> <p>SWA Object Detection: averaging multiple detection models leads to a better one.</p> </div>

Updates

Installation

Usage of MMDetection

MMDetection provides a colab tutorial and full guidance for a quick run with an existing dataset or a new dataset for beginners. There are also tutorials for finetuning models, adding new datasets, designing data pipelines, customizing models, customizing runtime settings, and useful tools.

Please refer to FAQ for frequently asked questions.

Instructions

We add a SWA training phase to the object detector training process, implement a SWA hook that helps process averaged models, and write a SWA config for conveniently deploying SWA training in training various detectors. We also provide many config files for reproducing the results in the paper.
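The idea behind such a hook can be sketched as a running average of the model weights, updated after each training epoch. This is a minimal illustration, not the repo's actual hook; parameters are plain floats for clarity, and the class and method names here are hypothetical:

```python
# Minimal sketch of an SWA-style hook: maintain an incremental mean of the
# model weights after every training epoch, so the averaged model is
# available at any point without storing all checkpoints.

class SWAHook:
    def __init__(self):
        self.avg_state = None   # running average of the weights
        self.n_averaged = 0     # number of checkpoints averaged so far

    def after_train_epoch(self, model_state):
        if self.avg_state is None:
            self.avg_state = dict(model_state)
        else:
            for name, value in model_state.items():
                # incremental mean: avg += (x - avg) / (n + 1)
                self.avg_state[name] += (value - self.avg_state[name]) / (self.n_averaged + 1)
        self.n_averaged += 1

hook = SWAHook()
for w in (1.0, 2.0, 6.0):                     # weights at the end of 3 epochs
    hook.after_train_epoch({"conv.weight": w})
print(hook.avg_state["conv.weight"])          # → 3.0 (mean of 1.0, 2.0, 6.0)
```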

By including the SWA config in detector config files and setting related parameters, you can have different SWA training modes.

  1. Two-phase mode. In this mode, the training begins with the traditional training phase and continues for a number of epochs. After that, SWA training starts, loading the best model on the validation set from the previous training phase (because swa_load_from = 'best_bbox_mAP.pth' in the SWA config).

    As shown in the swa_vfnet_r50 config, the SWA config is included at line 4 and only the SWA optimizer is reset at line 118 of that script. Note that parameters configured in local scripts overwrite the values inherited from the SWA config.

    You can change the parameters included in the SWA config to use different optimizers or different learning rate schedules for the SWA training. For example, to use a different initial learning rate, say 0.02, you just need to set swa_optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) in the SWA config (global effect) or in the swa_vfnet_r50 config (local effect).

    To start the training, run:

    ./tools/dist_train.sh configs/swa/swa_vfnet_r50_fpn_1x_coco.py 8
    
    
  2. Only-SWA mode. In this mode, the traditional training is skipped and only the SWA training is performed. In general, this mode should work with a pre-trained detection model which you can download from the MMDetection model zoo.

    Have a look at the swa_mask_rcnn_r101 config. By setting only_swa_training = True and swa_load_from = mask_rcnn_pretraind_model, this script performs only SWA training, starting from a pre-trained detection model. To start the training, run:

    ./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py 8
    
    

In both modes, we have implemented a validation stage and saving functions for the SWA model, so it is easy to monitor its performance and select the best SWA model.

Results and Models

For your convenience, we provide the following SWA models. These models are obtained by averaging checkpoints that are trained with cyclical learning rates for 12 epochs.

| Model | bbox AP (val) | segm AP (val) | Download |
|:---|:---:|:---:|:---:|
| SWA-MaskRCNN-R50-1x-0.02-0.0002-38.2-34.7 | 39.1, +0.9 | 35.5, +0.8 | model \| config |
| SWA-MaskRCNN-R101-1x-0.02-0.0002-40.0-36.1 | 41.0, +1.0 | 37.0, +0.9 | model \| config |
| SWA-MaskRCNN-R101-2x-0.02-0.0002-40.8-36.6 | 41.7, +0.9 | 37.4, +0.8 | model \| config |
| SWA-FasterRCNN-R50-1x-0.02-0.0002-37.4 | 38.4, +1.0 | - | model \| config |
| SWA-FasterRCNN-R101-1x-0.02-0.0002-39.4 | 40.3, +0.9 | - | model \| config |
| SWA-FasterRCNN-R101-2x-0.02-0.0002-39.8 | 40.7, +0.9 | - | model \| config |
| SWA-RetinaNet-R50-1x-0.01-0.0001-36.5 | 37.8, +1.3 | - | model \| config |
| SWA-RetinaNet-R101-1x-0.01-0.0001-38.5 | 39.7, +1.2 | - | model \| config |
| SWA-RetinaNet-R101-2x-0.01-0.0001-38.9 | 40.0, +1.1 | - | model \| config |
| SWA-FCOS-R50-1x-0.01-0.0001-36.6 | 38.0, +1.4 | - | model \| config |
| SWA-FCOS-R101-1x-0.01-0.0001-39.2 | 40.3, +1.1 | - | model \| config |
| SWA-FCOS-R101-2x-0.01-0.0001-39.1 | 40.2, +1.1 | - | model \| config |
| SWA-YOLOv3(320)-D53-273e-0.001-0.00001-27.9 | 28.7, +0.8 | - | model \| config |
| SWA-YOLOv3(680)-D53-273e-0.001-0.00001-33.4 | 34.2, +0.8 | - | model \| config |
| SWA-VFNet-R50-1x-0.01-0.0001-41.6 | 42.8, +1.2 | - | model \| config |
| SWA-VFNet-R101-1x-0.01-0.0001-43.0 | 44.3, +1.3 | - | model \| config |
| SWA-VFNet-R101-2x-0.01-0.0001-43.5 | 44.5, +1.0 | - | model \| config |

Notes:

Contributing

Any pull requests or issues are welcome.

Citation

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows:

@article{zhang2020swa,
  title={SWA Object Detection},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  journal={arXiv preprint arXiv:2012.12645},
  year={2020}
}

Acknowledgment

Many thanks to Dr Marlies Hankel and MASSIVE HPC for supporting precious GPU computation resources!

We also would like to thank MMDetection team for producing this great object detection toolbox.

License

This project is released under the Apache 2.0 license.

References

[1] Averaging Weights Leads to Wider Optima and Better Generalization; Pavel Izmailov, Dmitry Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson; Uncertainty in Artificial Intelligence (UAI), 2018