GFR-DSOD

We have also merged the GFR-DSOD code into the DSOD repo at https://github.com/szq0214/DSOD. Check it out there if you are interested in DSOD.

We have also seen some very promising results on the PASCAL VOC Comp3 leaderboard, such as https://github.com/kuangliu/torchcv. However, we found that they still initialize their parameters from ImageNet pre-trained models (https://github.com/kuangliu/torchcv/issues/11). Please note that the Comp3 challenge only allows training on the VOC12 dataset, without pre-trained models. Please check your training process carefully.

If you find this helps your research, please consider citing:

@inproceedings{shenimproving,
     title={Improving Object Detection from Scratch via Gated Feature Reuse},
     author={Shen, Zhiqiang and Shi, Honghui and Yu, Jiahui and Phan, Hai and Feris, Rogerio and Cao, Liangliang and Liu, Ding and Wang, Xinchao and Huang, Thomas and Savvides, Marios},
     booktitle={The British Machine Vision Conference (BMVC)},
     year={2019}
}

Introduction

In GFR-DSOD, we propose a recurrent feature-pyramid structure that squeezes rich spatial and semantic features into a single prediction layer, which further reduces the number of parameters to learn (DSOD needs to learn 1/2, whereas GFR-DSOD needs only 1/3). Our new model is therefore better suited to learning from scratch and converges faster than DSOD. We also introduce a novel gate-controlled prediction strategy in GFR-DSOD that adaptively enhances or attenuates feature activations at different scales based on the input object size.

<div align=center> <img src="https://user-images.githubusercontent.com/3794909/36568688-8d176d42-17f0-11e8-85d6-054d90ed5bfc.jpg" width="580"> </div> <div align=center> Figure 1: Illustration of the motivation of GFR-DSOD. </div> <div align=center> <img src="https://user-images.githubusercontent.com/3794909/36566300-ad4d9c2e-17e8-11e8-9808-a4c3602d21b1.jpg" width="740"> </div> <div align=center> Figure 2: An overview of GFR-DSOD together with three one-stage detector methods. </div>
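The gate-controlled prediction idea described in the introduction can be illustrated with a small sketch. This is not the code from this repository (which is not shown here); it assumes a squeeze-and-excitation-style channel gate, and the function name `channel_gate` and the weights `w1` and `w2` are hypothetical. It only shows how a learned gate in (0, 1) can keep some feature channels while suppressing others.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(features, w1, w2):
    """Apply an SE-style channel gate (an assumed form, for illustration).

    features: list of C channels, each an H x W list of lists of floats.
    w1: hidden x C weight matrix (hypothetical, would be learned).
    w2: C x hidden weight matrix (hypothetical, would be learned).
    Returns (gated_features, gates).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: a gate near 1 keeps a channel, a gate near 0 attenuates it.
    gated = [
        [[value * g for value in row] for row in ch]
        for ch, g in zip(features, gates)
    ]
    return gated, gates


# Toy input: two 2x2 channels and hand-picked weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[1.0, 1.0]]       # one hidden unit
w2 = [[1.0], [-1.0]]    # opposite signs push the two gates apart
gated, gates = channel_gate(features, w1, w2)
print("gates:", gates)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the second channel is attenuated more strongly. In a trained detector the weights would be learned so that the gates depend on the input.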

Visualization

  1. Visualizations of network structures (tools from ethereon, please ignore the warning messages):

Results & Models

Our PASCAL VOC LMDB files:

| Method | LMDBs |
|:-------|:-----:|
| Train on VOC07+12 and test on VOC07 | Download |
| Train on VOC07++12 and test on VOC12 (Comp4) | Download |
| Train on VOC12 and test on VOC12 (Comp3) | Download |

The tables below show the results on PASCAL VOC 2007, 2012 and 2012 Comp3 (training on VOC 2012 only).

PASCAL VOC test results:

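| Method | VOC 2007 test mAP | # params | Models |
|:-------|:-----:|:-------:|:-------:|
| GFR-DSOD300 (07+12) | 78.5 | 14.1M | Download (56.5M) |
| GFR-DSOD320 (07+12) | 78.7 | 14.2M | Download (56.8M) |
| GFR-DSOD320* (07+12) | 79.0 | 16.0M | Download (63.9M) |

| Method | VOC 2012 test mAP | # params | Models |
|:-------|:-----:|:-------:|:-------:|
| GFR-DSOD320* (12) | 72.5 (VOC Comp3) | 16.0M | Download (63.9M) |
| GFR-DSOD320 (07++12) | 77.0 | 14.2M | Download (56.8M) |
| GFR-DSOD320* (07++12) | -- | -- | -- |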
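As a quick sanity check on the tables above, the listed download sizes are roughly four times the parameter counts. The sketch below assumes the weights are stored as 32-bit floats and that "M" in the size column means 10^6 bytes; neither is stated in this README. The helper name `estimated_size_mb` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def estimated_size_mb(params_millions, bytes_per_param=BYTES_PER_FLOAT32):
    """Estimate model file size in MB (10^6 bytes) from millions of parameters."""
    return params_millions * bytes_per_param


# (parameters in millions, listed download size) taken from the tables above.
listed = {
    "GFR-DSOD300": (14.1, 56.5),
    "GFR-DSOD320": (14.2, 56.8),
    "GFR-DSOD320*": (16.0, 63.9),
}

for name, (params_m, size_m) in listed.items():
    estimate = estimated_size_mb(params_m)
    print(f"{name}: estimated {estimate:.1f} MB, listed {size_m}M")
```

The small gaps between the estimated and listed sizes are what you would expect from the parameter counts being rounded to one decimal place.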

Contact

Zhiqiang Shen (zhiqiangshen0214 at gmail.com)

Any comments or suggestions are welcome!