Deformable Convolutional Networks

Update

[04/15/2019] A PyTorch version of the deformable convolution operators is available in the mmdetection codebase. It is very efficient!

[12/01/2018] We updated the deformable convolution operator to match the one used in the Deformable ConvNets v2 paper. This fixes a possible issue that occurs when a sampling location falls outside the image boundary, which may degrade performance on ImageNet classification. Note that the current deformable conv layers in both the official MXNet and the PyTorch codebases still have this issue. If you want to reproduce the results in Deformable ConvNets v2, please use the updated layer provided here. Efficiency at large image batch sizes is also improved. See DCNv2_op/README.md for more details.

[10/2017] We released the training/testing code and pre-trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop.

A third-party improvement of Deformable R-FCN + Soft NMS
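
The 12/01/2018 update above mentions sampling locations that fall outside the image boundary. The sketch below is not taken from this repository and does not claim to reproduce the actual fix (see DCNv2_op/README.md for that). It only illustrates one common convention, assumed here for illustration: during bilinear sampling, neighbours outside the feature map contribute zero. The function name `bilinear_sample_zero_pad` is hypothetical.

```python
import math


def bilinear_sample_zero_pad(feature, y, x):
    """Bilinearly sample a 2-D map (list of rows) at fractional (y, x).

    Assumption for illustration: neighbours that fall outside the map
    contribute zero instead of being clamped or read out of bounds.
    """
    height = len(feature)
    width = len(feature[0])
    y0 = int(math.floor(y))
    x0 = int(math.floor(x))
    value = 0.0
    # Visit the four integer neighbours around (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                # Bilinear weight: product of the 1-D linear weights.
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value
```

For example, sampling the 2x2 map `[[1.0, 2.0], [3.0, 4.0]]` at `(0.5, 0.5)` averages all four values, while a location more than one pixel outside the map yields zero.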

Introduction

Deformable ConvNets was first described in an ICCV 2017 oral paper. (Slides at ICCV 2017 Oral)

R-FCN was first described in a NIPS 2016 paper.

<img src='demo/deformable_conv_demo1.png' width='800'> <img src='demo/deformable_conv_demo2.png' width='800'> <img src='demo/deformable_psroipooling_demo.png' width='800'>

Disclaimer

This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that:

License

© Microsoft, 2017. Licensed under an MIT license.

Citing Deformable ConvNets

If you find Deformable ConvNets useful in your research, please consider citing:

@article{dai17dcn,
    Author = {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei},
    Title = {Deformable Convolutional Networks},
    Journal = {arXiv preprint arXiv:1703.06211},
    Year = {2017}
}
@inproceedings{dai16rfcn,
    Author = {Jifeng Dai, Yi Li, Kaiming He, Jian Sun},
    Title = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
    Conference = {NIPS},
    Year = {2016}
}

Main Results

| | training data | testing data | mAP@0.5 | mAP@0.7 | time |
|---|---|---|---|---|---|
| R-FCN, ResNet-v1-101 | VOC 07+12 trainval | VOC 07 test | 79.6 | 63.1 | 0.16s |
| Deformable R-FCN, ResNet-v1-101 | VOC 07+12 trainval | VOC 07 test | 82.3 | 67.8 | 0.19s |

| | training data | testing data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L |
|---|---|---|---|---|---|---|---|---|
| R-FCN, ResNet-v1-101 | coco trainval | coco test-dev | 32.1 | 54.3 | 33.8 | 12.8 | 34.9 | 46.1 |
| Deformable R-FCN, ResNet-v1-101 | coco trainval | coco test-dev | 35.7 | 56.8 | 38.3 | 15.2 | 38.8 | 51.5 |
| Faster R-CNN (2fc), ResNet-v1-101 | coco trainval | coco test-dev | 30.3 | 52.1 | 31.4 | 9.9 | 32.2 | 47.4 |
| Deformable Faster R-CNN (2fc), ResNet-v1-101 | coco trainval | coco test-dev | 35.0 | 55.0 | 38.3 | 14.3 | 37.7 | 52.0 |

| | training data | testing data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L |
|---|---|---|---|---|---|---|---|---|
| FPN + OHEM, ResNet-v1-101 | coco trainval35k | coco minival | 37.8 | 60.8 | 41.0 | 22.0 | 41.5 | 49.8 |
| Deformable FPN + OHEM, ResNet-v1-101 | coco trainval35k | coco minival | 41.2 | 63.5 | 45.5 | 24.3 | 44.9 | 54.4 |
| FPN + OHEM + Soft NMS + multi-scale testing, ResNet-v1-101 | coco trainval35k | coco minival | 40.9 | 62.5 | 46.0 | 27.1 | 44.1 | 52.2 |
| Deformable FPN + OHEM + Soft NMS + multi-scale testing, ResNet-v1-101 | coco trainval35k | coco minival | 44.4 | 65.5 | 50.2 | 30.8 | 47.3 | 56.4 |

| | training data | testing data | mIoU | time |
|---|---|---|---|---|
| DeepLab, ResNet-v1-101 | Cityscapes train | Cityscapes val | 70.3 | 0.51s |
| Deformable DeepLab, ResNet-v1-101 | Cityscapes train | Cityscapes val | 75.2 | 0.52s |
| DeepLab, ResNet-v1-101 | VOC 12 train (augmented) | VOC 12 val | 70.7 | 0.08s |
| Deformable DeepLab, ResNet-v1-101 | VOC 12 train (augmented) | VOC 12 val | 75.9 | 0.08s |

Running time is counted on a single Maxwell Titan X GPU (mini-batch size is 1 in inference).

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet@(commit 62ecb60). Because MXNet is developing rapidly, we recommend checking out this version if you encounter any issues. We may update this repository periodically if MXNet adds important features in future releases.

  2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet. If you want to use Python 3, you need to modify the code to make it work.

  3. Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running

    pip install -r requirements.txt
    
  4. For Windows users, Visual Studio 2015 is needed to compile the Cython module.
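
Before running anything, it can help to confirm that the packages from step 3 are importable. The following is a minimal sketch, not part of this repository. The helper name `missing_modules` is hypothetical, and the import names (`Cython`, `cv2`, `easydict`) are the ones under which cython, opencv-python and easydict are normally imported.

```python
from __future__ import print_function

import importlib
import sys


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names for cython, opencv-python and easydict respectively.
    required = ["Cython", "cv2", "easydict"]
    print("Python version:", sys.version.split()[0])
    print("Missing modules:", missing_modules(required) or "none")
```

Any name reported as missing can then be installed through the requirements file as described in step 3.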

Requirements: Hardware

Any NVIDIA GPU with at least 4 GB of memory should be OK.

Installation

  1. Clone the Deformable ConvNets repository. We refer to the cloned Deformable-ConvNets directory as ${DCN_ROOT}.

    git clone https://github.com/msracver/Deformable-ConvNets.git
    
  2. For Windows users, run cmd .\init.bat. For Linux users, run sh ./init.sh. The scripts build the Cython module automatically and create some folders.

  3. Install MXNet:

    Note: MXNet's Custom Op cannot run in parallel across multiple GPUs after this PR. We strongly suggest that users roll back to MXNet@(commit 998378a) for training (following Sections 3.2 - 3.5).

    Quick start

    3.1 Install MXNet and all dependencies by

    pip install -r requirements.txt
    

    If no error message appears, MXNet should be installed successfully.

    Build from source (alternative way)

    3.2 Clone MXNet and check out MXNet@(commit 998378a) by

    git clone --recursive https://github.com/dmlc/mxnet.git
    cd mxnet
    git checkout 998378a
    git submodule update
    # if this is the first checkout, use instead: git submodule update --init --recursive
    

    3.3 Compile MXNet

    cd ${MXNET_ROOT}
    make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
    

    3.4 Install the MXNet Python binding by

    Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4

    cd python
    sudo python setup.py install
    

    3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE), and set MXNET_VERSION in ./experiments/rfcn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). This lets you switch quickly between different versions of MXNet.

  4. For DeepLab, we use the augmented VOC 2012 dataset. The augmented annotations are provided by the SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images. Please download them from OneDrive.
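
Step 3.5 above relies on Python importing whichever `mxnet` package it finds first on its module search path. The sketch below shows that general mechanism only. It is an assumption about how such switching can be done, not a copy of this repository's code, and the helper name `prepend_mxnet_package` is hypothetical.

```python
import os
import sys


def prepend_mxnet_package(repo_root, package_name):
    """Put <repo_root>/external/mxnet/<package_name> first on sys.path.

    package_name plays the role of $(YOUR_MXNET_PACKAGE) in step 3.5.
    """
    package_dir = os.path.join(repo_root, "external", "mxnet", package_name)
    if package_dir in sys.path:
        sys.path.remove(package_dir)
    # Index 0 is searched first, so this copy shadows any installed mxnet.
    sys.path.insert(0, package_dir)
    return package_dir
```

A call such as `prepend_mxnet_package(".", "my_mxnet_build")` would have to run before the first `import mxnet` in the process, because modules that are already imported are not reloaded.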

Demo & Deformable Model

We provide trained deformable convnet models, including the deformable R-FCN & Faster R-CNN models trained on COCO trainval, and the deformable DeepLab model trained on Cityscapes train.

  1. To use the demo with our pre-trained deformable models, please download them manually from OneDrive or BaiduYun, and put them under the folder model/.

    Make sure it looks like this:

    ./model/rfcn_dcn_coco-0000.params
    ./model/rfcn_coco-0000.params
    ./model/fpn_dcn_coco-0000.params
    ./model/fpn_coco-0000.params
    ./model/rcnn_dcn_coco-0000.params
    ./model/rcnn_coco-0000.params
    ./model/deeplab_dcn_cityscapes-0000.params
    ./model/deeplab_cityscapes-0000.params
    ./model/deform_conv-0000.params
    ./model/deform_psroi-0000.params
    
  2. To run the R-FCN demo, run

    python ./rfcn/demo.py
    

    By default, it runs Deformable R-FCN and gives several prediction results. To run R-FCN instead, use

    python ./rfcn/demo.py --rfcn_only
    
  3. To run the DeepLab demo, run

    python ./deeplab/demo.py
    

    By default, it runs Deformable DeepLab and gives several prediction results. To run DeepLab instead, use

    python ./deeplab/demo.py --deeplab_only
    
  4. To visualize the offset of deformable convolution and deformable psroipooling, run

    python ./rfcn/deform_conv_demo.py
    python ./rfcn/deform_psroi_demo.py
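
The demos expect the files listed in step 1 to be present under model/. The check below is a minimal sketch, not part of this repository. The file names are copied from the listing above, and the helper name `missing_model_files` is hypothetical.

```python
import os

# File names copied from the model/ listing in step 1.
EXPECTED_MODEL_FILES = [
    "rfcn_dcn_coco-0000.params",
    "rfcn_coco-0000.params",
    "fpn_dcn_coco-0000.params",
    "fpn_coco-0000.params",
    "rcnn_dcn_coco-0000.params",
    "rcnn_coco-0000.params",
    "deeplab_dcn_cityscapes-0000.params",
    "deeplab_cityscapes-0000.params",
    "deform_conv-0000.params",
    "deform_psroi-0000.params",
]


def missing_model_files(model_dir, expected=EXPECTED_MODEL_FILES):
    """Return the expected file names that are not found in model_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]
```

Calling `missing_model_files("./model")` from the repository root returns an empty list once every download is in place.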
    

Preparation for Training & Testing

For R-FCN/Faster R-CNN:

  1. Please download the COCO and VOC 2007+2012 datasets, and make sure the directory layout looks like this:

    ./data/coco/
    ./data/VOCdevkit/VOC2007/
    ./data/VOCdevkit/VOC2012/
    
  2. Please download ImageNet-pretrained ResNet-v1-101 model manually from OneDrive, and put it under folder ./model. Make sure it looks like this:

    ./model/pretrained_model/resnet_v1_101-0000.params
    

For DeepLab:

  1. Please download the Cityscapes and VOC 2012 datasets, and make sure the directory layout looks like this:

    ./data/cityscapes/
    ./data/VOCdevkit/VOC2012/
    
  2. Please download the augmented VOC 2012 annotations and image lists, and put the augmented annotations and the augmented train/val lists, respectively, into:

    ./data/VOCdevkit/VOC2012/SegmentationClass/
    ./data/VOCdevkit/VOC2012/ImageSets/Main/
    

  3. Please download ImageNet-pretrained ResNet-v1-101 model manually from OneDrive, and put it under folder ./model. Make sure it looks like this:

    ./model/pretrained_model/resnet_v1_101-0000.params
    

Usage

  1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files in the folders ./experiments/rfcn/cfgs, ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/.

  2. Eight config files have been provided so far, namely R-FCN for COCO/VOC, Deformable R-FCN for COCO/VOC, Faster R-CNN (2fc) for COCO/VOC, Deformable Faster R-CNN (2fc) for COCO/VOC, DeepLab for Cityscapes/VOC, and Deformable DeepLab for Cityscapes/VOC. For R-FCN, we use 8 GPUs to train models on COCO and 4 GPUs on VOC. For DeepLab, we use 4 GPUs for all experiments.

  3. To perform experiments, run the Python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet-v1-101, use the following command:

    python experiments\rfcn\rfcn_end2end_train_test.py --cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml
    

    A cache folder will be created automatically under output/rfcn_dcn_coco/ to save the model and the log.

  4. Please find more details in config files and in our code.
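
The command in step 3 is written with Windows-style backslashes. On Linux the same paths need forward slashes. The sketch below, which is not part of this repository, builds the same command from path components so that it works on either platform. The script and config names are taken from step 3, and the helper name `build_train_command` is hypothetical.

```python
import os
import sys


def build_train_command(script_parts, cfg_parts, python_exe=None):
    """Return an argv list with paths joined for the current platform."""
    script = os.path.join(*script_parts)
    cfg = os.path.join(*cfg_parts)
    return [python_exe or sys.executable, script, "--cfg", cfg]


command = build_train_command(
    ["experiments", "rfcn", "rfcn_end2end_train_test.py"],
    ["experiments", "rfcn", "cfgs",
     "resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml"],
)
# Printing only shows the command. To launch it from ${DCN_ROOT}:
#     import subprocess; subprocess.check_call(command)
print(" ".join(command))
```

Passing the command as a list avoids shell quoting issues. Note that `sys.executable` is whichever interpreter runs the sketch, so it should be the Python 2.7 environment described under Requirements.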

Misc.

Code has been tested under:

FAQ

Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution'.

A: This is because either

Q: I encounter a segmentation fault at the beginning.

A: A compatibility issue has been identified between MXNet and opencv-python 3.0+. We suggest that you always import cv2 before importing mxnet in the entry script.
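
In the entry script, this simply means placing `import cv2` above `import mxnet`. As a small illustration, the hypothetical helper below (not part of this repository) makes the order explicit by importing a list of modules one after another.

```python
import importlib


def import_in_order(module_names):
    """Import modules one by one, in the order given, and return them."""
    return [importlib.import_module(name) for name in module_names]


# In the entry script (requires both packages to be installed):
#     cv2, mx = import_in_order(["cv2", "mxnet"])
# which is equivalent to writing "import cv2" before "import mxnet as mx".
```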

Q: I find that training becomes slower after running for a long time.

A: MXNet on Windows is known to have this problem, so we recommend running this program on Linux. If you encounter the problem, you can also stop the training process and resume it to regain the training speed.

Q: Can you share your Caffe implementation?

A: For several reasons (the code is based on an old, internal Caffe; porting to public Caffe needs extra work; time limits; etc.), we do not plan to release our Caffe code. Since the current MXNet convolution implementation is very similar to Caffe's (almost the same), it is easy to port it to Caffe yourself, and the core CUDA code can be kept unchanged. Anyone who wishes to do so is welcome to make a pull request.