Awesome

Representation Sharing for Fast Object Detector Search and Beyond

This is the PyTorch implementation for the paper:

Representation Sharing for Fast Object Detector Search and Beyond;
Y. Zhong, Z. Deng, S. Guo, M. R. Scott, W. Huang 
In: Proc. European Conference on Computer Vision (ECCV), 2020.
arXiv preprint arXiv:2007.12075

The full paper is available at: https://arxiv.org/abs/2007.12075.

Highlights

Support architecture search: This implementation supports both the architecture search and the training of the searched detectors.
Efficient search: FAD is an order of magnitude faster than previous object detector search methods. A complete search on Pascal VOC 2012 with FCOS only takes 0.6 GPU-days.
Diverse operations: FAD contains up to 12 candidate operations in the search space, providing more combinations to be explored.
Better performance: The searched FAD components can improve the detection performance by up to 1.7 AP over their counterparts on MS-COCO.

Hardware

For the architecture search, 1 Nvidia Titan Xp GPU is used. Multi-GPU search is also supported.
For the training (with searched architectures), 8 Nvidia Titan Xp GPUs are used.

Datasets

Two datasets are required.

MS-COCO 2017
VOC 2012 (for the architecture search)

The two datasets are expected to be placed in the following structure:

FAD/
  datasets/
    coco/
      images/
      annotations/
        instances_{train,val}2017.json  
    voc/
      VOC2012/
        Annotations/
        ImageSets/
        JPEGImages/

Installation

This FAD implementation is based on FCOS and DARTS. The installation of FAD is mostly the same as FCOS, with two additional packages in the requirements:

tensorboardX
graphviz

The full installation instructions of FAD refer to INSTALL.md.

Architecture search

The following command line will start the search for the subsnetworks with FCOS on Pascal VOC 2012:

CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=$((RANDOM + 10000)) \
    tools/search_net.py \
    --skip-test \
    --config-file configs/fad/search/fad-fcos_imprv_R_50_FPN_1x.yaml \
    --use-tensorboard \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/search/fad-fcos_imprv_R_50_FPN_1x

Notes:

If out-of-GPU-memory issue occurs, you can conduct the search using more than one GPU. For example, you can simply change CUDA_VISIBLE_DEVICES=0 to CUDA_VISIBLE_DEVICES=0,1,2,3 to start the multi-GPU search, where 0,1,2,3 are the IDs of the GPUs to be used.
The genotypes of the derived architectures will be saved into OUTPUT_DIR, as a file named genotype.log. The first genotype is for bounding box regreesion subnetwork, and the second is for the classification subnetwork. This genotype.log will be loaded during training and evaluation.

Training with searched architectures

The following command line will train FCOS_imprv_R_50_FPN_1x with the searched subnetworks on 8 GPUs:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/fad/augment/fad-fcos_imprv_R_50_FPN_1x.yaml \
    --genotype-file training_dir/search/fad-fcos_imprv_R_50_FPN_1x/genotype.log \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/augment/fad-fcos_imprv_R_50_FPN_1x

Notes:

The models will be saved into OUTPUT_DIR.
If you want to augment using different backbone networks, please change the file specified by --config-file.
If you want to search or train detectors on your own dataset, please prepare your dataset (including images and annotation files) in the same structure as MS-COCO, and change the config file accordingly.

Evaluation

The command line below evaluate the trained model on MS-COCO minival split:

python -m torch.distributed.launch \
    tools/test_net.py \
    --config-file configs/fad/augment/fad-fcos_imprv_R_50_FPN_1x.yaml \
    --genotype-file training_dir/search/fad-fcos_imprv_R_50_FPN_1x/genotype.log \
    MODEL.WEIGHT path_to_the_weights.pth \
    TEST.IMS_PER_BATCH 4

Notes:

Please replace path_to_the_weights.pth with your own trained model weights. For example, training_dir/augment/fad-fcos_imprv_R_50_FPN_1x/model_final.pth.
The weights file must match the detector specified by the config file and genotype file.
The inference can be done using multiple GPUs. You can modify the command line in a similar way as augment.

Results

The following results are based on the searched architectures displayed in the paper.

Model	Channel size	Multi-scale training	AP (minival)	AP (test-dev)
fad-fcos_imprv_R_50_FPN_1x	96	No	40.3	-
fad-fcos_imprv_R_101_FPN_2x	96	Yes	44.2	44.1
fad-fcos_imprv_R_101_64x4d_FPN_2x	128	Yes	46.0	46.0

Explored architectures

For object detection:

# 1st row: bounding box regression subnetwork
# 2nd row: classification subnetwork

# --- Architectures reported in the paper
Genotype(normal=[[('sinSep_conv_3x3_x3', 0), ('std_conv_3x3_x3', 1)], [('std_conv_3x3', 2), ('sinSep_conv_3x3+dil_conv_3x3', 1)], [('sinSep_conv_3x3_x2', 2), ('std_conv_3x3_x2', 3)]], normal_concat=range(2, 5))
Genotype(normal=[[('sinSep_conv_3x3_x2+dil_conv_3x3', 1), ('std_conv_3x3_x3', 0)], [('sinSep_conv_3x3_x2+dil_conv_3x3', 0), ('sinSep_conv_3x3', 2)], [('sinSep_conv_3x3_x2', 2), ('std_conv_3x3', 3)]], normal_concat=range(2, 5))

# --- Other good architectures 
Genotype(normal=[[('std_conv_3x3_x3', 0), ('std_conv_3x3_x2', 1)], [('std_conv_3x3', 0), ('std_conv_3x3+dil_conv_3x3', 2)], [('std_conv_3x3_x2+dil_conv_3x3', 3), ('std_conv_3x3+dil_conv_3x3', 2)]], normal_concat=range(2, 5))
Genotype(normal=[[('std_conv_3x3_x3', 1), ('sinSep_conv_3x3+dil_conv_3x3', 0)], [('sinSep_conv_3x3', 0), ('sinSep_conv_3x3+dil_conv_3x3', 2)], [('std_conv_3x3', 1), ('std_conv_3x3_x3', 0)]], normal_concat=range(2, 5))

Citations

Please cite our paper if this implementation helps your research. BibTeX reference is shown in the following.

@inproceedings{zhong2020representation,
  title={Representation Sharing for Fast Object Detector Search and Beyond}, 
  author={Zhong, Yujie and Deng, Zelu and Guo, Sheng and Scott, Matthew R and Huang, Weilin}, 
  booktitle={In: Proceedings of the European Conference on Computer Vision (ECCV)},  
  year={2020} 
  }

Contact

For any questions, please feel free to reach:

github@malongtech.com

License

FAD is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only. If you wish to use for commercial purposes, please contact sales@malongtech.com.