Awesome
Representation Sharing for Fast Object Detector Search and Beyond
This is the PyTorch implementation for the paper:
Representation Sharing for Fast Object Detector Search and Beyond;
Y. Zhong, Z. Deng, S. Guo, M. R. Scott, W. Huang
In: Proc. European Conference on Computer Vision (ECCV), 2020.
arXiv preprint arXiv:2007.12075
The full paper is available at: https://arxiv.org/abs/2007.12075.
Highlights
- Support architecture search: This implementation supports both the architecture search and the training of the searched detectors.
- Efficient search: FAD is an order of magnitude faster than previous object detector search methods. A complete search on Pascal VOC 2012 with FCOS only takes 0.6 GPU-days.
- Diverse operations: FAD contains up to 12 candidate operations in the search space, providing more combinations to be explored.
- Better performance: The searched FAD components can improve the detection performance by up to 1.7 AP over their counterparts on MS-COCO.
Hardware
- For the architecture search, 1 Nvidia Titan Xp GPU is used. Multi-GPU search is also supported.
- For the training (with searched architectures), 8 Nvidia Titan Xp GPUs are used.
Datasets
Two datasets are required.
- MS-COCO 2017
- VOC 2012 (for the architecture search)
The two datasets are expected to be placed in the following structure:
FAD/
datasets/
coco/
images/
annotations/
instances_{train,val}2017.json
voc/
VOC2012/
Annotations/
ImageSets/
JPEGImages/
Installation
This FAD implementation is based on FCOS and DARTS. The installation of FAD is mostly the same as FCOS, with two additional packages in the requirements:
- tensorboardX
- graphviz
The full installation instructions of FAD refer to INSTALL.md.
Architecture search
The following command line will start the search for the subsnetworks with FCOS on Pascal VOC 2012:
CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch \
--nproc_per_node=1 \
--master_port=$((RANDOM + 10000)) \
tools/search_net.py \
--skip-test \
--config-file configs/fad/search/fad-fcos_imprv_R_50_FPN_1x.yaml \
--use-tensorboard \
DATALOADER.NUM_WORKERS 2 \
OUTPUT_DIR training_dir/search/fad-fcos_imprv_R_50_FPN_1x
Notes:
- If out-of-GPU-memory issue occurs, you can conduct the search using more than one GPU. For example, you can simply change
CUDA_VISIBLE_DEVICES=0
toCUDA_VISIBLE_DEVICES=0,1,2,3
to start the multi-GPU search, where0,1,2,3
are the IDs of the GPUs to be used. - The genotypes of the derived architectures will be saved into
OUTPUT_DIR
, as a file named genotype.log. The first genotype is for bounding box regreesion subnetwork, and the second is for the classification subnetwork. This genotype.log will be loaded during training and evaluation.
Training with searched architectures
The following command line will train FCOS_imprv_R_50_FPN_1x with the searched subnetworks on 8 GPUs:
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=$((RANDOM + 10000)) \
tools/train_net.py \
--config-file configs/fad/augment/fad-fcos_imprv_R_50_FPN_1x.yaml \
--genotype-file training_dir/search/fad-fcos_imprv_R_50_FPN_1x/genotype.log \
DATALOADER.NUM_WORKERS 2 \
OUTPUT_DIR training_dir/augment/fad-fcos_imprv_R_50_FPN_1x
Notes:
- The models will be saved into
OUTPUT_DIR
. - If you want to augment using different backbone networks, please change the file specified by
--config-file
. - If you want to search or train detectors on your own dataset, please prepare your dataset (including images and annotation files) in the same structure as MS-COCO, and change the config file accordingly.
Evaluation
The command line below evaluate the trained model on MS-COCO minival split:
python -m torch.distributed.launch \
tools/test_net.py \
--config-file configs/fad/augment/fad-fcos_imprv_R_50_FPN_1x.yaml \
--genotype-file training_dir/search/fad-fcos_imprv_R_50_FPN_1x/genotype.log \
MODEL.WEIGHT path_to_the_weights.pth \
TEST.IMS_PER_BATCH 4
Notes:
- Please replace
path_to_the_weights.pth
with your own trained model weights. For example, training_dir/augment/fad-fcos_imprv_R_50_FPN_1x/model_final.pth. - The weights file must match the detector specified by the config file and genotype file.
- The inference can be done using multiple GPUs. You can modify the command line in a similar way as augment.
Results
The following results are based on the searched architectures displayed in the paper.
Model | Channel size | Multi-scale training | AP (minival) | AP (test-dev) |
---|---|---|---|---|
fad-fcos_imprv_R_50_FPN_1x | 96 | No | 40.3 | - |
fad-fcos_imprv_R_101_FPN_2x | 96 | Yes | 44.2 | 44.1 |
fad-fcos_imprv_R_101_64x4d_FPN_2x | 128 | Yes | 46.0 | 46.0 |
Explored architectures
For object detection:
# 1st row: bounding box regression subnetwork
# 2nd row: classification subnetwork
# --- Architectures reported in the paper
Genotype(normal=[[('sinSep_conv_3x3_x3', 0), ('std_conv_3x3_x3', 1)], [('std_conv_3x3', 2), ('sinSep_conv_3x3+dil_conv_3x3', 1)], [('sinSep_conv_3x3_x2', 2), ('std_conv_3x3_x2', 3)]], normal_concat=range(2, 5))
Genotype(normal=[[('sinSep_conv_3x3_x2+dil_conv_3x3', 1), ('std_conv_3x3_x3', 0)], [('sinSep_conv_3x3_x2+dil_conv_3x3', 0), ('sinSep_conv_3x3', 2)], [('sinSep_conv_3x3_x2', 2), ('std_conv_3x3', 3)]], normal_concat=range(2, 5))
# --- Other good architectures
Genotype(normal=[[('std_conv_3x3_x3', 0), ('std_conv_3x3_x2', 1)], [('std_conv_3x3', 0), ('std_conv_3x3+dil_conv_3x3', 2)], [('std_conv_3x3_x2+dil_conv_3x3', 3), ('std_conv_3x3+dil_conv_3x3', 2)]], normal_concat=range(2, 5))
Genotype(normal=[[('std_conv_3x3_x3', 1), ('sinSep_conv_3x3+dil_conv_3x3', 0)], [('sinSep_conv_3x3', 0), ('sinSep_conv_3x3+dil_conv_3x3', 2)], [('std_conv_3x3', 1), ('std_conv_3x3_x3', 0)]], normal_concat=range(2, 5))
Citations
Please cite our paper if this implementation helps your research. BibTeX reference is shown in the following.
@inproceedings{zhong2020representation,
title={Representation Sharing for Fast Object Detector Search and Beyond},
author={Zhong, Yujie and Deng, Zelu and Guo, Sheng and Scott, Matthew R and Huang, Weilin},
booktitle={In: Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Contact
For any questions, please feel free to reach:
github@malongtech.com
License
FAD is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only. If you wish to use for commercial purposes, please contact sales@malongtech.com.