# VoVNet-v2 backbone networks in Detectron2

**Efficient Backbone Network for Object Detection and Segmentation**
[[CenterMask (code)](https://github.com/youngwanLEE/CenterMask)] [[CenterMask2 (code)](https://github.com/youngwanLEE/centermask2)] [[VoVNet-v1 (arXiv)](https://arxiv.org/abs/1904.09730)] [[VoVNet-v2 (arXiv)](https://arxiv.org/abs/1911.06667)] [[BibTeX](#CitingVoVNet)]
In this project, we release code for the VoVNet-v2 backbone network (introduced by CenterMask) as an extension to detectron2. VoVNet extracts diverse feature representations efficiently with its One-Shot Aggregation (OSA) module, which concatenates all of a block's intermediate feature maps at once. Because the OSA module captures multi-scale receptive fields, the diversified feature maps let object detection and segmentation models handle objects and pixels across scales, and they are especially robust on small objects. VoVNet-v2 improves on VoVNet-v1 by adding identity mapping, which eases optimization, and effective Squeeze-and-Excitation (eSE), which strengthens the diversified feature representation.
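For intuition, here is a minimal PyTorch sketch of an OSA block with the VoVNet-v2 additions (identity mapping and eSE). The layer count, channel widths, and the sigmoid gating in eSE are illustrative choices, not the exact implementation in this repo:

```python
# Illustrative sketch of a VoVNet-v2 OSA block (not this repo's exact code).
import torch
import torch.nn as nn


class eSEModule(nn.Module):
    """Effective Squeeze-and-Excitation: a single 1x1 conv instead of two FCs."""

    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Global average pool -> channel-wise gate -> rescale features.
        w = torch.sigmoid(self.fc(x.mean(dim=(2, 3), keepdim=True)))
        return x * w


class OSABlock(nn.Module):
    def __init__(self, in_ch, stage_ch, out_ch, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, stage_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(stage_ch),
                nn.ReLU(inplace=True)))
            ch = stage_ch
        # "One-shot" aggregation: a single 1x1 conv over all concatenated maps.
        self.concat_conv = nn.Conv2d(in_ch + num_layers * stage_ch, out_ch, 1)
        self.ese = eSEModule(out_ch)
        self.use_identity = in_ch == out_ch  # identity mapping (VoVNet-v2)

    def forward(self, x):
        feats, h = [x], x
        for layer in self.layers:
            h = layer(h)
            feats.append(h)  # keep every intermediate map for concatenation
        out = self.ese(self.concat_conv(torch.cat(feats, dim=1)))
        return out + x if self.use_identity else out
```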
## Highlight

Compared to ResNe(X)t backbones:
- Efficient: faster inference speed
- Accurate: better performance, especially on small objects
## Update

- centermask2 has been released. (20/02/2020)
- Lightweight-VoVNet-19 has been released. (19/02/2020)
- VoVNetV2-19-FPNLite has been released. (22/01/2020)
## Results on MS-COCO in Detectron2

### Note

- Inference time is measured for every model with batch size 1 on the same V100 GPU machine.
- All models are trained on 8 V100 GPUs with:
  - PyTorch 1.3.1
  - CUDA 10.1
  - cuDNN 7.3
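For context, the following is a minimal sketch of how batch-1 GPU latency is typically measured; it is illustrative, not the exact script behind the numbers below:

```python
# Sketch: measure per-image GPU latency with batch size 1.
# `model` is assumed to be a detectron2 model in eval mode on GPU, and
# `inputs` a single-image batched input list as detectron2 models expect.
import time
import torch


@torch.no_grad()
def measure_latency(model, inputs, n_warmup=10, n_iter=100):
    for _ in range(n_warmup):    # warm up CUDA kernels / cudnn autotuning
        model(inputs)
    torch.cuda.synchronize()     # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(n_iter):
        model(inputs)
    torch.cuda.synchronize()     # wait for all timed kernels to complete
    return (time.perf_counter() - start) / n_iter  # seconds per image
```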
### Faster R-CNN

#### Lightweight-VoVNet with FPNLite

Backbone | Param. | lr sched | inference time (s/im) | AP | APs | APm | APl | model | metrics |
---|---|---|---|---|---|---|---|---|---|
MobileNetV2 | 3.5M | 3x | 0.022 | 33.0 | 19.0 | 35.0 | 43.4 | <a href="https://dl.dropbox.com/s/q4iceofvlcu207c/faster_mobilenetv2_FPNLite_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/tz60e7rtnbsrdgd/faster_mobilenetv2_FPNLite_ms_3x_metrics.json">metrics</a> |
V2-19 | 11.2M | 3x | 0.034 | 38.9 | 24.8 | 41.7 | 49.3 | <a href="https://www.dropbox.com/s/u5pvmhc871ohvgw/fast_V_19_eSE_FPNLite_ms_3x.pth?dl=1">model</a> | <a href="https://www.dropbox.com/s/riu7hkgzlmnndhc/fast_V_19_eSE_FPNLite_ms_3x_metrics.json">metrics</a> |
V2-19-DW | 6.5M | 3x | 0.027 | 36.7 | 22.7 | 40.0 | 46.0 | <a href="https://www.dropbox.com/s/7h6zn0owumucs48/faster_rcnn_V_19_eSE_dw_FPNLite_ms_3x.pth?dl=1">model</a> | <a href="https://www.dropbox.com/s/627hf4h1m485926/faster_rcnn_V_19_eSE_dw_FPNLite_ms_3x_metrics.json">metrics</a> |
V2-19-Slim | 3.1M | 3x | 0.023 | 35.2 | 21.7 | 37.3 | 44.4 | <a href="https://www.dropbox.com/s/yao1i32zdylx279/faster_rcnn_V_19_eSE_slim_FPNLite_ms_3x.pth?dl=1">model</a> | <a href="https://www.dropbox.com/s/jrgxltneki9hk84/faster_rcnn_V_19_eSE_slim_FPNLite_ms_3x_metrics.json">metrics</a> |
V2-19-Slim-DW | 1.8M | 3x | 0.022 | 32.4 | 19.1 | 34.6 | 41.8 | <a href="https://www.dropbox.com/s/blpjx3iavrzkygt/faster_rcnn_V_19_eSE_slim_dw_FPNLite_ms_3x.pth?dl=1">model</a> | <a href="https://www.dropbox.com/s/3og68zhq2ubr7mu/faster_rcnn_V_19_eSE_slim_dw_FPNLite_ms_3x_metrics.json">metrics</a> |
- DW and Slim denote depthwise separable convolution and a thinner model with half the channel size, respectively.
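For reference, a depthwise separable convolution (the DW variant above) replaces a dense 3x3 convolution with a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution. A minimal PyTorch sketch, with illustrative layer details:

```python
# Sketch of a depthwise separable convolution block (illustrative).
import torch.nn as nn


def dw_separable_conv(in_ch, out_ch):
    return nn.Sequential(
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # Pointwise: 1x1 conv mixes channels and sets the output width.
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```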
Backbone | Param. | lr sched | inference time (s/im) | AP | APs | APm | APl | model | metrics |
---|---|---|---|---|---|---|---|---|---|
V2-19-FPN | 37.6M | 3x | 0.040 | 38.9 | 24.9 | 41.5 | 48.8 | <a href="https://www.dropbox.com/s/1rfvi6vzx45z6y5/faster_V_19_eSE_ms_3x.pth?dl=1">model</a> | <a href="https://dl.dropbox.com/s/dq7406vo22wjxgi/faster_V_19_eSE_ms_3x_metrics.json">metrics</a> |
R-50-FPN | 51.2M | 3x | 0.047 | 40.2 | 24.2 | 43.5 | 52.0 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json">metrics</a> |
V2-39-FPN | 52.6M | 3x | 0.047 | 42.7 | 27.1 | 45.6 | 54.0 | <a href="https://dl.dropbox.com/s/dkto39ececze6l4/faster_V_39_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/dx9qz1dn65ccrwd/faster_V_39_eSE_ms_3x_metrics.json">metrics</a> |
R-101-FPN | 70.1M | 3x | 0.063 | 42.0 | 25.2 | 45.6 | 54.6 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/metrics.json">metrics</a> |
V2-57-FPN | 68.9M | 3x | 0.054 | 43.3 | 27.5 | 46.7 | 55.3 | <a href="https://dl.dropbox.com/s/c7mb1mq10eo4pzk/faster_V_57_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/3tsn218zzmuhyo8/faster_V_57_eSE_metrics.json">metrics</a> |
X-101-FPN | 114.3M | 3x | 0.120 | 43.0 | 27.2 | 46.1 | 54.9 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/metrics.json">metrics</a> |
V2-99-FPN | 96.9M | 3x | 0.073 | 44.1 | 28.1 | 47.0 | 56.4 | <a href="https://dl.dropbox.com/s/v64mknwzfpmfcdh/faster_V_99_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/zvaz9s8gvq2mhrd/faster_V_99_eSE_ms_3x_metrics.json">metrics</a> |
### Mask R-CNN

Backbone | lr sched | inference time (s/im) | box AP | box APs | box APm | box APl | mask AP | mask APs | mask APm | mask APl | model | metrics |
---|---|---|---|---|---|---|---|---|---|---|---|---|
V2-19-FPNLite | 3x | 0.036 | 39.7 | 25.1 | 42.6 | 50.8 | 36.4 | 19.9 | 38.8 | 50.8 | <a href="https://www.dropbox.com/s/h1khv9l7quakvz0/mask_V_19_eSE_FPNLite_ms_3x.pth?dl=1">model</a> | <a href="https://www.dropbox.com/s/8fophrb1f1mf9ih/mask_V_19_eSE_FPNLite_ms_3x_metrics.json">metrics</a> |
V2-19-FPN | 3x | 0.044 | 40.1 | 25.4 | 43.0 | 51.0 | 36.6 | 19.7 | 38.7 | 51.2 | <a href="https://www.dropbox.com/s/dyeyuag5va96tqo/mask_V_19_eSE_ms_3x.pth?dl=1">model</a> | <a href="https://dl.dropbox.com/s/0y0q97gi8u8kq2n/mask_V_19_eSE_ms_3x_metrics.json">metrics</a> |
R-50-FPN | 3x | 0.055 | 41.0 | 24.9 | 43.9 | 53.3 | 37.2 | 18.6 | 39.5 | 53.3 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json">metrics</a> |
V2-39-FPN | 3x | 0.052 | 43.8 | 27.6 | 47.2 | 55.3 | 39.3 | 21.4 | 41.8 | 54.6 | <a href="https://dl.dropbox.com/s/c5o3yr6lwrb1170/mask_V_39_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/21xqlv1ofn7oa1z/mask_V_39_eSE_metrics.json">metrics</a> |
R-101-FPN | 3x | 0.070 | 42.9 | 26.4 | 46.6 | 56.1 | 38.6 | 19.5 | 41.3 | 55.3 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/metrics.json">metrics</a> |
V2-57-FPN | 3x | 0.058 | 44.2 | 28.2 | 47.2 | 56.8 | 39.7 | 21.6 | 42.2 | 55.6 | <a href="https://dl.dropbox.com/s/aturknfroupyw92/mask_V_57_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/8sdek6hkepcu7na/mask_V_57_eSE_metrics.json">metrics</a> |
X-101-FPN | 3x | 0.129 | 44.3 | 27.5 | 47.6 | 56.7 | 39.5 | 20.7 | 42.0 | 56.5 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/metrics.json">metrics</a> |
V2-99-FPN | 3x | 0.076 | 44.9 | 28.5 | 48.1 | 57.7 | 40.3 | 21.7 | 42.8 | 56.6 | <a href="https://dl.dropbox.com/s/qx45cnv718k4zmn/mask_V_99_eSE_ms_3x.pth">model</a> | <a href="https://dl.dropbox.com/s/u1sav8deha47odp/mask_V_99_eSE_metrics.json">metrics</a> |
### Panoptic-FPN on COCO

Backbone | lr sched | inference time (s/im) | box AP | mask AP | PQ | model | metrics |
---|---|---|---|---|---|---|---|
R-50-FPN | 3x | 0.063 | 40.0 | 36.5 | 41.5 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x/139514569/model_final_c10459.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x/139514569/metrics.json">metrics</a> |
V2-39-FPN | 3x | 0.063 | 42.8 | 38.5 | 43.4 | <a href="https://www.dropbox.com/s/fnr9r4arv0cbfbf/panoptic_V_39_eSE_3x.pth?dl=1">model</a> | <a href="https://dl.dropbox.com/s/vftfukrjuu7w1ao/panoptic_V_39_eSE_3x_metrics.json">metrics</a> |
R-101-FPN | 3x | 0.078 | 42.4 | 38.5 | 43.0 | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl">model</a> | <a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/metrics.json">metrics</a> |
V2-57-FPN | 3x | 0.070 | 43.4 | 39.2 | 44.3 | <a href="https://www.dropbox.com/s/zhoqx5rvc0jj0oa/panoptic_V_57_eSE_3x.pth?dl=1">model</a> | <a href="https://dl.dropbox.com/s/20hwrmru15dilre/panoptic_V_57_eSE_3x_metrics.json">metrics</a> |

The inference time of each model was measured with the following command (note `--num-gpus 1`):

```
python /path/to/vovnet-detectron2/train_net.py --config-file /path/to/vovnet-detectron2/configs/<config.yaml> --eval-only --num-gpus 1 MODEL.WEIGHTS <model.pth>
```
## Installation

Since vovnet-detectron2 is implemented as an extension (in the form of detectron2/projects) on top of detectron2, you only need to install detectron2 by following INSTALL.md.

Then prepare the COCO dataset by following this instruction.
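As a sanity check (not part of this repo), you can confirm that detectron2 sees the prepared COCO splits; detectron2 registers the builtin `coco_2017_train`/`coco_2017_val` names when the files follow the expected `datasets/coco/` layout:

```python
# Sketch: verify the COCO dataset is visible to detectron2 after preparation.
from detectron2.data import DatasetCatalog

dicts = DatasetCatalog.get("coco_2017_val")  # loads the annotation dicts
print(len(dicts), "images found in coco_2017_val")
```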
## Training

### ImageNet Pretrained Models

We provide backbone weights pretrained on the ImageNet-1k dataset.
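For reference, here is a hedged Python sketch of wiring a config to these pretrained weights before training; the `add_vovnet_config` helper and both file paths are assumptions about this project's layout, so check the repo's config module for the actual names:

```python
# Sketch: point a config at the ImageNet-pretrained backbone weights.
# NOTE: add_vovnet_config and both file paths are assumed/illustrative;
# this project's actual helper and config keys may differ.
from detectron2.config import get_cfg
# from vovnet import add_vovnet_config  # hypothetical project helper

cfg = get_cfg()
# add_vovnet_config(cfg)  # register VoVNet-specific keys before merging
cfg.merge_from_file("configs/faster_rcnn_V_39_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "vovnet39_ese_pretrained.pth"  # ImageNet-1k backbone
```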
To train a model, run:

```
python /path/to/vovnet-detectron2/train_net.py --config-file /path/to/vovnet-detectron2/configs/<config.yaml>
```
For example, to launch end-to-end Faster R-CNN training with the VoVNetV2-39 backbone on 8 GPUs, run:

```
python /path/to/vovnet-detectron2/train_net.py --config-file /path/to/vovnet-detectron2/configs/faster_rcnn_V_39_FPN_3x.yaml --num-gpus 8
```
## Evaluation
Model evaluation can be done similarly:

```
python /path/to/vovnet-detectron2/train_net.py --config-file /path/to/vovnet-detectron2/configs/faster_rcnn_V_39_FPN_3x.yaml --eval-only MODEL.WEIGHTS <model.pth>
```
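Beyond COCO evaluation, the trained weights can be used for quick single-image inference with detectron2's `DefaultPredictor`. A hedged sketch, with the same project-specific config caveat as in the training sketch above:

```python
# Sketch: single-image inference with a trained model via DefaultPredictor.
# NOTE: registering this project's VoVNet config keys (see training sketch)
# and both file paths are assumptions; adjust to the actual repo layout.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# add_vovnet_config(cfg)  # hypothetical: add VoVNet keys before merging
cfg.merge_from_file("configs/faster_rcnn_V_39_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "model.pth"  # weights produced by the training above

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))  # BGR image, as expected
print(outputs["instances"].pred_boxes)        # detected boxes
```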
## TODO

- Add more lightweight models
- Apply VoVNet to other meta-architectures
## <a name="CitingVoVNet"></a>Citing VoVNet

If you use VoVNet, please cite it using the following BibTeX entries:
```BibTeX
@inproceedings{lee2019energy,
  title = {An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection},
  author = {Lee, Youngwan and Hwang, Joong-won and Lee, Sangrok and Bae, Yuseok and Park, Jongyoul},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
  year = {2019}
}

@inproceedings{lee2019centermask,
  title = {CenterMask: Real-Time Anchor-Free Instance Segmentation},
  author = {Lee, Youngwan and Park, Jongyoul},
  booktitle = {CVPR},
  year = {2020}
}
```