
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue.


This repo is the official implementation of BigDetection. It is based on mmdetection and CBNetV2.

Introduction

We construct a new large-scale benchmark termed BigDetection. Our goal is to leverage the training data of existing datasets (LVIS, OpenImages and Objects365), following carefully designed principles, and to curate a larger dataset for improved detector pre-training. The BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes. We show some important statistics of BigDetection in the following figure.

Left: Number of images per category of BigDetection. Right: Number of instances in different object sizes.
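As a quick sanity check on annotation density, the two totals quoted above (3.4M training images, 36M bounding boxes) give roughly ten boxes per training image on average. Both totals are rounded, so the ratio is only approximate:

```latex
\bar{n}_{\text{box}} = \frac{36 \times 10^{6}\ \text{boxes}}{3.4 \times 10^{6}\ \text{images}} \approx 10.6\ \text{boxes per image}
```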

Results and Models

BigDetection Validation

We show the evaluation results on BigDetection Validation. We hope BigDetection can serve as a new, challenging benchmark for evaluating future object detection methods.

| Method | mAP (bigdet val) | Links |
|---|---|---|
| YOLOv3 | 9.7 | model / config |
| Deformable DETR | 13.1 | model / config |
| Faster R-CNN (C4)* | 18.9 | model |
| Faster R-CNN (FPN)* | 19.4 | model |
| CenterNet2* | 23.1 | model |
| Cascade R-CNN* | 24.1 | model |
| CBNetV2-Swin-Base | 35.1 | model / config |

COCO Validation

We show the fine-tuning performance on COCO minival/test-dev. Results show that BigDetection pre-training provides significant benefits across different detector architectures. We achieve 59.8 mAP on COCO test-dev with a single model using test-time augmentation (TTA).

| Method | mAP (COCO minival / test-dev) | Links |
|---|---|---|
| YOLOv3 | 30.5 / - | config |
| Deformable DETR | 39.9 / - | model / config |
| Faster R-CNN (C4)* | 38.8 / - | model |
| Faster R-CNN (FPN)* | 40.5 / - | model |
| CenterNet2* | 45.3 / - | model |
| Cascade R-CNN* | 45.1 / - | model |
| CBNetV2-Swin-Base | 59.1 / 59.5 | model / config |
| CBNetV2-Swin-Base (TTA) | 59.5 / 59.8 | config |

Data Efficiency

Following STAC and SoftTeacher, we evaluate on COCO under different partial-annotation settings, where only 1%, 2%, 5% or 10% of the training images are labeled.

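| Method | mAP (1%) | mAP (2%) | mAP (5%) | mAP (10%) |
|---|---|---|---|---|
| Baseline | 9.8 | 14.3 | 21.2 | 26.2 |
| STAC | 14.0 | 18.3 | 24.4 | 28.6 |
| SoftTeacher (ICCV 21) | 20.5 | 26.5 | 30.7 | 34.0 |
| Ours | 25.3 | 28.1 | 31.9 | 34.1 |
| | model | model | model | model |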
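To put the percentages in perspective, the sketch below converts each partial-annotation setting into an approximate number of labeled images. It assumes, as in the STAC protocol, that the labeled subset is drawn from COCO train2017 (118,287 images); the helper `labeled_images` is hypothetical, and exact counts depend on how each split is sampled.

```shell
#!/usr/bin/env bash
# Approximate labeled-image count for each partial-annotation setting.
# Assumption: subsets are drawn from COCO train2017, which has 118,287 images.
COCO_TRAIN2017_IMAGES=118287

# labeled_images PCT: images labeled at PCT percent (integer division rounds down).
labeled_images() {
  echo $(( COCO_TRAIN2017_IMAGES * $1 / 100 ))
}

for pct in 1 2 5 10; do
  echo "${pct}%: ~$(labeled_images "$pct") labeled images"
done
```

Under this assumption, the 1% column, where the gap over SoftTeacher is largest (25.3 vs. 20.5 mAP), corresponds to only about 1.2K labeled images.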

Notes

Getting Started

Requirements

Installation

# Create conda environment
conda create -n bigdet python=3.7 -y
conda activate bigdet

# Install PyTorch
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch

# Install mmcv
pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html

# Clone and install
git clone https://github.com/amazon-research/bigdetection.git
cd bigdetection
pip install -r requirements/build.txt
pip install -v -e .

# Install Apex (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data Preparation

BigDetection is built from 3 datasets, whose train/val data can be downloaded from their official websites (Objects365, OpenImages v6, LVIS v1.0). All datasets should be placed under $bigdetection/data/ as shown below. The synsets (600 class names in total) of the BigDetection dataset can be downloaded here: bigdetection_synsets. Contact us at lkcai20@fudan.edu.cn to get access to our pre-processed annotation files.

bigdetection/data
└── BigDetection
    ├── annotations
    │   ├── bigdet_obj_train.json
    │   ├── bigdet_oid_train.json
    │   ├── bigdet_lvis_train.json
    │   ├── bigdet_val.json
    │   └── cas_weights.json
    ├── train
    │   ├── Objects365
    │   ├── OpenImages
    │   └── LVIS
    └── val
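A minimal sketch for creating the empty layout above, assuming it is run from the root of the cloned bigdetection repository. It only creates the directories and reports which of the expected annotation files are still missing; the images and pre-processed annotations still have to be obtained as described above.

```shell
#!/usr/bin/env bash
# Create the empty BigDetection directory skeleton under data/.
# Assumption: the current directory is the root of the bigdetection repo.
set -euo pipefail

ROOT="data/BigDetection"

# Annotation folder, one training folder per source dataset, and the val folder.
mkdir -p "$ROOT/annotations" \
         "$ROOT/train/Objects365" \
         "$ROOT/train/OpenImages" \
         "$ROOT/train/LVIS" \
         "$ROOT/val"

# List the expected annotation files that are not in place yet.
for f in bigdet_obj_train.json bigdet_oid_train.json bigdet_lvis_train.json \
         bigdet_val.json cas_weights.json; do
  if [ ! -f "$ROOT/annotations/$f" ]; then
    echo "missing: $ROOT/annotations/$f"
  fi
done
```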

Training

To train a detector initialized from a pre-trained checkpoint, run:

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL>
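If the GPUs to use are selected through CUDA_VISIBLE_DEVICES, <GPU_NUM> can be derived from that list. The sketch below is a dry run that only prints the command. The helper `count_gpus` is hypothetical, the fallback of 8 GPUs simply mirrors the examples in this README, and CONFIG_FILE and PRETRAIN_MODEL are placeholders to override.

```shell
#!/usr/bin/env bash
# Dry run: build the multi-GPU training command without launching it.
# CONFIG_FILE and PRETRAIN_MODEL are placeholders; override them via the environment.
CONFIG_FILE="${CONFIG_FILE:-configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py}"
PRETRAIN_MODEL="${PRETRAIN_MODEL:-/path/to/pretrained_checkpoint.pth}"

# count_gpus LIST: number of ids in a comma-separated list such as "0,1,2,3".
# Hypothetical helper; falls back to 8 GPUs when the list is empty.
count_gpus() {
  if [ -n "${1:-}" ]; then
    echo "$1" | awk -F, '{ print NF }'
  else
    echo 8
  fi
}

GPU_NUM=$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")
CMD="tools/dist_train.sh $CONFIG_FILE $GPU_NUM --cfg-options load_from=$PRETRAIN_MODEL"

# Only prints the command; nothing is launched.
echo "$CMD"
```

For example, with `CUDA_VISIBLE_DEVICES=0,1,2,3` the printed command uses 4 as <GPU_NUM>.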

Pre-training

To pre-train CBNetV2 with a Swin-Base backbone on BigDetection using 8 GPUs, run the following (PRETRAIN_MODEL should be the pre-trained Swin-Base Swin Transformer checkpoint: model):

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

To pre-train Deformable DETR with a ResNet-50 backbone on BigDetection using 8 GPUs, run:

tools/dist_train.sh configs/BigDetection/deformable_detr/deformable_detr_r50_16x2_8x_bigdet.py 8
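The config name encodes a batch setup. Assuming it follows the mmdetection naming convention, where `16x2` means 16 GPUs with 2 images per GPU, running it on 8 GPUs with the same images per GPU would halve the total batch size. The sketch below computes the total batch size and, under the linear scaling rule (learning rate proportional to total batch size), the correspondingly scaled learning rate. Both helpers are hypothetical, and the base learning rate used here is an example value, not one read from the config.

```shell
#!/usr/bin/env bash
# total_batch GPUS IMGS_PER_GPU: total batch size across all GPUs.
total_batch() {
  echo $(( $1 * $2 ))
}

# scale_lr BASE_LR BATCH REF_BATCH: linear scaling rule, lr = base_lr * batch / ref_batch.
scale_lr() {
  awk -v lr="$1" -v b="$2" -v ref="$3" 'BEGIN { printf "%g\n", lr * b / ref }'
}

REF_BATCH=$(total_batch 16 2)    # batch implied by "16x2" under the assumed convention
ACTUAL_BATCH=$(total_batch 8 2)  # 8 GPUs, still 2 images per GPU
# 0.0002 is an example base learning rate, not a value taken from the config.
echo "batch $ACTUAL_BATCH vs $REF_BATCH, scaled lr: $(scale_lr 0.0002 "$ACTUAL_BATCH" "$REF_BATCH")"
```

Whether any adjustment is needed depends on the actual `samples_per_gpu` and optimizer settings in the config, so check those before changing the learning rate.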

Fine-tuning

To fine-tune the BigDetection pre-trained CBNetV2 (Swin-Base backbone) on COCO, run the following (PRETRAIN_MODEL should be the BigDetection pre-trained CBNetV2 checkpoint: model):

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

Inference

To evaluate a detector from a trained checkpoint, run:

tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT> <GPU_NUM> --eval bbox
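Because dist_test.sh launches one process per GPU, a wrong config or checkpoint path may only surface after all processes have started. A minimal pre-flight sketch that fails fast instead; the helper `check_eval_inputs` is hypothetical and only checks that both files exist.

```shell
#!/usr/bin/env bash
# check_eval_inputs CONFIG CHECKPOINT: return 1 (and report) if either file is missing.
check_eval_inputs() {
  local f
  for f in "$1" "$2"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f" >&2
      return 1
    fi
  done
}

# Usage (placeholders as in the command above):
#   check_eval_inputs <CONFIG_FILE> <CHECKPOINT> && \
#     tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT> <GPU_NUM> --eval bbox
```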

BigDetection evaluation

To evaluate the BigDetection pre-trained CBNetV2 on the BigDetection validation set, run:

tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py \
    <BIGDET_PRETRAIN_CHECKPOINT> 8 --eval bbox

COCO evaluation

To evaluate the COCO fine-tuned CBNetV2 on COCO validation, run:

# without test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

# with test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco_tta.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

Other configurations based on Detectron2 can be found in detectron2-probject.

Citation

If you use our dataset or pre-trained models in your research, please consider citing the following paper.

@article{bigdetection2022,
  title={BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training},
  author={Likun Cai and Zhi Zhang and Yi Zhu and Li Zhang and Mu Li and Xiangyang Xue},
  journal={arXiv preprint arXiv:2203.13249},
  year={2022}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

We thank the authors of mmdetection and CBNetV2 for releasing their code to the object detection research community.