Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

05/11/2021 Models for MoBY are released

04/12/2021 Initial commits

Results and Models

Mask R-CNN

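| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu |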

Cascade Mask R-CNN

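| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu |
| Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu |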

RepPoints V2

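| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G | config | github | github |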

Mask RepPoints V2

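| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 3x | 50.4 | 43.8 | 47M | 292G | config | github | github |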

Results of MoBY with Swin Transformer

Mask R-CNN

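| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 1x | 43.6 | 39.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.7 | 48M | 267G | config | github/baidu | github/baidu |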

Cascade Mask R-CNN

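| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 1x | 48.1 | 41.5 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 50.2 | 43.5 | 86M | 745G | config | github/baidu | github/baidu |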

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
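
The two commands above differ only in the launcher and the GPU count, so they are easy to assemble from a script. The following is a minimal Python sketch, not part of this repo: `build_test_command` is a hypothetical helper, and the checkpoint path in the usage line is a placeholder. It only builds the argument list and does not run anything.

```python
import shlex


def build_test_command(config, checkpoint, gpus=1, metrics=("bbox", "segm")):
    """Assemble the evaluation command from this section as an argv list.

    Hypothetical helper for illustration; it only builds the command.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    if gpus == 1:
        # Single-GPU testing goes through tools/test.py.
        argv = ["python", "tools/test.py", config, checkpoint]
    else:
        # Multi-GPU testing goes through the distributed launcher script.
        argv = ["tools/dist_test.sh", config, checkpoint, str(gpus)]
    return argv + ["--eval", *metrics]


# Placeholder checkpoint path; substitute a real detector checkpoint.
cmd = build_test_command(
    "configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py",
    "work_dirs/example/latest.pth",
    gpus=8,
)
print(shlex.join(cmd))
```

For detectors that report no mask mAP in the results tables (RepPoints V2), passing `metrics=("bbox",)` is presumably the appropriate choice, since there are no masks to evaluate.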

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone on 8 GPUs, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

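Note: use_checkpoint enables gradient checkpointing in the backbone, which saves GPU memory at the cost of recomputing activations during the backward pass. Please refer to this page for more details.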
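
The `--cfg-options` flag takes dotted `key=value` pairs and overrides the matching entries of the nested config. mmdetection handles this internally through its config system; the sketch below is only a simplified, standard-library illustration of the idea, not the library's implementation. The function names, the `base` dictionary, and the checkpoint path are hypothetical.

```python
import ast
import copy


def parse_value(text):
    """Interpret a command-line value as a Python literal when possible."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # fall back to the raw string, e.g. a file path


def apply_cfg_options(cfg, options):
    """Return a copy of cfg with dotted key=value overrides applied."""
    merged = copy.deepcopy(cfg)
    for item in options:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = merged
        for name in parents:
            # Walk down the nested dicts, creating levels that are missing.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return merged


# Simplified stand-in for a detector config (assumed layout).
base = {
    "model": {
        "pretrained": None,
        "backbone": {"type": "SwinTransformer", "use_checkpoint": False},
    }
}
overrides = [
    "model.pretrained=pretrained/swin_tiny.pth",  # placeholder path
    "model.backbone.use_checkpoint=True",
]
cfg = apply_cfg_options(base, overrides)
print(cfg["model"]["pretrained"])
print(cfg["model"]["backbone"]["use_checkpoint"])
```

The override is applied to a deep copy, so `base` stays unchanged; this keeps the illustration free of side effects and is a design choice of the sketch, not a statement about how mmdetection behaves.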

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

To disable apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration file:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
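
As a rough sketch, the edited part of a config file might look like the following once apex is disabled. The `max_epochs=36` value is an assumption that matches a 3x schedule; the exact layout of a given config may differ.

```python
# Sketch only: relevant lines of a config file with apex disabled.

# Use the standard runner. max_epochs=36 is an assumption matching a
# 3x schedule; keep whatever epoch count the config already specifies.
runner = dict(type="EpochBasedRunner", max_epochs=36)

# The apex-specific block is commented out:
# # do not use mmdet version fp16
# fp16 = None
# optimizer_config = dict(
#     type="DistOptimizerHook",
#     update_interval=1,
#     grad_clip=None,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True,
# )
```

With the block commented out, the optimizer hook settings presumably fall back to whatever the inherited base configs define.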

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Self-Supervised Learning: See MoBY with Swin Transformer.

Video Recognition: See Video Swin Transformer.