Detection Transformers with Assignment

By Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

This repository is an official implementation of the paper NMS Strikes Back.

TL;DR. Detection Transformers with Assignment (DETA) re-introduces IoU-based assignment and NMS for transformer-based detectors. DETA trains and tests comparably fast to Deformable-DETR and converges much faster (50.2 mAP in 12 epochs on COCO).

Figure: DETR's one-to-one bipartite matching vs. our many-to-one IoU-based assignment.
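
To make the contrast concrete, here is a minimal sketch of a many-to-one IoU-based assignment: every proposal whose IoU with some ground-truth box clears a threshold is assigned to its best-matching box, so a single object can supervise several queries. The threshold value and function names are illustrative assumptions, not the repository's exact implementation.

```python
import torch
from torchvision.ops import box_iou

def iou_assign(proposals, gt_boxes, pos_thresh=0.6):
    """Toy many-to-one IoU assignment (illustrative, not DETA's exact code).

    proposals: (N, 4) candidate boxes in (x1, y1, x2, y2) format
    gt_boxes:  (M, 4) ground-truth boxes in the same format
    returns:   (N,) long tensor with the matched ground-truth index, or -1 for background
    """
    if gt_boxes.numel() == 0:
        return proposals.new_full((proposals.size(0),), -1, dtype=torch.long)
    ious = box_iou(proposals, gt_boxes)        # (N, M) pairwise IoU
    best_iou, best_gt = ious.max(dim=1)        # best ground-truth box per proposal
    matches = best_gt.clone()
    matches[best_iou < pos_thresh] = -1        # low-overlap proposals become background
    return matches
```

Unlike one-to-one bipartite matching, several proposals can end up with the same ground-truth index here, which is why NMS is needed again at inference.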

Main Results

| Method | Epochs | COCO val AP | Total Train Time (8 GPU hours) | Batch Infer Speed (FPS) | URL |
| --- | --- | --- | --- | --- | --- |
| Two-stage Deformable DETR | 50 | 46.9 | 42.5 | - | see DeformDETR |
| Improved Deformable DETR | 50 | 49.6 | 66.6 | 13.4 | config / log / model |
| DETA | 12 | 50.1 | 16.3 | 12.7 | config / log / model |
| DETA | 24 | 51.1 | 32.5 | 12.7 | config / log / model |
| DETA (Swin-L) | 24 | 62.9 | 100 | 4.2 | config-O365 / model-O365 / config / model |

Note:

  1. Unless otherwise specified, models use a ResNet-50 backbone and are trained on 8 Nvidia Quadro RTX 6000 GPUs.
  2. Inference speed is measured on an Nvidia Tesla V100 GPU.
  3. "Batch Infer Speed" refers to inference with batch size = 4 to maximize GPU utilization (a rough timing sketch follows these notes).
  4. Improved Deformable DETR implements two-stage Deformable DETR with improved hyperparameters (e.g., more queries and more feature levels; see the full list here).
  5. DETA with a Swin-L backbone is pre-trained on Object-365 and fine-tuned on COCO. This model attains 63.5 AP on COCO test-dev. Times refer to fine-tuning (O365 pre-training takes 14,000 GPU hours). We additionally provide the pre-trained Object-365 config and model prior to fine-tuning.
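
For reference, the batch-size-4 throughput in the table can be approximated with a simple timing loop like the one below; the model and image tensors are placeholders, not the repository's benchmarking script.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, image_batch, warmup=5, iters=50):
    """Rough throughput measurement for a fixed batch (illustrative only)."""
    model.eval().cuda()
    image_batch = image_batch.cuda()           # e.g. a (4, 3, H, W) tensor for batch size 4
    for _ in range(warmup):                    # warm up CUDA kernels / autotuning
        model(image_batch)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(image_batch)
    torch.cuda.synchronize()
    return image_batch.size(0) * iters / (time.time() - start)   # images per second
```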

Installation

Please follow the instructions from Deformable-DETR for installation, data preparation, and additional usage examples. Tested with torch1.8.0+cuda10.1, torch1.6.0+cuda9.2, and torch1.11.0+cuda11.3.
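
A quick way to confirm your environment matches one of the tested combinations above (assuming a standard PyTorch install):

```python
import torch

# Print the installed torch / CUDA versions and check that a GPU is visible.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())
```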

Usage

Evaluation

You can evaluate our pretrained DETA models from the table above on the COCO 2017 validation set:

./configs/deta.sh --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation on our Swin-L model:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta_swin_ft.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>
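
Because the many-to-one assignment lets several queries fire on the same object, DETA applies NMS when producing final detections. A minimal post-processing sketch for one image is shown below; the thresholds and tensor names are assumptions, not the repository's exact post-processor.

```python
import torch
from torchvision.ops import batched_nms

def postprocess(boxes, scores, labels, score_thresh=0.05, iou_thresh=0.7):
    """Illustrative class-aware NMS for one image's predictions.

    boxes:  (Q, 4) predicted boxes in (x1, y1, x2, y2) format
    scores: (Q,)   per-box confidence
    labels: (Q,)   per-box class index
    """
    keep = scores > score_thresh                   # drop low-confidence predictions
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    # class-aware NMS: boxes of different classes never suppress each other
    kept = batched_nms(boxes, scores, labels, iou_thresh)
    return boxes[kept], scores[kept], labels[kept]
```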

Training

Training on a single node

Training DETA on 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh --coco_path ./data/coco

Training on a Slurm cluster

If you are using a Slurm cluster, you can simply run the following command to train on 1 node with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 8 configs/deta.sh \
    --coco_path ./data/coco

Fine-tune DETA with Swin-L on 2 nodes, each with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 16 configs/deta_swin_ft.sh \
    --coco_path ./data/coco --finetune <path_to_o365_model>

License

This project builds heavily on Deformable-DETR and Detectron2. Please refer to their original licenses for more details. If you are using the Swin-L backbone, please also see the original Swin license.

Citing DETA

If you find DETA useful in your research, please consider citing:

@article{ouyangzhang2022nms,
  title={NMS Strikes Back},
  author={Ouyang-Zhang, Jeffrey and Cho, Jang Hyun and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv preprint arXiv:2212.06137},
  year={2022}
}