Detection Transformers with Assignment

By Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

This repository is an official implementation of the paper NMS Strikes Back.

TL;DR. Detection Transformers with Assignment (DETA) re-introduces IoU-based assignment and NMS for transformer-based detectors. DETA trains and tests comparably fast to Deformable-DETR and converges much faster (50.2 mAP in 12 epochs on COCO).

Figure: DETR's one-to-one bipartite matching vs. our many-to-one IoU-based assignment.
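
To make the contrast concrete, here is a minimal sketch of a many-to-one IoU-based assignment: every proposal whose IoU with some ground-truth box clears a threshold is assigned to its best-matching box, so a single object can supervise several queries. The threshold value and function names are illustrative assumptions, not the repository's exact implementation.

```python
import torch
from torchvision.ops import box_iou

def iou_assign(proposals, gt_boxes, pos_thresh=0.6):
    """Toy many-to-one IoU assignment (illustrative, not DETA's exact code).

    proposals: (N, 4) candidate boxes in (x1, y1, x2, y2) format
    gt_boxes:  (M, 4) ground-truth boxes in the same format
    returns:   (N,) long tensor with the matched ground-truth index, or -1 for background
    """
    if gt_boxes.numel() == 0:
        return proposals.new_full((proposals.size(0),), -1, dtype=torch.long)
    ious = box_iou(proposals, gt_boxes)        # (N, M) pairwise IoU
    best_iou, best_gt = ious.max(dim=1)        # best ground-truth box per proposal
    matches = best_gt.clone()
    matches[best_iou < pos_thresh] = -1        # low-overlap proposals become background
    return matches
```

Unlike one-to-one bipartite matching, several proposals can end up with the same ground-truth index here, which is why NMS is needed again at inference.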

Main Results

| Method | Epochs | COCO val AP | Total Train Time (8 GPU hours) | Batch Infer Speed (FPS) | URL |
| --- | --- | --- | --- | --- | --- |
| Two-stage Deformable DETR | 50 | 46.9 | 42.5 | - | see DeformDETR |
| Improved Deformable DETR | 50 | 49.6 | 66.6 | 13.4 | config / log / model |
| DETA | 12 | 50.1 | 16.3 | 12.7 | config / log / model |
| DETA | 24 | 51.1 | 32.5 | 12.7 | config / log / model |
| DETA (Swin-L) | 24 | 62.9 | 100 | 4.2 | config-O365 / model-O365 / config / model |

Note:

  1. Unless otherwise specified, models use a ResNet-50 backbone and are trained on 8 Nvidia Quadro RTX 6000 GPUs.
  2. Inference speed is measured on an Nvidia Tesla V100 GPU.
  3. "Batch Infer Speed" refers to inference with batch size = 4 to maximize GPU utilization (a rough timing sketch follows these notes).
  4. Improved Deformable DETR implements two-stage Deformable DETR with improved hyperparameters (e.g., more queries and more feature levels; see the full list here).
  5. DETA with a Swin-L backbone is pre-trained on Object-365 and fine-tuned on COCO. This model attains 63.5 AP on COCO test-dev. Times refer to fine-tuning (O365 pre-training takes 14,000 GPU hours). We additionally provide the pre-trained Object-365 config and model prior to fine-tuning.
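
For reference, the batch-size-4 throughput in the table can be approximated with a simple timing loop like the one below; the model and image tensors are placeholders, not the repository's benchmarking script.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, image_batch, warmup=5, iters=50):
    """Rough throughput measurement for a fixed batch (illustrative only)."""
    model.eval().cuda()
    image_batch = image_batch.cuda()           # e.g. a (4, 3, H, W) tensor for batch size 4
    for _ in range(warmup):                    # warm up CUDA kernels / autotuning
        model(image_batch)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(image_batch)
    torch.cuda.synchronize()
    return image_batch.size(0) * iters / (time.time() - start)   # images per second
```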

Installation

Please follow the instructions from Deformable-DETR for installation, data preparation, and additional usage examples. Tested with torch1.8.0+cuda10.1, torch1.6.0+cuda9.2, and torch1.11.0+cuda11.3.
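
A quick way to confirm your environment matches one of the tested combinations above (assuming a standard PyTorch install):

```python
import torch

# Print the installed torch / CUDA versions and check that a GPU is visible.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())
```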

Usage

Evaluation

You can evaluate our pretrained DETA models from the table above on the COCO 2017 validation set:

./configs/deta.sh --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation on our Swin-L model:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta_swin_ft.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>
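
Because the many-to-one assignment lets several queries fire on the same object, DETA applies NMS when producing final detections. A minimal post-processing sketch for one image is shown below; the thresholds and tensor names are assumptions, not the repository's exact post-processor.

```python
import torch
from torchvision.ops import batched_nms

def postprocess(boxes, scores, labels, score_thresh=0.05, iou_thresh=0.7):
    """Illustrative class-aware NMS for one image's predictions.

    boxes:  (Q, 4) predicted boxes in (x1, y1, x2, y2) format
    scores: (Q,)   per-box confidence
    labels: (Q,)   per-box class index
    """
    keep = scores > score_thresh                   # drop low-confidence predictions
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    # class-aware NMS: boxes of different classes never suppress each other
    kept = batched_nms(boxes, scores, labels, iou_thresh)
    return boxes[kept], scores[kept], labels[kept]
```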

Training

Training on a single node

Training DETA on 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh --coco_path ./data/coco

Training on a Slurm cluster

If you are using a Slurm cluster, you can simply run the following command to train on 1 node with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 8 configs/deta.sh \
    --coco_path ./data/coco

Fine-tune DETA with Swin-L on 2 nodes, each with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 16 configs/deta_swin_ft.sh \
    --coco_path ./data/coco --finetune <path_to_o365_model>

License

This project builds heavily on Deformable-DETR and Detectron2. Please refer to their original licenses for more details. If you are using the Swin-L backbone, please also see the original Swin license.

Citing DETA

If you find DETA useful in your research, please consider citing:

@article{ouyangzhang2022nms,
  title={NMS Strikes Back},
  author={Ouyang-Zhang, Jeffrey and Cho, Jang Hyun and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv preprint arXiv:2212.06137},
  year={2022}
}