# Salience DETR
This repository is an official implementation of Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement, accepted to CVPR 2024 (score 553). Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen.
💖 If our Salience DETR is helpful to your research or projects, please star this repository. Thanks! 🤗
<div align="center">
  <img src="images/Salience-DETR.svg">
</div>

<details>
<summary>✨Highlights</summary>

- We offer a deeper analysis of the scale bias and query redundancy issues of two-stage DETR-like methods.
- We present a hierarchical filtering mechanism that reduces computational complexity under salience supervision. The proposed salience supervision helps capture fine-grained object contours even with only bounding box annotations.
- Salience DETR achieves +4.0%, +0.2%, and +4.4% AP on three challenging defect detection tasks, and comparable performance (49.2 AP) with only about 70% of the FLOPs on COCO 2017.
- Queries in the two-stage selection of existing DETR-like methods are usually redundant and exhibit scale bias (left).
- Salience supervision helps capture object contours even with only bounding box annotations, for both defect detection and object detection tasks (right).

</details>
## Update
- [2024-07-18] We release Relation-DETR, a general and strong object detection model that achieves 40+% AP using only 2 epochs and surpasses most SOTA methods, including DDQ-DETR, StableDINO, Rank-DETR, and MS-DETR. Code and checkpoints are available here.
- [2024-04-19] Salience DETR with a FocalNet-Large backbone achieves 56.8 AP on COCO val2017; the config and checkpoint are available!
- [2024-04-08] Update the config and checkpoint of Salience DETR with a ConvNeXt-L backbone trained on COCO 2017 (12 epochs).
- [2024-04-01] Our Salience DETR with a Swin-L backbone achieves 56.5 AP on COCO 2017 (12 epochs). The model config and checkpoint are available.
- [2024-03-26] We release the code of Salience DETR and pretrained weights on COCO 2017 for Salience DETR with a ResNet50 backbone.
- [2024-02-29] Salience DETR is accepted to CVPR 2024, and the code will be released in this repo. Thanks for your attention!
## Model Zoo
### 12 epoch setting
| Model | Backbone | mAP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Salience DETR | ResNet50 | 50.0 | 67.7 | 54.2 | 33.3 | 54.4 | 64.4 | config / checkpoint |
| Salience DETR | ConvNeXt-L | 54.2 | 72.4 | 59.1 | 38.8 | 58.3 | 69.6 | config / checkpoint |
| Salience DETR | Swin-L<sub>(IN-22K)</sub> | 56.5 | 75.0 | 61.5 | 40.2 | 61.2 | 72.8 | config / checkpoint |
| Salience DETR | FocalNet-L<sub>(IN-22K)</sub> | 57.3 | 75.5 | 62.3 | 40.9 | 61.8 | 74.5 | config / checkpoint |
### 24 epoch setting
| Model | Backbone | mAP | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Salience DETR | ResNet50 | 51.2 | 68.9 | 55.7 | 33.9 | 55.5 | 65.6 | config / checkpoint |
## 🔧Installation
- Clone the repository locally:

  ```shell
  git clone https://github.com/xiuqhou/Salience-DETR.git
  cd Salience-DETR/
  ```
- Create a conda environment and activate it:

  ```shell
  conda create -n salience_detr python=3.8
  conda activate salience_detr
  ```
- Install PyTorch and torchvision following the instructions on https://pytorch.org/get-started/locally/. The code requires `python>=3.8`, `torch>=1.11.0`, `torchvision>=0.12.0`.

  ```shell
  conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
  ```
- Install other dependencies with:

  ```shell
  conda install --file requirements.txt -c conda-forge
  ```
That's all. You don't need to compile the CUDA operators manually; they are compiled and loaded automatically the first time you run the code.
## 📁Prepare Dataset
Please download COCO 2017 or prepare your own datasets under `data/`, and organize them as follows. You can use `tools/visualize_datasets.py` to visualize the dataset annotations and verify their correctness.
```text
coco/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
```
<details>
<summary>Example for visualization</summary>

```shell
python tools/visualize_datasets.py \
    --coco-img data/coco/val2017 \
    --coco-ann data/coco/annotations/instances_val2017.json \
    --show-dir visualize_dataset/
```

</details>
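As an additional quick check (not part of the repository's tooling), you can also load the annotation file with pycocotools, assuming it is installed, and print a few basic statistics:

```python
from pycocotools.coco import COCO

# quick sanity check of a COCO-format annotation file (path is a placeholder)
coco = COCO("data/coco/annotations/instances_val2017.json")
print(f"images:      {len(coco.getImgIds())}")
print(f"categories:  {len(coco.getCatIds())}")
print(f"annotations: {len(coco.getAnnIds())}")
```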
## 📚Train a model
We use the `accelerate` package to handle multiple GPUs natively; use `CUDA_VISIBLE_DEVICES` to specify the GPU(s). If not specified, the script will use all available GPUs on the node for training.

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch main.py    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch main.py  # train with 2 GPUs
```
Before starting training, modify the parameters in `configs/train_config.py`:
```python
from torch import optim

from datasets.coco import CocoDetection
from transforms import presets
from optimizer import param_dict

# Commonly changed training configurations
num_epochs = 12    # number of training epochs
batch_size = 2     # total_batch_size = #GPU x batch_size
num_workers = 4    # workers for pytorch DataLoader
pin_memory = True  # whether pin_memory for pytorch DataLoader
print_freq = 50    # frequency to print logs
starting_epoch = 0
max_norm = 0.1     # clip gradient norm

output_dir = None  # path to save checkpoints, default for None: checkpoints/{model_name}
find_unused_parameters = False  # useful for debugging distributed training

# define dataset for train
coco_path = "data/coco"         # /PATH/TO/YOUR/COCODIR
train_transform = presets.detr  # see transforms/presets to choose a transform
train_dataset = CocoDetection(
    img_folder=f"{coco_path}/train2017",
    ann_file=f"{coco_path}/annotations/instances_train2017.json",
    transforms=train_transform,
    train=True,
)
test_dataset = CocoDetection(
    img_folder=f"{coco_path}/val2017",
    ann_file=f"{coco_path}/annotations/instances_val2017.json",
    transforms=None,  # the eval_transform is integrated in the model
)

# model config to train
model_path = "configs/salience_detr/salience_detr_resnet50_800_1333.py"

# specify a checkpoint folder to resume, or a pretrained ".pth" to finetune, for example:
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50/best_ap.pth
resume_from_checkpoint = None

learning_rate = 1e-4  # initial learning rate
optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
lr_scheduler = optim.lr_scheduler.MultiStepLR(milestones=[10], gamma=0.1)

# this defines parameter groups with different learning rates
param_dicts = param_dict.finetune_backbone_and_linear_projection(lr=learning_rate)
```
## 📈Evaluation/Test
To evaluate a model with one or more GPUs, specify `CUDA_VISIBLE_DEVICES`, `dataset`, `model`, and `checkpoint`.

```shell
CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py --coco-path /path/to/coco --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth
```
Optional parameters are as follows; see `test.py` for the full list of parameters:

- `--show-dir`: path to save detection visualization results.
- `--result`: a file to save detection numeric results, which should end with `.json`.
For example, to evaluate `salience_detr_resnet50_800_1333` on `coco` using 8 GPUs, save predictions to `result.json`, and visualize results to `visualization/`:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch test.py \
    --coco-path data/coco \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint https://github.com/xiuqhou/Salience-DETR/releases/download/v1.0.0/salience_detr_resnet50_800_1333_coco_1x.pth \
    --result result.json \
    --show-dir visualization/
```
<details>
<summary>Evaluate a json result file</summary>
To evaluate a json result file obtained above, specify `--result` but do not specify `--model`.

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --coco-path /path/to/coco --result /path/to/result.json
```
Optional parameters; see `test.py` for the full list of parameters:

- `--show-dir`: path to save detection visualization results.

</details>
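If you want to inspect a saved result file outside of `test.py`, the following is a minimal sketch using the standard pycocotools evaluation API; it assumes the json follows the usual COCO detection result format and that pycocotools is installed:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# ground-truth annotations and the detections saved via `test.py --result`
coco_gt = COCO("data/coco/annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("result.json")

# standard COCO bbox evaluation: compute, accumulate, and print AP/AR metrics
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```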
## ▶︎Inference
Use `inference.py` to perform inference on images. You should specify the image directory using `--image-dir`.

```shell
python inference.py --image-dir /path/to/images --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth --show-dir /path/to/dir
```
<details>
<summary>An example for inference on an image folder</summary>

To perform inference on images under `images/` and save visualizations to `visualization/`:

```shell
python inference.py \
    --image-dir images/ \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint checkpoint.pth \
    --show-dir visualization/
```

</details>
See `inference.ipynb` for inference and visualization on a single image.
## 🔁Benchmark a model
To test the inference speed, memory cost, and number of parameters of a model, use `tools/benchmark_model.py`.

```shell
python tools/benchmark_model.py --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py
```
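If you want a rough sense of what such a benchmark measures, the sketch below counts parameters and times a forward pass for an arbitrary PyTorch model; it uses a torchvision ResNet-50 as a stand-in and an assumed 800x1333 input, and is not the repository's benchmark implementation:

```python
import time

import torch
from torchvision.models import resnet50  # stand-in model, not Salience DETR

model = resnet50().eval()
print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")

dummy = torch.randn(1, 3, 800, 1333)  # assumed input resolution
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(10):
        model(dummy)
    latency = (time.perf_counter() - start) / 10
print(f"average latency: {latency * 1000:.1f} ms")
```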
## 📍Train your own datasets
To train on your own datasets, there are a few things to do before training:
- Prepare your datasets in COCO annotation format, and modify `coco_path` in `configs/train_config.py` accordingly.

- Open the model configs under `configs/salience_detr` and modify `num_classes` to a number no smaller than `max_category_id + 1` of your dataset. For example, from the following annotation in `instances_val2017.json`, we can find that the maximum category_id is `90` for COCO, so we set `num_classes = 91`.

  ```json
  {"supercategory": "indoor", "id": 90, "name": "toothbrush"}
  ```

  You can simply set `num_classes` to a large enough number if you are not sure what to set. (For example, `num_classes = 92` or `num_classes = 365` also work for COCO; see the sketch after this list for a quick way to compute the value.)

- If necessary, modify other parameters in the model configs under `configs/salience_detr` and in `train_config.py`.
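As mentioned above, a quick way to find a suitable `num_classes` for your own annotation file is sketched below; the path is a placeholder and only the Python standard library is used:

```python
import json

# placeholder path: point this at your own COCO-format annotation file
with open("data/coco/annotations/instances_val2017.json") as f:
    categories = json.load(f)["categories"]

max_category_id = max(cat["id"] for cat in categories)
print(f"max category_id = {max_category_id}, set num_classes to at least {max_category_id + 1}")
```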
## 📥Export an ONNX model
For advanced users who want to deploy our model, we provide a script to export an ONNX file.
```shell
# --simplify: use onnxsim to simplify the exported onnx file
# --verify: verify the error between the onnx model and the pytorch model
python tools/pytorch2onnx.py \
    --model-config /path/to/model.py \
    --checkpoint /path/to/checkpoint.pth \
    --save-file /path/to/save.onnx \
    --simplify \
    --verify
```
For inference using the ONNX file, see `ONNXDetector` in `tools/pytorch2onnx.py`.
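If you only need a quick standalone check of the exported file, a minimal sketch with onnxruntime is shown below; the NCHW float32 layout and the 800x1333 resolution are assumptions, and the repository's `ONNXDetector` remains the reference for actual deployment:

```python
import numpy as np
import onnxruntime as ort

# load the exported model (path is a placeholder)
session = ort.InferenceSession("/path/to/save.onnx", providers=["CPUExecutionProvider"])

# assumed input: a single RGB image in NCHW float32 layout at 800x1333
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 800, 1333).astype(np.float32)

outputs = session.run(None, {input_name: dummy})
for out, meta in zip(outputs, session.get_outputs()):
    print(meta.name, np.asarray(out).shape)
```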
## Reference
If you find our work helpful for your research, please consider citing:
```bibtex
@InProceedings{Hou_2024_CVPR,
    author    = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
    title     = {Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17574-17583}
}

@inproceedings{hou2024relation,
    title        = {Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
    author       = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
    booktitle    = {European Conference on Computer Vision},
    year         = {2024},
    organization = {Springer}
}
```