MRT

Official implementation of the paper: Masked Retraining Teacher-student Framework for Domain Adaptive Object Detection

By Zijing Zhao, Sitong Wei, Qingchao Chen, Dehui Li, Yifan Yang, Yuxin Peng and Yang Liu

The paper has been accepted by the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

(Figure: overview of the MRT method)

Domain Adaptive Object Detection (DAOD) leverages a labeled domain (source) to learn an object detector that generalizes to a novel domain without annotations (target). Recent advances use a teacher-student framework, i.e., a student model is supervised by pseudo labels from a teacher model. Despite their great success, these methods suffer from the limited number of pseudo boxes and the incorrect predictions caused by the domain shift, which mislead the student model toward sub-optimal results. To mitigate this problem, we propose the Masked Retraining Teacher-student framework (MRT), which leverages a masked autoencoder and a selective retraining mechanism on the detection transformer. Experiments show that our method outperforms existing approaches and achieves state-of-the-art results on three domain adaptive object detection benchmarks.
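
For intuition only, the core teacher-student mechanics (an EMA-updated teacher producing confidence-filtered pseudo labels for the student) can be sketched as below. This is a generic illustration, not the training loop of this repository; the function names and the prediction format are hypothetical.

import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    # Standard exponential-moving-average (EMA) teacher update used in
    # teacher-student frameworks: teacher <- m * teacher + (1 - m) * student.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

@torch.no_grad()
def make_pseudo_labels(teacher, target_images, score_thresh=0.5):
    # The teacher predicts boxes on unlabeled target-domain images; only
    # confident predictions are kept as pseudo labels for the student.
    # The output format (dicts with 'scores'/'boxes'/'labels') is hypothetical.
    pseudo_labels = []
    for out in teacher(target_images):
        keep = out['scores'] > score_thresh
        pseudo_labels.append({'boxes': out['boxes'][keep], 'labels': out['labels'][keep]})
    return pseudo_labels

MRT builds on this generic scheme with a masked autoencoder branch on the target images and a selective retraining mechanism for the student; see the paper for the exact design.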

We use Deformable DETR as the base detector. This code is built upon the original repository: https://github.com/fundamentalvision/Deformable-DETR; we thank the authors for their excellent work.

1. Installation

1.1 Requirements

1.2 Compiling Deformable DETR CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

2. Usage

2.1 Data preparation

We provide the three benchmarks used in our paper: city2foggy (cityscapes → foggy cityscapes), sim2city (sim10k → cityscapes, car only), and city2bdd (cityscapes → bdd100k, daytime).

You can download the raw data from the official websites: cityscapes, foggy_cityscapes, sim10k, bdd100k. We provide the annotations converted into COCO style, which you can download from here. Organize the datasets and annotations as follows (a minimal sketch of the COCO-style annotation format is given after the layout):

[data_root]
└─ cityscapes
	└─ annotations
		└─ cityscapes_train_cocostyle.json
		└─ cityscapes_train_caronly_cocostyle.json
		└─ cityscapes_val_cocostyle.json
		└─ cityscapes_val_caronly_cocostyle.json
	└─ leftImg8bit
		└─ train
		└─ val
└─ foggy_cityscapes
	└─ annotations
		└─ foggy_cityscapes_train_cocostyle.json
		└─ foggy_cityscapes_val_cocostyle.json
	└─ leftImg8bit_foggy
		└─ train
		└─ val
└─ sim10k
	└─ annotations
		└─ sim10k_train_cocostyle.json
		└─ sim10k_val_cocostyle.json
	└─ JPEGImages
└─ bdd10k
	└─ annotations
		└─ bdd100k_daytime_train_cocostyle.json
		└─ bdd100k_daytime_val_cocostyle.json
	└─ JPEGImages
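
For reference, "COCO style" here means the standard COCO object-detection annotation format. A minimal sketch of such a file, written as a Python dictionary with placeholder values, is:

# Minimal COCO-style detection annotation structure (all values are placeholders).
coco_annotations = {
    "images": [
        {"id": 1, "file_name": "some_image.png", "width": 2048, "height": 1024},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in absolute pixel coordinates.
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100.0, 200.0, 50.0, 80.0],
         "area": 4000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "car"},
    ],
}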

To use additional datasets, you can edit datasets/coco_style_dataset.py and add key-value pairs to CocoStyleDataset.img_dirs and CocoStyleDataset.anno_files.
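
For illustration, registering a new dataset might look roughly like the sketch below. The attribute names img_dirs and anno_files come from the repository, but the key layout and paths shown here are assumptions; mirror the existing entries in datasets/coco_style_dataset.py for the exact format.

# Sketch only: the real dictionary keys/structure in CocoStyleDataset may differ.
img_dirs = {
    'cityscapes': 'cityscapes/leftImg8bit',
    'my_dataset': 'my_dataset/images',  # hypothetical new dataset
}
anno_files = {
    ('cityscapes', 'train'): 'cityscapes/annotations/cityscapes_train_cocostyle.json',
    ('my_dataset', 'train'): 'my_dataset/annotations/my_dataset_train_cocostyle.json',  # hypothetical
}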

2.2 Training and evaluation

As discussed in the implementation details of the paper, our method is designed as a three-stage paradigm to save computation cost. We first perform source_only training, in which the model is trained in the standard way on the labeled source domain. Then, we perform cross_domain_mae to train the model with the MAE branch. Finally, we perform teaching, which uses the teacher-student framework with the MAE branch and selective retraining.

For example, for the city2foggy benchmark, first edit the files in configs/def-detr-base/city2foggy/ to specify your own DATA_ROOT and OUTPUT_DIR, then run:

sh configs/def-detr-base/city2foggy/source_only.sh
sh configs/def-detr-base/city2foggy/cross_domain_mae.sh
sh configs/def-detr-base/city2foggy/teaching.sh

We use tensorboard to record the losses and evaluation results. Run the following command to see the curves during training:

tensorboard --logdir=<YOUR/LOG/DIR>

To evaluate the trained model and get the predicted results, run:

sh configs/def-detr-base/city2foggy/evaluation.sh

3. Results and Model Parameters

We conduct all experiments with batch size 8 (for the source_only stage, 8 labeled samples; for the cross_domain_mae and MRT teaching stages, 8 labeled samples and 8 unlabeled samples), on 2 NVIDIA A100 GPUs.

city2foggy: cityscapes → foggy cityscapes (0.02)

backbone | encoder layers | decoder layers | training stage   | AP@50 | logs & weights
resnet50 | 6              | 6              | source_only      | 29.5  | logs & weights
resnet50 | 6              | 6              | cross_domain_mae | 35.8  | logs & weights
resnet50 | 6              | 6              | MRT teaching     | 51.2  | logs & weights

sim2city: sim10k → cityscapes (car only)

backbone | encoder layers | decoder layers | training stage   | AP@50 | logs & weights
resnet50 | 6              | 6              | source_only      | 53.2  | logs & weights
resnet50 | 6              | 6              | cross_domain_mae | 57.1  | logs & weights
resnet50 | 6              | 6              | MRT teaching     | 62.0  | logs & weights

city2bdd: cityscapes → bdd100k (daytime)

backbone | encoder layers | decoder layers | training stage   | AP@50 | logs & weights
resnet50 | 6              | 6              | source_only      | 29.6  | logs & weights
resnet50 | 6              | 6              | cross_domain_mae | 31.1  | logs & weights
resnet50 | 6              | 6              | MRT teaching     | 33.7  | logs & weights

4. Citation

This repository is constructed and maintained by Zijing Zhao and Sitong Wei.

If you find our paper or project useful, please cite our work using the following BibTeX entry:

@inproceedings{zhao2023masked,
  title={Masked retraining teacher-student framework for domain adaptive object detection},
  author={Zhao, Zijing and Wei, Sitong and Chen, Qingchao and Li, Dehui and Yang, Yifan and Peng, Yuxin and Liu, Yang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={19039--19049},
  year={2023}
}

Thanks for your attention.