
Unsupervised Domain Adaptation for Object Detection (D-adapt)

Our code is available at TLlib examples for cross-domain object detection

Installation

Our code is based on Detectron2 (v0.6). Please install it before use.

The following is an example based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to the official websites of PyTorch and Detectron2.

# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch 
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
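
After installing, a quick sanity check can confirm that the packages are visible to the active environment. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and it only checks that the packages can be located, not that their versions match.

```python
import importlib.util

# Packages that the installation steps above are expected to provide.
REQUIRED = ("torch", "torchvision", "detectron2")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints the list of missing packages, or a short confirmation if none are missing.
print(missing_packages() or "all required packages were found")
```

If anything is reported as missing, make sure the detection environment is activated before running the install commands again.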

Dataset

The following datasets can be downloaded automatically:

PASCAL_VOC 07+12
Clipart
WaterColor
Comic

You need to prepare the following datasets manually if you want to use them:

Cityscapes, Foggy Cityscapes

Download the Cityscapes and Foggy Cityscapes datasets from the link. In particular, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
Unzip them into a directory laid out like this:

object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...

Then run

python prepare_cityscapes_to_voc.py

This automatically generates the datasets in VOC format:

object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages

Sim10k

Download the Sim10k dataset from the following link: Sim10k. In particular, we use repro_10k_images.tgz, repro_image_sets.tgz and repro_10k_annotations.tgz.
Extract the training set from these three archives, then rename the directory VOC2012/ to sim10k/.
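
The extraction and rename can also be scripted. The sketch below is illustrative only: the function name prepare_sim10k is hypothetical, and it assumes every archive unpacks into a top-level VOC2012/ directory, as the rename step above implies. It extracts into a scratch directory first, so a PASCAL VOC2012 folder already sitting in the datasets directory is never overwritten.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def prepare_sim10k(archives, datasets_root):
    """Extract the Sim10k archives and place the result at <datasets_root>/sim10k.

    Assumption: each archive unpacks into a top-level VOC2012/ directory.
    Only use this on archives from a trusted source.
    """
    datasets_root = Path(datasets_root)
    target = datasets_root / "sim10k"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    datasets_root.mkdir(parents=True, exist_ok=True)
    # Work in a scratch directory so that datasets/VOC2012 (PASCAL VOC)
    # is not touched by the Sim10k archives.
    with tempfile.TemporaryDirectory(dir=str(datasets_root)) as scratch:
        for archive in archives:
            with tarfile.open(str(archive)) as tar:
                tar.extractall(scratch)
        shutil.move(str(Path(scratch) / "VOC2012"), str(target))
    return target
```

A call might look like prepare_sim10k(["repro_10k_images.tgz", "repro_image_sets.tgz", "repro_10k_annotations.tgz"], "datasets"), run from the directory that holds the downloaded archives.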

After preparation, the following files should exist:

object_detection/datasets/
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── VOC2012
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── clipart
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── watercolor
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── comic
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── foggy_cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
└── sim10k
    ├── Annotations
    ├── ImageSets
    └── JPEGImages
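
The layout above can be checked before training. This is a small sketch rather than a script shipped with the repository; the helper name missing_dirs is hypothetical, and the dataset and sub-directory names are copied from the tree above.

```python
from pathlib import Path

# Dataset folders and sub-folders taken from the directory tree above.
DATASETS = ("VOC2007", "VOC2012", "clipart", "watercolor", "comic",
            "cityscapes_in_voc", "foggy_cityscapes_in_voc", "sim10k")
SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_dirs(root, datasets=DATASETS):
    """Return every expected '<dataset>/<subdir>' that is absent under root."""
    root = Path(root)
    return [f"{name}/{sub}"
            for name in datasets
            for sub in SUBDIRS
            if not (root / name / sub).is_dir()]
```

Pass only the datasets you actually plan to use, for example missing_dirs("datasets", ("VOC2007", "VOC2012", "clipart")), since the manually prepared ones are optional.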

Note: The above is a tutorial for the standard datasets. To use your own datasets, you need to convert them into the corresponding format.
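
As a rough illustration of such a conversion, the sketch below builds one PASCAL VOC style annotation file with the standard library. It assumes the target format is the usual VOC XML layout (filename, size, and one object element per box), which is what the directory names above suggest. The function name voc_annotation is hypothetical, and which fields the data loaders actually read has not been checked here.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects):
    """Build a PASCAL VOC style annotation tree.

    objects: iterable of (class_name, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.ElementTree(root)
```

The returned tree can be saved with its write method, one XML file per image under Annotations/, with the image itself under JPEGImages/ and the split lists under ImageSets/.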

Supported Methods

Supported methods include:

CycleGAN
D-adapt

Experiment and Results

The shell scripts reproduce the benchmarks with the specified hyper-parameters. The basic training pipeline is as follows.

The following command trains a Faster-RCNN detector on task VOC->Clipart, with only source (VOC) data.

CUDA_VISIBLE_DEVICES=0 python source_only.py \
  --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
  OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart

Explanation of some arguments

--config-file: path to config file that specifies training hyper-parameters.
-s: a list that specifies the source datasets. For each dataset, pass a (name, path) pair. The command above uses two source datasets, VOC2007 and VOC2012.
-t: a list that specifies the target datasets, in the same format as above.
--test: a list that specifies the test datasets, in the same format as above.
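
To make the (name, path) convention concrete, here is a minimal sketch of how such flat argument lists could be parsed and grouped. It is not the repository's actual parser: build_parser, pairs, and the dest names are hypothetical, and only the flags described above are modelled.

```python
import argparse


def pairs(flat):
    """Group a flat [name, path, name, path, ...] list into (name, path) tuples."""
    if len(flat) % 2 != 0:
        raise ValueError("expected name/path pairs, got an odd number of values")
    return list(zip(flat[0::2], flat[1::2]))


def build_parser():
    """Parser for the dataset flags only; other options are omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", required=True)
    # Each flag takes one or more values: name path [name path ...]
    parser.add_argument("-s", dest="sources", nargs="+", default=[])
    parser.add_argument("-t", dest="targets", nargs="+", default=[])
    parser.add_argument("--test", dest="test", nargs="+", default=[])
    return parser
```

With the arguments from the command above, pairs(args.sources) would yield ("VOC2007", "datasets/VOC2007") and ("VOC2012", "datasets/VOC2012").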

VOC->Clipart

		AP	AP50	AP75	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	diningtable	dog	horse	motorbike	person	pottedplant	sheep	sofa	train	tvmonitor
Faster RCNN (ResNet101)	Source	14.9	29.3	12.6	29.6	38.0	24.7	21.7	31.9	48.0	30.8	15.9	32.0	19.2	18.2	12.1	28.2	48.8	38.3	34.6	3.8	22.5	43.7	44.0
	CycleGAN	20.0	37.7	18.3	37.1	41.9	29.9	26.5	40.9	65.1	37.8	23.8	40.7	48.9	12.7	14.4	27.8	63.0	55.1	40.1	8.0	30.7	54.1	55.7
	D-adapt	24.8	49.0	21.5	56.4	63.2	42.3	40.9	45.3	77.0	48.7	25.4	44.3	58.4	31.4	24.5	47.1	75.3	69.3	43.5	27.9	34.1	60.7	64.0

RetinaNet	Source	18.3	32.2	17.6	34.2	42.4	27.0	21.6	36.8	48.4	35.9	16.4	38.9	22.6	27.0	15.1	27.1	46.7	42.1	36.2	8.3	29.5	42.1	46.2
	D-adapt	25.1	46.3	23.9	47.4	65.0	33.1	37.5	56.8	61.2	55.1	27.3	45.5	51.8	29.1	29.6	38.0	74.5	66.7	46.0	24.2	29.3	54.2	53.8

VOC->WaterColor

	AP	AP50	AP75	bicycle	bird	car	cat	dog	person
Faster RCNN (ResNet101)	23.0	45.9	18.5	71.1	48.3	48.6	23.7	23.3	60.3
CycleGAN	24.9	50.8	22.4	75.8	52.1	49.8	30.1	33.4	63.6
D-adapt	28.5	57.5	23.6	77.4	54.0	52.8	43.9	48.1	68.9
Target	23.8	51.3	17.4	48.5	54.7	41.3	36.2	52.6	74.6

VOC->Comic

	AP	AP50	AP75	bicycle	bird	car	cat	dog	person
Faster RCNN (ResNet101)	13.0	25.5	11.4	33.0	15.8	28.9	16.8	19.6	39.0
CycleGAN	16.9	34.6	14.2	28.1	25.7	37.7	28.0	33.8	54.1
D-adapt	20.8	41.1	18.5	49.4	25.7	43.3	36.9	32.7	58.5
Target	21.9	44.6	16.0	40.7	32.3	38.3	43.9	41.3	71.0

Cityscapes->Foggy Cityscapes

		AP	AP50	AP75	bicycle	bus	car	motorcycle	person	rider	train	truck
Faster RCNN (VGG16)	Source	14.3	25.9	13.2	33.6	27.0	40.0	22.3	31.3	38.5	2.3	12.2
	CycleGAN	22.5	41.6	20.7	46.5	41.5	62.0	33.8	45.0	54.5	21.7	27.7
	D-adapt	19.4	38.1	17.5	42.0	36.8	58.1	32.2	43.1	51.8	14.6	26.3
	Target	24.0	45.3	21.3	45.9	47.4	67.3	39.7	49.0	53.2	30.0	29.6

Faster RCNN (ResNet101)	Source	18.8	33.3	19.0	36.1	34.5	43.8	24.0	36.3	39.9	29.1	22.8
	CycleGAN	22.9	41.8	21.9	42.0	44.5	57.6	36.3	40.9	48.0	30.8	34.3
	D-adapt	22.7	42.4	21.6	41.8	44.4	56.6	31.4	41.8	48.6	42.3	32.4
	Target	25.5	45.3	24.3	41.9	53.2	63.4	36.1	42.6	47.9	42.4	35.3

Sim10k->Cityscapes Car

		AP	AP50	AP75
Faster RCNN (VGG16)	Source	24.8	43.4	23.6
	CycleGAN	29.3	51.9	28.6
	D-adapt	23.6	48.5	18.7
	Target	24.8	43.4	23.6

Faster RCNN (ResNet101)	Source	24.6	44.4	23.0
	CycleGAN	26.5	47.4	24.0
	D-adapt	27.4	51.9	25.7
	Target	24.6	44.4	23.0

Visualization

We provide code for visualization in visualize.py. For example, suppose you have trained the source-only model for the task VOC->Clipart using the provided scripts. The following command visualizes the predictions of the detector on Clipart.

CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  --test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
  MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth

Explanation of some arguments

--test: a list that specifies the test datasets for visualization.
--save-path: where to save visualization results.
MODEL.WEIGHTS: path to the model.

TODO

Support methods: SWDA, Global/Local Alignment

Citation

If you use these methods in your research, please consider citing:

@inproceedings{jiang2021decoupled,
  title     = {Decoupled Adaptation for Cross-Domain Object Detection},
  author    = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
  booktitle = {ICLR},
  year      = {2022}
}

@inproceedings{CycleGAN,
  title     = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author    = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle = {ICCV},
  year      = {2017}
}