Unsupervised Domain Adaptation for Object Detection (D-adapt)

Our code is available in the TLlib examples for cross-domain object detection.

Installation

Our code is based on Detectron2 (v0.6); please install it before use.

The following is an example based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to the official websites of PyTorch and Detectron2.

# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch 
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
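
To confirm that the environment is set up correctly, you can print the installed versions (an optional sanity check, not part of the original instructions):

# check that torch, torchvision and detectron2 import and report the expected versions
python -c "import torch, torchvision, detectron2; print(torch.__version__, torchvision.__version__, detectron2.__version__)"
# check that CUDA is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available())"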

Dataset

The following datasets can be downloaded automatically: PASCAL VOC 2007, PASCAL VOC 2012, Clipart, WaterColor, and Comic.

You need to prepare the following datasets manually if you want to use them:

Cityscapes, Foggy Cityscapes

object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...

Then run

python prepare_cityscapes_to_voc.py 

This will automatically generate the datasets in VOC format.

object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
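
As an optional sanity check (these commands are only illustrative and not part of the preparation script), you can verify from the object_detection directory that each converted split contains matching numbers of annotations and images:

# count annotation files and images in the converted Cityscapes dataset
ls datasets/cityscapes_in_voc/Annotations | wc -l
ls datasets/cityscapes_in_voc/JPEGImages | wc -l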

Sim10k

After preparation, the following files and directories should exist:

object_detection/datasets/
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── VOC2012
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── clipart
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── watercolor
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── comic
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── foggy_cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
└── sim10k
    ├── Annotations
    ├── ImageSets
    └── JPEGImages

Note: the above instructions cover the standard benchmark datasets. To use your own datasets, you need to convert them into the same VOC format, as sketched below.
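
For reference, a dataset in VOC format follows the same layout as the datasets above: images in JPEGImages, one XML annotation per image in Annotations, and plain-text split files listing image IDs in ImageSets/Main. The structure below is only a sketch; my_dataset, train.txt and test.txt are placeholder names:

object_detection/datasets/my_dataset
├── Annotations        # one VOC-style .xml file per image (class names and bounding boxes)
├── ImageSets
│   └── Main           # e.g. train.txt, test.txt: image IDs, one per line, without extension
└── JPEGImages         # .jpg images named by image ID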

Supported Methods

Supported methods include D-adapt (Decoupled Adaptation for Cross-Domain Object Detection, ICLR 2022).

Experiment and Results

The shell scripts in this directory reproduce the benchmark results with the specified hyper-parameters. The basic training pipeline is as follows.

The following command trains a Faster RCNN detector for the task VOC->Clipart, using only source (VOC) data.

CUDA_VISIBLE_DEVICES=0 python source_only.py \
  --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
  OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart

Explanation of some arguments: --config-file specifies the detector configuration; -s and -t specify the source and target domains, each given as a dataset name followed by its root directory; --test lists the datasets used for evaluation; OUTPUT_DIR is the directory where checkpoints and logs are saved.
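
Assuming the training script follows detectron2's default logging setup (an assumption, not stated in the original), progress is written to log.txt and metrics.json under OUTPUT_DIR, and you can monitor a running job with, for example:

# follow the training log of the source-only run above
tail -f logs/source_only/faster_rcnn_R_101_C4/voc2clipart/log.txt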

VOC->Clipart

| Detector | Method | AP | AP50 | AP75 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | Source | 14.9 | 29.3 | 12.6 | 29.6 | 38.0 | 24.7 | 21.7 | 31.9 | 48.0 | 30.8 | 15.9 | 32.0 | 19.2 | 18.2 | 12.1 | 28.2 | 48.8 | 38.3 | 34.6 | 3.8 | 22.5 | 43.7 | 44.0 |
| | CycleGAN | 20.0 | 37.7 | 18.3 | 37.1 | 41.9 | 29.9 | 26.5 | 40.9 | 65.1 | 37.8 | 23.8 | 40.7 | 48.9 | 12.7 | 14.4 | 27.8 | 63.0 | 55.1 | 40.1 | 8.0 | 30.7 | 54.1 | 55.7 |
| | D-adapt | 24.8 | 49.0 | 21.5 | 56.4 | 63.2 | 42.3 | 40.9 | 45.3 | 77.0 | 48.7 | 25.4 | 44.3 | 58.4 | 31.4 | 24.5 | 47.1 | 75.3 | 69.3 | 43.5 | 27.9 | 34.1 | 60.7 | 64.0 |
| RetinaNet | Source | 18.3 | 32.2 | 17.6 | 34.2 | 42.4 | 27.0 | 21.6 | 36.8 | 48.4 | 35.9 | 16.4 | 38.9 | 22.6 | 27.0 | 15.1 | 27.1 | 46.7 | 42.1 | 36.2 | 8.3 | 29.5 | 42.1 | 46.2 |
| | D-adapt | 25.1 | 46.3 | 23.9 | 47.4 | 65.0 | 33.1 | 37.5 | 56.8 | 61.2 | 55.1 | 27.3 | 45.5 | 51.8 | 29.1 | 29.6 | 38.0 | 74.5 | 66.7 | 46.0 | 24.2 | 29.3 | 54.2 | 53.8 |

VOC->WaterColor

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | 23.0 | 45.9 | 18.5 | 71.1 | 48.3 | 48.6 | 23.7 | 23.3 | 60.3 |
| CycleGAN | 24.9 | 50.8 | 22.4 | 75.8 | 52.1 | 49.8 | 30.1 | 33.4 | 63.6 |
| D-adapt | 28.5 | 57.5 | 23.6 | 77.4 | 54.0 | 52.8 | 43.9 | 48.1 | 68.9 |
| Target | 23.8 | 51.3 | 17.4 | 48.5 | 54.7 | 41.3 | 36.2 | 52.6 | 74.6 |

VOC->Comic

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | 13.0 | 25.5 | 11.4 | 33.0 | 15.8 | 28.9 | 16.8 | 19.6 | 39.0 |
| CycleGAN | 16.9 | 34.6 | 14.2 | 28.1 | 25.7 | 37.7 | 28.0 | 33.8 | 54.1 |
| D-adapt | 20.8 | 41.1 | 18.5 | 49.4 | 25.7 | 43.3 | 36.9 | 32.7 | 58.5 |
| Target | 21.9 | 44.6 | 16.0 | 40.7 | 32.3 | 38.3 | 43.9 | 41.3 | 71.0 |

Cityscapes->Foggy Cityscapes

| Detector | Method | AP | AP50 | AP75 | bicycle | bus | car | motorcycle | person | rider | train | truck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (VGG16) | Source | 14.3 | 25.9 | 13.2 | 33.6 | 27.0 | 40.0 | 22.3 | 31.3 | 38.5 | 2.3 | 12.2 |
| | CycleGAN | 22.5 | 41.6 | 20.7 | 46.5 | 41.5 | 62.0 | 33.8 | 45.0 | 54.5 | 21.7 | 27.7 |
| | D-adapt | 19.4 | 38.1 | 17.5 | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 51.8 | 14.6 | 26.3 |
| | Target | 24.0 | 45.3 | 21.3 | 45.9 | 47.4 | 67.3 | 39.7 | 49.0 | 53.2 | 30.0 | 29.6 |
| Faster RCNN (ResNet101) | Source | 18.8 | 33.3 | 19.0 | 36.1 | 34.5 | 43.8 | 24.0 | 36.3 | 39.9 | 29.1 | 22.8 |
| | CycleGAN | 22.9 | 41.8 | 21.9 | 42.0 | 44.5 | 57.6 | 36.3 | 40.9 | 48.0 | 30.8 | 34.3 |
| | D-adapt | 22.7 | 42.4 | 21.6 | 41.8 | 44.4 | 56.6 | 31.4 | 41.8 | 48.6 | 42.3 | 32.4 |
| | Target | 25.5 | 45.3 | 24.3 | 41.9 | 53.2 | 63.4 | 36.1 | 42.6 | 47.9 | 42.4 | 35.3 |

Sim10k->Cityscapes Car

| Detector | Method | AP | AP50 | AP75 |
|---|---|---|---|---|
| Faster RCNN (VGG16) | Source | 24.8 | 43.4 | 23.6 |
| | CycleGAN | 29.3 | 51.9 | 28.6 |
| | D-adapt | 23.6 | 48.5 | 18.7 |
| | Target | 24.8 | 43.4 | 23.6 |
| Faster RCNN (ResNet101) | Source | 24.6 | 44.4 | 23.0 |
| | CycleGAN | 26.5 | 47.4 | 24.0 |
| | D-adapt | 27.4 | 51.9 | 25.7 |
| | Target | 24.6 | 44.4 | 23.0 |

Visualization

We provide code for visualization in visualize.py. For example, suppose you have trained the source-only model for the task VOC->Clipart using the provided scripts. The following command visualizes the predictions of the detector on Clipart.

CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  --test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
  MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth

Explanation of some arguments: --test specifies the dataset to visualize (name followed by its root directory); --save-path is the directory where the visualized predictions are saved; MODEL.WEIGHTS points to the checkpoint of the trained detector.

TODO

Support more methods: SWDA, Global/Local Alignment.

Citation

If you use these methods in your research, please consider citing the following papers.

@inproceedings{jiang2021decoupled,
  title     = {Decoupled Adaptation for Cross-Domain Object Detection},
  author    = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
  booktitle = {ICLR},
  year      = {2022}
}

@inproceedings{CycleGAN,
  title     = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author    = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle = {ICCV},
  year      = {2017}
}