Awesome
SCAN: Cross-domain Object Detection with Semantic Conditioned Adaptation (AAAI22 ORAL)
By Wuyang Li
[2022/03/08/] Welcome to follow our new work SIGMA, which is a comprehensive upgrade of this work (SCAN).
Installation
Check INSTALL.md for installation instructions.
Data preparation
Step 1: Format three benchmark datasets.
[DATASET_PATH]
└─ Cityscapes
└─ cocoAnnotations
└─ leftImg8bit
└─ train
└─ val
└─ leftImg8bit_foggy
└─ train
└─ val
└─ KITTI
└─ Annotations
└─ ImageSets
└─ JPEGImages
└─ Sim10k
└─ Annotations
└─ ImageSets
└─ JPEGImages
Step 2: change the data root for your dataset at paths_catalog.py.
DATA_DIR = [$Your dataset root]
More detailed dataset preparation can be found at EPM.
Tutorials for this project
We present basic instructions about our main modification to understand our codes better.
-
Middle_head: congraph
- We design a "middle head" between the feature extractor and detection head for different DA operations on feature maps.
- We give lots of APIs for further research, including different kinds of graphs, manifestation modules, paradigms, and semantic transfer settings, and you can use them by changing the config file directly, (more details are shown in 'fcos_core/config/default.py')
-
Node generation: here
- We sample graph nodes with ground-truth in the source domain and use DBSCAN to sample target domain nodes.
- We have tried different clustering algorithms for target node sampling and preserving the APIs.
-
An interesting inference strategy here
- We find that ensembling the semantic maps (the outputs of semantic conditioned kernels) and the classification maps can achieve a higher result (C2F: 42.3 to 42.8). You can have a try by changing the TEST.MODE from 'common' to 'precision' in the config file.
- Besides, only using the semantic maps can achieve a comparable result with the standard 4-Conv detection head and reduce computation costs (TEST.MODE =' light'). Kindly note that we still use the 'common' mode for a fair comparison with other methods.
-
CKA module is implemented here
-
DEBUGGGG
- We also preserve may debug APIs to save different maps for a better understanding of our works.
Well-trained models
We provide the experimental results and model weights in this section (onedrive line). Kindly note that it is easy to get higher results than the reported ones with tailor-tuned hyperparameters.
dataset | backbone | mAP | mAP@50 | mAP@75 |
---|---|---|---|---|
Cityscapes -> Foggy Cityscapes | VGG16 | 23.0 | 42.3 | 21.2 |
Sim10k -> Cityscapes | VGG16 | 27.4 | 53.0 | 27.4 |
KITTI -> Cityscapes | VGG16 | 23.0 | 46.3 | 20.9 |
Get start
Train from the scratch: (Use VGG-16 as the backbone with 1 GPU. Our code doesn't support distributed training now and only supports single-GPU training.)
python tools/train_net_da.py \
--config-file configs/scan/xxx.yaml
Test with the well trained models:
python tools/test_net.py \
--config-file configs/scan/xxx.yaml \
MODEL.WEIGHT xxx.pth
Citation
If you think this work is helpful for your project, please give it a star and citation:
@inproceedings{li2022scan,
title={SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation},
author={Li, Wuyang and Liu, Xinyu and Yao, Xiwen and Yuan, Yixuan},
booktitle={36th AAAI Conference on Artificial Intelligence (AAAI-22)},
year={2022}
}
Acknowledgements
This work is based on the EveryPixelMatter (ECCV20) EPM.
The implementation of the detector is heavily based on FCOS.
Abstract
The domain gap severely limits the transferability and scalability of object detectors trained in a specific domain when applied to a novel one. Most existing works bridge the domain gap through minimizing the domain discrepancy in the category space and aligning category-agnostic global features. Though great success, these methods model domain discrepancy with prototypes within a batch, yielding a biased estimation of domain-level distribution. Besides, the category-agnostic alignment leads to the disagreement of class-specific distributions in the two domains, further causing inevitable classification errors. To overcome these two challenges, we propose a novel Semantic Conditioned AdaptatioN (SCAN) framework such that well-modeled unbiased semantics can support semantic conditioned adaptation for precise domain adaptive object detection. Specifically, class-specific semantics crossing different images in the source domain are graphically aggregated as the input to learn an unbiased semantic paradigm incrementally. The paradigm is then sent to a lightweight manifestation module to obtain conditional kernels to serve as the role of extracting semantics from the target domain for better adaptation. Subsequently, conditional kernels are integrated into global alignment to support the class-specific adaptation in a designed Conditional Kernel guided Alignment (CKA) module. Meanwhile, rich knowledge of the unbiased paradigm is transferred to the target domain with a novel Graph-based Semantic Transfer (GST) mechanism, yielding the adaptation in the category-based feature space. Comprehensive experiments conducted on three adaptation benchmarks demonstrate that SCAN outperforms existing works by a large margin.
Contact
If you have any problems, please feel free to contact me at wuyangli2-c@my.cityu.edu.hk. Thanks.