SIGMA: Semantic-complete Graph Matching For Domain Adaptive Object Detection (CVPR-22 ORAL)

[Arxiv] [知乎 (Zhihu)]

By Wuyang Li

🎉 News! If you feel that your DAOD research has hit a bottleneck, check out our latest work on Adaptive Open-set Object Detection, which extends the target domain to the open set!

Three branches of the project:

The features of SIGMA++:

[Figure: features of SIGMA++]

SIGMA++ has now found its final home, marking the end of this series of works. The growth of SIGMA++ has been full of frustration: 👶 ➡ 🧒.

SCAN ➡ SCAN++ ➡ SIGMA ➡ SIGMA++

The main idea of this series of works: model fine-grained feature points with graphs. We sincerely thank all the readers who have shown interest in our work.

Honestly, due to my limited personal ability, our works still have many limitations, e.g., sub-optimal and redundant designs. Please forgive me. Nevertheless, we hope our works can inspire many good ideas.

Best regards,
Wuyang Li
E-mail: wuyangli2-c@my.cityu.edu.hk

💡 Preparation

Installation

Check INSTALL.md for installation instructions. If you run into any problems, feel free to send me a screenshot of the issue. Thanks.

Data preparation

More detailed preparation instructions are available here.

Step 1: Prepare benchmark datasets.

Following EPM, we construct the training and testing sets under the following three settings. Annotation files are available on OneDrive.

Cityscapes -> Foggy Cityscapes

Sim10k -> Cityscapes (class car only)

KITTI -> Cityscapes (class car only)

[DATASET_PATH]
└─ Cityscapes
   └─ cocoAnnotations
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ KITTI
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ Sim10k
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
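
Before training, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch, not part of the SIGMA code base: the helper name `missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the directory list is simply copied from the layout shown above.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, copied from the tree above.
EXPECTED_DIRS = [
    "Cityscapes/cocoAnnotations",
    "Cityscapes/leftImg8bit/train",
    "Cityscapes/leftImg8bit/val",
    "Cityscapes/leftImg8bit_foggy/train",
    "Cityscapes/leftImg8bit_foggy/val",
    "KITTI/Annotations",
    "KITTI/ImageSets",
    "KITTI/JPEGImages",
    "Sim10k/Annotations",
    "Sim10k/ImageSets",
    "Sim10k/JPEGImages",
]


def missing_dirs(dataset_root):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dirs` with your dataset root returns an empty list when every expected directory is present; otherwise it lists the relative paths that still need to be created or downloaded. It only checks that the directories exist, not that they contain the right files.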

Step 2: Change the data root for your dataset in paths_catalog.py.

DATA_DIR = [$Your dataset root]

📦 Well-trained models

The ImageNet-pretrained VGG-16 backbone (w/o BN) is available at link. Use it if you cannot download the model through the link in the config file.
The well-trained models are available at this link.

  1. Higher results than the reported ones can be obtained with carefully tuned hyperparameters.
  2. E2E indicates end-to-end training for better reproducibility. Our config files are used for end-to-end training.
  3. Two-stage or longer training and tuning the learning rate make the results more stable and yield higher mAP/AP75.
  4. After correcting a default hyperparameter (as explained in the config file), Sim10k to City achieves better results than the reported ones.
  5. You can set MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE False to accelerate training greatly with a negligible performance drop. We also recommend this change for bs=2, since we found it friendlier to small-batch training.
  6. Results become stable after the learning rate decays (see the training schedule).

| Source | Target | E2E | Metric | Backbone | mAP | AP@50 | AP@75 | file |
|---|---|---|---|---|---|---|---|---|
| City | Foggy | | COCO | V-16 | 24.0 | 43.6 | 23.8 | city_to_foggy_vgg16_43.58_mAP.pth |
| City | Foggy | | COCO | V-16 | 24.3 | 43.9 | 22.6 | city_to_foggy_vgg16_43.90_mAP.pth |
| City | Foggy | $\checkmark$ | COCO | V-16 | 22.0 | 43.5 | 21.8 | reproduced |
| City | Foggy | | COCO | R-50 | 22.7 | 44.3 | 21.2 | city_to_foggy_res50_44.26_mAP.pth |
| City | BDD100k | | COCO | V-16 | - | 32.7 | - | city_to_bdd100k_vgg16_32.65_mAP.pth |
| Sim10k | City | | COCO | V-16 | 33.4 | 57.1 | 33.8 | sim10k_to_city_vgg16_53.73_mAP.pth |
| Sim10k | City | $\checkmark$ | COCO | V-16 | 32.1 | 55.2 | 32.1 | reproduced |
| KITTI | City | | COCO | V-16 | 22.6 | 46.6 | 20.0 | kitti_to_city_vgg16_46.45_mAP.pth |

🔥 Get started

NOTE: A small correction to the code comments regarding batch size: IMS_PER_BATCH=4 means 4 images per domain.

Train the model from scratch with the default setting (batch size = 4):

python tools/train_net_da.py \
        --config-file configs/SIGMA/xxx.yaml

Test the well-trained model:

python tools/test_net.py \
        --config-file configs/SIGMA/xxx.yaml \
        MODEL.WEIGHT well_trained_models/xxx.pth

For example, test Cityscapes to Foggy Cityscapes with the VGG-16 backbone:

python tools/test_net.py \
         --config-file configs/SIGMA/sigma_vgg16_cityscapace_to_foggy.yaml \
         MODEL.WEIGHT well_trained_models/city_to_foggy_vgg16_43.58_mAP.pth
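
If you launch several runs, the commands above can be assembled programmatically. This is a small sketch rather than a tool shipped with the repository: `build_command` is a hypothetical helper, and it assumes that `tools/train_net_da.py` accepts the same trailing KEY VALUE overrides that `tools/test_net.py` is shown taking for `MODEL.WEIGHT`.

```python
import shlex


def build_command(script, config_file, overrides=None):
    """Build the argument list for a train/test command.

    overrides maps config keys to values and is appended as trailing
    KEY VALUE pairs, the form used for MODEL.WEIGHT in the commands above.
    """
    cmd = ["python", script, "--config-file", config_file]
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return cmd


# Same test command as the Cityscapes -> Foggy Cityscapes example above.
test_cmd = build_command(
    "tools/test_net.py",
    "configs/SIGMA/sigma_vgg16_cityscapace_to_foggy.yaml",
    {"MODEL.WEIGHT": "well_trained_models/city_to_foggy_vgg16_43.58_mAP.pth"},
)

# Assumption: the training script accepts the same trailing overrides.
# xxx.yaml is the placeholder used above; substitute a real config file.
train_cmd = build_command(
    "tools/train_net_da.py",
    "configs/SIGMA/xxx.yaml",
    {"MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE": False},
)

print(shlex.join(test_cmd))
print(shlex.join(train_cmd))
```

The printed strings can be pasted into a shell, or the lists can be passed directly to `subprocess.run(cmd, check=True)`.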

✨ Quick Tutorials

  1. See sigma_vgg16_cityscapace_to_foggy.yaml to understand the APIs.
  2. We modified the trainer to meet the requirements of SIGMA.
  3. GM is integrated in this "middle layer": graph_matching_head.
  4. Node sampling is conducted together with the FCOS loss: loss.

📝 Citation

If you find this work helpful for your project, please give it a star and cite it. We sincerely appreciate your acknowledgment.

@inproceedings{li2022sigma,
  title={SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection},
  author={Li, Wuyang and Liu, Xinyu and Yuan, Yixuan},
  booktitle={CVPR},
  year={2022}
}

Relevant project:

@inproceedings{li2022scan,
  title={SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation},
  author={Li, Wuyang and Liu, Xinyu and Yao, Xiwen and Yuan, Yixuan},
  booktitle={AAAI},
  year={2022}
}

🀞 Acknowledgements

We are grateful to these great projects and to their authors for their hard work.

📢 Abstract

Domain Adaptive Object Detection (DAOD) leverages a labeled source domain to learn an object detector generalizing to a novel target domain free of annotations. Recent advances align class-conditional distributions through narrowing down cross-domain prototypes (class centers). Though great success, these works ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to a sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. Specifically, we design a Graph-embedded Semantic Completion module (GSC) that completes mismatched semantics through generating hallucination graph nodes in missing categories. Then, we establish cross-image graphs to model class-conditional distributions and learn a graph-guided memory bank for better semantic completion in turn. After representing the source and target data as graphs, we reformulate the adaptation as a graph matching problem, i.e., finding well-matched node pairs across graphs to reduce the domain gap, which is solved with a novel Bipartite Graph Matching adaptor (BGM). In a nutshell, we utilize graph nodes to establish semantic-aware node affinity and leverage graph edges as quadratic constraints in a structure-aware matching loss, achieving fine-grained adaptation with a node-to-node graph matching. Extensive experiments demonstrate that our method outperforms existing works significantly.
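
To make "finding well-matched node pairs across graphs" concrete, here is a toy sketch of one-to-one node matching driven by a node-affinity matrix. It is not the paper's BGM adaptor and does not reproduce the training loss: the cosine affinity, the brute-force search, and the function names are all assumptions chosen for illustration, and the edge-based quadratic constraints are left out entirely.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_nodes(source_nodes, target_nodes):
    """Return the one-to-one (source, target) pairing with the highest total affinity.

    Brute force over permutations, so only suitable for a handful of nodes.
    Assumes len(source_nodes) <= len(target_nodes).
    """
    affinity = [[cosine(s, t) for t in target_nodes] for s in source_nodes]
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(range(len(target_nodes)), len(source_nodes)):
        score = sum(affinity[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs


# Toy node features for a source graph and a target graph.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.1, 0.9], [0.9, 1.1], [1.0, 0.1]]
print(match_nodes(source, target))  # [(0, 2), (1, 0), (2, 1)]
```

Each source node is paired with the target node pointing in the most similar direction. In the setting described above, such matched pairs are what the adaptation then pulls together to reduce the domain gap.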
