<div align="center"> <h1>Can OOD Object Detectors Learn from Foundation Models?</h1> <div> <a href="https://github.com/jliu-ac" target="_blank">Jiahui Liu</a>,</span> <a href="" target="_blank">Xin Wen</a>,</span> <a href="https://github.com/Shizhen-ZHAO" target="_blank">Shizhen Zhao</a>,</span> <a href="" target="_blank">Yingxian Chen</a>,</span> <a href="https://xjqi.github.io/" target="_blank">Xiaojuan Qi</a><sup>&#8224</sup> </div> <div> The University of Hong Kong&emsp; </div> <div> &#8224 corresponding author </div>

European Conference on Computer Vision (ECCV) 2024

<img src="pages/figure_main.gif" width="85%"/> </div>

Quick Guide

This repository contains the code of SyncOOD in two parts:

- [Train an OOD Detector](#Train-an-OOD-detector)
- [Synthesize Novel Samples](#Synthesize-Novel-Samples)

Abstract

Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.

Key Contributions

Citation

If you find this work useful, please consider citing:

@inproceedings{liu2025can,
  title={Can OOD Object Detectors Learn from Foundation Models?},
  author={Liu, Jiahui and Wen, Xin and Zhao, Shizhen and Chen, Yingxian and Qi, Xiaojuan},
  booktitle={European Conference on Computer Vision},
  pages={213--231},
  year={2025},
  organization={Springer}
}

Acknowledgements


<span id="Train-an-OOD-detector"></span>

Train an OOD Detector

We use synthetic out-of-distribution (OOD) samples and the original in-distribution (ID) samples to train a lightweight, plug-and-play OOD detector very efficiently, achieving state-of-the-art OOD object detection. <br/> We mainly conduct the experiments on Ubuntu 20.04 with GeForce RTX 3090 GPUs.

<span id="Train-env"></span>

1. Environment Setup

We mainly use Conda for installation and provide the environment files requirements.txt and requirements.yml; you can use either one to set up the environment (we use an environment similar to that of Du et al. (ICLR 2022) and Wilson et al. (ICCV 2023)).
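After installation, a quick sanity check can confirm that the environment sees a GPU. This is a minimal sketch, assuming PyTorch is part of the installed environment (which the detection code relies on); it is not part of the repository:

```python
import torch

# Quick environment sanity check (assumes PyTorch is installed via the
# provided requirements; adjust if your setup differs).
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```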

<span id="Train-data"></span>

2. Datasets

Original Data

Here you should prepare the ID datasets (PASCAL-VOC, BDD100K) and the OOD evaluation datasets (MS-COCO, OpenImages):

Download all the datasets (following the Dataset Preparation of the benchmark) into your pre-defined data root path DATASET_DIR. Your dataset structure should follow:

└── DATASET_DIR
	└── VOC_0712_converted
		|
		├── JPEGImages
		├── voc0712_train_all.json
		└── val_coco_format.json
	└── COCO
		|
		├── annotations
			├── xxx.json (the original json files)
			├── instances_val2017_ood_wrt_bdd_rm_overlap.json
			└── instances_val2017_ood_rm_overlap.json
		├── train2017
		└── val2017
	└── bdd100k
		|
		├── images
		├── val_bdd_converted.json
		└── train_bdd_converted.json
	└── OpenImages
		|
		├── coco_classes
		└── ood_classes_rm_overlap
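As a quick sanity check (not part of the repository; paths are taken from the tree above), you can verify that the key annotation files are in place:

```python
import os

DATASET_DIR = "/path/to/your/data/root"  # replace with your own data root

# Paths taken from the directory tree above; extend as needed.
expected = [
    "VOC_0712_converted/voc0712_train_all.json",
    "VOC_0712_converted/val_coco_format.json",
    "COCO/annotations/instances_val2017_ood_wrt_bdd_rm_overlap.json",
    "COCO/annotations/instances_val2017_ood_rm_overlap.json",
    "bdd100k/train_bdd_converted.json",
    "bdd100k/val_bdd_converted.json",
]

for rel_path in expected:
    full_path = os.path.join(DATASET_DIR, rel_path)
    status = "ok" if os.path.exists(full_path) else "MISSING"
    print(f"[{status}] {full_path}")
```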

Synthetic Data

Here you should prepare two synthetic OOD datasets for training (SyncOOD_VOC for the PASCAL-VOC benchmark and SyncOOD_BDD for the BDD100K benchmark):

You can download our processed demo_datasets from our DataPage, <br/> or synthesize and prepare your own synthetic OOD data with the pipeline of Synthesize Novel Samples. <br/> Your dataset structure should then be updated as:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
		|
		├── images
		└── info_raw.json
	└── SyncOOD_BDD
		|
		├── images
		└── info_raw.json

Data Pre-processing

Now pre-process the synthetic data information with your pre-defined data root path DATASET_DIR. Make sure you are in the data tools directory (relative to the repository root SyncOOD/):

cd ./tools

Run the script with DATASET_DIR:

python align_ood_info.py --dataroot DATASET_DIR

When it finishes, your dataset structure should be updated as:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
		|
		├── images
		├── info_raw.json
		└── info.json
	└── SyncOOD_BDD
		|
		├── images
		├── info_raw.json
		└── info.json
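To confirm that pre-processing produced the expected output, a small check can be run. This is not part of the repository, and the internal layout of info.json is an assumption, so only the existence and entry count are inspected:

```python
import json
import os

DATASET_DIR = "/path/to/your/data/root"  # replace with your own data root

# The exact schema of info.json is not documented here, so this only
# checks that the file exists and reports how many entries it holds.
for subset in ["SyncOOD_VOC", "SyncOOD_BDD"]:
    path = os.path.join(DATASET_DIR, subset, "info.json")
    with open(path) as f:
        info = json.load(f)
    print(f"{subset}: {len(info)} entries in info.json")
```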

<span id="Train-base"></span>

3. Base Object Detectors

Detector Checkpoints

We train a plug-and-play OOD detector on top of off-the-shelf base object detectors (Faster R-CNN and VOS). <br/> You can follow the VOS repository to train your own base object detectors, <br/> or download our well-trained base_detectors checkpoints from our DataPage.

Save all the checkpoints in a detector root path of your choice, DETECTOR_DIR, following the structure:

└── DETECTOR_DIR
	└── frcnn_voc.pth
	└── vos_voc.pth
	└── frcnn_bdd.pth
	└── vos_bdd.pth
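To verify that a downloaded checkpoint loads correctly, you can inspect its top-level keys. This is a minimal sketch, not part of the repository, and the checkpoint layout is an assumption:

```python
import torch

# Load a checkpoint on CPU and list its top-level keys.
# The internal structure depends on how the detector was saved,
# so treat this purely as a sanity check.
ckpt = torch.load("DETECTOR_DIR/frcnn_voc.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys())[:10])
else:
    print("Loaded object of type:", type(ckpt))
```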

Detector Configs

Make sure you are in the base detector configs directory (relative to the repository root SyncOOD/):

cd ./detection/configs

Here, choose which base object detector you would like to use (Faster R-CNN or VOS):

<span id="Train-feat"></span>

4. Feature Extraction

Feature extraction may consume a lot of disk space and memory, especially on the BDD100K dataset. <br/> If you are using the checkpoints provided in our base_detectors, you can download our extracted_features from our DataPage into the dataset structure and skip this step, <br/> or follow the instructions below to extract your own features:

Make sure you are in the feature extraction directory (relative to the repository root SyncOOD/):

cd ./OOD_OBJ_DET

First, extract ID features from the original ID samples:

sh feature_extraction_id.sh

Then, extract OOD features from the synthetic OOD samples:

sh feature_extraction_ood.sh

<span id="Train-feat-data"></span> Finally your updated dataset structure should be:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
	└── SyncOOD_BDD
	└── VOC_features
		|
		├── VOC-RCNN-RN50-id.hdf5
		└── VOC-RCNN-RN50-ood.hdf5
	└── BDD_features
		|
		├── BDD-RCNN-RN50-id.hdf5
		└── BDD-RCNN-RN50-ood.hdf5
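To inspect the extracted feature files, the following sketch lists whatever datasets they contain. It assumes the files are standard HDF5; the key names and array shapes inside are not documented here, so nothing specific is assumed about them:

```python
import h5py

# List the datasets stored in an extracted feature file.
# Key names and array shapes depend on the extraction scripts,
# so this only prints whatever is found.
with h5py.File("DATASET_DIR/VOC_features/VOC-RCNN-RN50-id.hdf5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
```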

<span id="Train-train"></span>

5. Training the OOD Detector

Make sure you are in the OOD detector training directory (relative to the repository root SyncOOD/):

cd ./OOD_OBJ_DET

Then train an OOD detector:

sh train.sh

The obtained OOD detector checkpoints are saved together with the extracted features, so the current data structure is:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
	└── SyncOOD_BDD
	└── VOC_features
		|
		├── VOC-RCNN-RN50-id.hdf5
		├── VOC-RCNN-RN50-ood.hdf5
		└── VOC-RCNN-RN50-mlp.pth
	└── BDD_features
		|
		├── BDD-RCNN-RN50-id.hdf5
		├── BDD-RCNN-RN50-ood.hdf5
		└── BDD-RCNN-RN50-mlp.pth
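The detector itself is a lightweight MLP that separates ID from OOD box features. The sketch below illustrates the general idea only; it is not the repository's train.sh, and the HDF5 dataset key, network width, and hyperparameters are assumptions:

```python
import h5py
import torch
import torch.nn as nn

def load_features(path, key="features"):
    # The dataset key is an assumption; adjust it to match your .hdf5 files.
    with h5py.File(path, "r") as f:
        return torch.from_numpy(f[key][...]).float()

id_feats = load_features("DATASET_DIR/VOC_features/VOC-RCNN-RN50-id.hdf5")
ood_feats = load_features("DATASET_DIR/VOC_features/VOC-RCNN-RN50-ood.hdf5")

feats = torch.cat([id_feats, ood_feats])
labels = torch.cat([torch.ones(len(id_feats)), torch.zeros(len(ood_feats))])

# A small MLP scoring "ID-ness" of a box feature; width and depth are illustrative.
mlp = nn.Sequential(
    nn.Linear(feats.shape[1], 256), nn.ReLU(),
    nn.Linear(256, 1),
)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for epoch in range(10):
    optimizer.zero_grad()
    logits = mlp(feats).squeeze(1)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")

torch.save(mlp.state_dict(), "VOC-RCNN-RN50-mlp.pth")
```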

<span id="Train-inf"></span>

6. Evaluation

Make sure you are in the same directory as for training (relative to the repository root SyncOOD/), then evaluate the obtained OOD detectors:

sh evaluation.sh

Finally, you obtain the FPR95, AUROC, and AUPR of your OOD detector on the two OOD datasets.
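For reference, these three metrics can be computed from ID/OOD scores as follows. This is a standalone sketch using scikit-learn, not the repository's evaluation.sh, and the random scores at the end are placeholders:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def ood_metrics(id_scores, ood_scores):
    """Compute FPR95, AUROC, and AUPR, treating ID as the positive class."""
    scores = np.concatenate([id_scores, ood_scores])
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])

    auroc = roc_auc_score(labels, scores)
    aupr = average_precision_score(labels, scores)

    # FPR95: false-positive rate on OOD when 95% of ID samples are accepted.
    threshold = np.percentile(id_scores, 5)  # keeps 95% of ID scores above it
    fpr95 = np.mean(ood_scores >= threshold)
    return fpr95, auroc, aupr

# Example with random placeholder scores; replace with your detector's scores.
rng = np.random.default_rng(0)
print(ood_metrics(rng.normal(1.0, 1.0, 1000), rng.normal(-1.0, 1.0, 1000)))
```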


<span id="Synthesize-Novel-Samples"></span>

Synthesize Novel Samples

We aim to develop an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects, with COCO-format annotations, to support 1) training OOD detectors and 2) exploring more general open-world tasks (coming soon).
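For reference, the COCO-format layout that the synthesized annotations follow looks roughly like this. The snippet is a minimal illustration with made-up values, not output from the pipeline:

```python
import json

# Minimal COCO-style annotation file with made-up values for illustration.
coco_style = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 150.0, 80.0, 60.0],  # [x, y, width, height]
            "area": 4800.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "novel_object", "supercategory": "none"},
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco_style, f, indent=2)
```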