ImaginaryNet: Learning Object Detectors without Real Images and Annotations
This repository is for the ICLR 2023 paper: ImaginaryNet: Learning Object Detectors without Real Images and Annotations
If you use any source code or ideas from this repository in your work, please cite the following paper.
<pre>
@article{ni2022imaginarynet,
  title={ImaginaryNet: Learning Object Detectors without Real Images and Annotations},
  author={Ni, Minheng and Huang, Zitong and Feng, Kailai and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2210.06886},
  year={2022}
}
</pre>
If you have any questions, feel free to email me.
Abstract
Without the demand of training in reality, humans are capable of detecting a new category of object simply based on a language description of its visual characteristics. Empowering deep learning with this ability would enable neural networks to handle complex vision tasks, e.g., object detection, without collecting and annotating real images. To this end, this paper introduces a novel and challenging learning paradigm, Imaginary-Supervised Object Detection (ISOD), where neither real images nor manual annotations are allowed for training object detectors. To resolve this challenge, we propose ImaginaryNet, a framework that synthesizes images by combining a pretrained language model and a text-to-image synthesis model. Given a class label, the language model is used to generate a full description of a scene with a target object, and the text-to-image model is deployed to generate a photo-realistic image. With the synthesized images and class labels, weakly supervised object detection can then be leveraged to accomplish ISOD. By gradually introducing real images and manual annotations, ImaginaryNet can collaborate with other supervision settings to further boost detection performance. Experiments show that ImaginaryNet can (i) reach about 75% of the performance of the weakly supervised counterpart with the same backbone trained on real data in the ISOD setting, and (ii) significantly improve the baseline while achieving state-of-the-art or comparable performance when combined with other supervision settings.
Illustration of Framework
<img src="img/ImaginaryNet.png">
Preparation
You can run the following commands to set up the environment.
conda env create -f environment.yaml
conda activate imaginarynet
pip install --upgrade jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
conda install -c conda-forge cudatoolkit-dev
Pipeline Usage
This pipeline provides the core function of ImaginaryNet: generating images from class labels.
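The flow described in the abstract (class label, then a language-model description, then a text-to-image model, with the class label kept as the only annotation) can be pictured with the minimal sketch below. It is not the code in imaginarynet.py: `ImaginarySample`, `describe_scene`, `synthesize_image`, and `build_imaginary_dataset` are hypothetical names, both models are replaced by trivial stand-ins so the snippet runs without a GPU, and sampling a class uniformly at random is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ImaginarySample:
    image: bytes  # placeholder for synthesized image data
    prompt: str   # description the image was generated from
    label: str    # image-level class label; no bounding boxes are produced


def describe_scene(label: str) -> str:
    """Stand-in for the language model: fills a fixed template."""
    return f"A photo of a scene containing a {label}."


def synthesize_image(prompt: str) -> bytes:
    """Stand-in for the text-to-image model: returns the prompt as bytes."""
    return prompt.encode("utf-8")


def build_imaginary_dataset(
    classes: Sequence[str],
    num: int,
    seed: int = 0,
    describe: Callable[[str], str] = describe_scene,
    synthesize: Callable[[str], bytes] = synthesize_image,
) -> List[ImaginarySample]:
    """Sample a class, describe a scene, synthesize an image, keep the label."""
    rng = random.Random(seed)  # local RNG so runs with the same seed match
    samples = []
    for _ in range(num):
        label = rng.choice(classes)  # assumption: uniform sampling over classes
        prompt = describe(label)
        samples.append(ImaginarySample(synthesize(prompt), prompt, label))
    return samples


if __name__ == "__main__":
    for sample in build_imaginary_dataset(["cat", "dog", "bicycle"], num=4):
        print(sample.label, "->", sample.prompt)
```

Because each sample carries only an image-level label, the resulting set has the shape a weakly supervised detector expects, which is how the abstract connects synthesis to detection.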
Quick Start
python imaginarynet.py --num 10000 --classfile voc.txt --gpt --clip --backend dalle-mini
Parameters Explanation
- --seed Random seed.
- --num Number of images to generate.
- --classfile File containing the initial classes.
- --outputdir Output directory.
- --gpt Whether to use GPT to extend the prompt.
- --clip Whether to use CLIP to filter images.
- --backend Text-to-image backend: dalle-mini or stablediffusion.
- --cpu Whether to run the CLIP filter on CPU.
- --threshold Minimum CLIP score for an image to be accepted.
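As a rough illustration of how `--clip` and `--threshold` interact, the sketch below keeps a generated image only when a similarity score reaches the threshold. The scorer is passed in as a parameter; `filter_by_score`, `toy_score`, and the 0.6 threshold are hypothetical and do not reflect the repository's actual CLIP call or its default value. Images are represented by caption strings purely so the example runs without a model.

```python
from typing import Callable, Iterable, List, Tuple

Candidate = Tuple[str, str]  # (image stand-in, prompt)


def filter_by_score(
    candidates: Iterable[Candidate],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Candidate]:
    """Keep candidates whose score is at least `threshold`."""
    return [
        (image, prompt)
        for image, prompt in candidates
        if score_fn(image, prompt) >= threshold
    ]


def toy_score(image: str, prompt: str) -> float:
    """Hypothetical scorer: fraction of prompt words found in the caption."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    caption_words = set(image.lower().split())
    return sum(w in caption_words for w in words) / len(words)


if __name__ == "__main__":
    candidates = [("a cat on a sofa", "a cat"), ("a dog in a park", "a cat")]
    print(filter_by_score(candidates, toy_score, threshold=0.6))
```

In a real run the scorer would be an image-text similarity from CLIP; a higher threshold trades dataset size for closer agreement between each image and its prompt.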
Reproducibility
To support reproducibility, we provide generated datasets, trained checkpoints, and training logs. Please note that regenerated images may not match ours exactly, because the backends are updated over time and environments differ. We did not modify the code of the detection backbones; to train them, please refer to their original repositories. If you want to access the original data or experiments, please download our archives.
Generated Images
Name | Download Link |
---|---|
10,000 Imaginary Data | Download |
Saved Checkpoints and Logs
Imaginary-Supervised Object Detection (ISOD)
Backbone | Imaginary Data | mAP | Checkpoint | Log |
---|---|---|---|---|
OICR | 5K Imaginary | 35.43 | Download | Download |
Weakly-Supervised Object Detection (WSOD)
Backbone | Imaginary Data | mAP | Checkpoint | Log |
---|---|---|---|---|
WSDDN | 5K Imaginary | 39.90 | Download | Download |
OICR | 5K Imaginary | 51.39 | Download | Download |
W2N | 5K Imaginary | 65.05 | Download | Download |
Semi-Supervised Object Detection (SSOD)
Backbone | Real Data | Imaginary Data | mAP | Checkpoint | Log |
---|---|---|---|---|---|
Unbiased-Teacher | 5K VOC2007 | 5K Imaginary | 80.36 | Download | Download |
Unbiased-Teacher | 5K VOC2007 | 10K Imaginary | 80.60 | Download | Download |
Unbiased-Teacher | 5K VOC2007 + 10K VOC2012 (un-labeled) | 10K Imaginary | 81.60 | Download | Download |
Acknowledgement
We greatly appreciate Yeli Shen for his contribution to the public code of ImaginaryNet.