On_the_Utility_of_Synthetic_Data

Code release for "A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation", accepted to CVPR 2023.

Project Page $\cdot$ PDF Download $\cdot$ Dataset Download

Our datasets are available via Google Drive (link above) or Baidu Netdisk (links below).

Requirements

Python 3.8.8
PyTorch 1.8.0
torchvision 0.9.0
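
As a quick sanity check before training, the installed package versions can be compared against the ones listed above. The following is a minimal sketch, not part of the released code; it assumes the packages are installed under the PyPI names `torch` and `torchvision`, and the helper names `installed_version` and `check_versions` are hypothetical.

```python
from importlib import metadata

# Versions listed under Requirements; the PyPI package names are an assumption.
REQUIRED = {"torch": "1.8.0", "torchvision": "0.9.0"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Map each package to a tuple (expected, found, matches)."""
    report = {}
    for package, expected in required.items():
        found = lookup(package)
        # Drop local build tags such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        report[package] = (expected, found, base == expected)
    return report


if __name__ == "__main__":
    for package, (expected, found, ok) in check_versions(REQUIRED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {expected}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones this README states.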

Data preparation

Existing datasets. References for the existing datasets used (e.g., ShapeNet, VisDA-2017, ImageNet, MetaShift, and ObjectNet) are given in the paper.
Our new datasets. Download them from the link above or from the links below.
SynSL is here: our 12.8M synthesized images of 10 classes for supervised learning.
SynSL-120K is here: it includes our 120K synthesized images of 10 classes (train), train+SubImageNet, and three types of test data (test_IID, test_IID_wo_BG, and test_OOD).
S2RDA is here: it includes two challenging transfer tasks, S2RDA-49 and S2RDA-MS-39.
Please refer to the paper for more details.
The validation and test splits of the real domains in S2RDA are also provided in this repository.

Model training

Install the required Python packages.
Replace the data paths in run.sh with the paths on your own system.
Run the command sh run.sh.
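
Since a wrong data path is an easy mistake in the second step, it can help to verify the paths before launching. The sketch below is one possible pre-flight wrapper, not part of the released code: `missing_paths` and `launch` are hypothetical helpers. It does not read or edit run.sh, so the paths passed in must mirror the ones set in that script.

```python
import subprocess
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


def launch(script, data_paths):
    """Run `sh <script>` after confirming the script and every data path exist."""
    missing = missing_paths([script, *data_paths])
    if missing:
        raise FileNotFoundError("Missing paths: " + ", ".join(missing))
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(["sh", script], check=True)
```

For example, `launch("run.sh", ["/path/to/SynSL-120K", "/path/to/S2RDA"])` would start training only if both directories exist; the two directory paths here are placeholders.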

The results are saved in the folder ./checkpoints/.
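
To see what a run has written, the most recently modified files under that folder can be listed. This is a minimal sketch that makes no assumption about file names or extensions inside ./checkpoints/; the helper `newest_files` is hypothetical.

```python
from pathlib import Path


def newest_files(folder="./checkpoints", limit=5):
    """Return up to `limit` files under `folder`, most recently modified first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Search subfolders too, since the layout inside the folder is not assumed.
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_files():
        print(path)
```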

Pre-trained checkpoints

The pre-trained checkpoints for downstream synthetic-to-real classification adaptation are available here or there; they were saved at the last pre-training iteration.

Paper citation

@InProceedings{tang2023new,
    author    = {Tang, Hui and Jia, Kui},
    title     = {A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
}