OoD-Bench

OoD-Bench is a benchmark for both datasets and algorithms of out-of-distribution generalization. It positions datasets along two dimensions of distribution shift, diversity shift and correlation shift, unifying disjoint threads of research from the perspective of data distribution. OoD algorithms are then evaluated and compared on two groups of datasets, each dominated by one kind of distribution shift. See our paper (CVPR 2022 oral) for more details.

This repository contains the code to produce the benchmark, which has two main components: quantifying the diversity and correlation shift of datasets, and benchmarking OoD algorithms on those datasets.

Environment requirements

The external DomainBed and wilds packages under external/ must be on your PYTHONPATH:

export PYTHONPATH="$PYTHONPATH:$(pwd)/external/DomainBed/"
export PYTHONPATH="$PYTHONPATH:$(pwd)/external/wilds/"

Data preparation

Please follow these instructions.

Quantifying diversity and correlation shift

The quantification process consists of three main steps: (1) training an environment classifier, (2) extracting features from the trained classifier, and (3) measuring the shifts with the extracted features. The module ood_bench.scripts.main will handle the whole process for you. For example, to quantify the distribution shift between the training environments (indexed by 0 and 1) and the test environment (indexed by 2) of Colored MNIST with 16 trials, you can simply run:

python -m ood_bench.scripts.main \
       --n_trials 16 \
       --data_dir /path/to/my/data \
       --dataset ColoredMNIST_IRM \
       --envs_p 0 1 \
       --envs_q 2 \
       --backbone mlp \
       --output_dir /path/to/store/outputs

For backbones that use pretrained models, --pretrained_model_path must be specified. For models in the torchvision model zoo, you can pass auto to the argument and the pretrained weights will be downloaded automatically.
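For instance, quantifying the shifts of PACS with an ImageNet-pretrained backbone might look like the sketch below; the backbone name resnet18 and the environment split are illustrative and may not match the names registered in this repository:

python -m ood_bench.scripts.main \
       --n_trials 16 \
       --data_dir /path/to/my/data \
       --dataset PACS \
       --envs_p 0 1 2 \
       --envs_q 3 \
       --backbone resnet18 \
       --pretrained_model_path auto \
       --output_dir /path/to/store/outputs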

These two optional arguments are also useful:

Results

The following results are produced by the scripts under ood_bench/examples, all of which use automatic calibration.

| Dataset | Diversity shift | Correlation shift |
|---|---|---|
| PACS | 0.6715 ± 0.0392* | 0.0338 ± 0.0156* |
| Office-Home | 0.0657 ± 0.0147* | 0.0699 ± 0.0280* |
| Terra Incognita | 0.9846 ± 0.0935* | 0.0002 ± 0.0003* |
| DomainNet | 0.3740 ± 0.0343* | 0.1061 ± 0.0181* |
| WILDS-Camelyon | 0.9632 ± 0.1907 | 0.0000 ± 0.0000 |
| Colored MNIST | 0.0013 ± 0.0006 | 0.5468 ± 0.0278 |
| CelebA | 0.0031 ± 0.0017 | 0.1868 ± 0.0530 |
| NICO | 0.0176 ± 0.0158 | 0.1968 ± 0.0888 |
| ImageNet-A † | 0.0435 ± 0.0123 | 0.0222 ± 0.0192 |
| ImageNet-R † | 0.1024 ± 0.0188 | 0.1180 ± 0.0311 |
| ImageNet-V2 † | 0.0079 ± 0.0017 | 0.2362 ± 0.0607 |

<small>* averaged over all leave-one-domain-out splits     † with respect to the original ImageNet</small>

Note: the results shown above differ somewhat from those reported in our paper, mainly because we reworked the original implementation to ease public use and to improve quantification stability. One of the main improvements is the use of calibration. Previously, the same empirically chosen thresholds were used across all the datasets studied in our paper, which may not hold for other datasets.

Extending OoD-Bench

New datasets can be added by subclassing MultipleDomainDataset, following the same convention as the datasets in DomainBed, for example:

class MyDataset(MultipleDomainDataset):
    ENVIRONMENTS = ['env0', 'env1']        # at least two environments
    def __init__(self, root, test_envs, hparams):
        super().__init__()

        # you may change the transformations below
        transform = get_transform()
        augment_scheme = hparams.get('data_augmentation_scheme', 'default')
        augment_transform = get_augment_transform(augment_scheme)

        self.datasets = []                 # required
        for i, env_name in enumerate(self.ENVIRONMENTS):
            if hparams['data_augmentation'] and (i not in test_envs):
                env_transform = augment_transform
            else:
                env_transform = transform
            # load the environments, not necessarily as ImageFolders;
            # you may write a specialized class to load them; the class
            # must possess an attribute named `samples`, a sequence of
            # 2-tuples where the second elements are the labels
            dataset = ImageFolder(Path(root, env_name), transform=env_transform)
            self.datasets.append(dataset)

        self.input_shape = (3, 224, 224,)  # required
        self.num_classes = 2               # required
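If an environment is not stored as an ImageFolder, a specialized loader can be used instead. Below is a minimal, hypothetical sketch of such a loader (the class name CSVImageList and the CSV file layout are illustrative); the only requirement carried over from the comment above is the samples attribute of 2-tuples whose second elements are the labels:

from PIL import Image
from torch.utils.data import Dataset

class CSVImageList(Dataset):
    """Hypothetical loader for a CSV file listing (image_path, label) pairs."""
    def __init__(self, csv_path, transform=None):
        self.transform = transform
        self.samples = []                  # sequence of (path, label) 2-tuples
        with open(csv_path) as f:
            for line in f:
                path, label = line.strip().rsplit(',', 1)
                self.samples.append((path, int(label)))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = Image.open(path).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, label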
Custom backbone networks can be added similarly by subclassing Backbone, for example:

class MyBackbone(Backbone):
    def __init__(self, hdim, pretrained_model_path=None):
        self._hdim = hdim
        super(MyBackbone, self).__init__(pretrained_model_path)

    @property
    def hdim(self):
        return self._hdim

    def _load_modules(self):
        self.modules_ = nn.Sequential(
            nn.Linear(3 * 14 * 14, self.hdim),
            nn.ReLU(True),
        )

    def forward(self, x):
        return self.modules_(x)
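Once the new dataset and backbone are registered so that ood_bench.scripts.main can find them by name (the registration mechanism is not shown here), they could in principle be passed to the quantification script as before; the names below are illustrative:

python -m ood_bench.scripts.main \
       --n_trials 16 \
       --data_dir /path/to/my/data \
       --dataset MyDataset \
       --envs_p 0 \
       --envs_q 1 \
       --backbone my_backbone \
       --output_dir /path/to/store/outputs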

Benchmarking OoD algorithms

Please refer to this repository.

Citing

If you find the code useful or find our paper relevant to your research, please consider citing:

@inproceedings{ye2022ood,
    title={OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization},
    author={Ye, Nanyang and Li, Kaican and Bai, Haoyue and Yu, Runpeng and Hong, Lanqing and Zhou, Fengwei and Li, Zhenguo and Zhu, Jun},
    booktitle={CVPR},
    year={2022}
}