Spawrious

Leaderboard results

One-to-one Spurious Correlations

[Animated example of one-to-one spurious correlations]

Many-to-many Spurious Correlations

[Animated example of many-to-many spurious correlations]

Spawrious is a challenging out-of-distribution (OOD) image classification benchmark (see the Citation section for the paper). It consists of 6 separate OOD challenges of two types: one-to-one and many-to-many spurious correlation challenges.

The dataset contains images of 4 dog breeds, found in 6 locations. The entire dataset consists of ~152,000 images, but each challenge requires only a subset of them. The repo therefore lets users download only the minimal dataset required for a given Spawrious challenge.

Example script

Datasets take the following names:

Running the command below loads the requested dataset from a user-specified directory (downloading it first if it is not already there), trains a ResNet-18, and evaluates it on the OOD test set.

python example.py --data_dir <path to data dir> --dataset <one of the list above>
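For readers who want to accept the same two flags in their own script, a minimal sketch of the argument parsing might look like the following. This is not the repository's example.py: only the `--data_dir` and `--dataset` flag names come from the command above, and `parse_args` is a hypothetical helper name.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the command above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI for running a Spawrious challenge."
    )
    # Directory where the dataset is stored (or will be downloaded to).
    parser.add_argument("--data_dir", type=str, required=True)
    # Name of the challenge to run, e.g. "m2m_medium".
    parser.add_argument("--dataset", type=str, required=True)
    return parser.parse_args(argv)


# Passing an explicit list mimics what the shell command would provide.
args = parse_args(["--data_dir", ".data/", "--dataset", "m2m_medium"])
print(args.data_dir, args.dataset)
```

Passing `argv=None` makes argparse read from `sys.argv`, so the same function works both from the command line and in tests.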

Installation

pip install git+https://github.com/aengusl/spawrious.git

HParams

Using the datasets

import torch

from spawrious.torch import get_spawrious_dataset
# spawrious.tf if using tensorflow or jax

dataset = "m2m_medium"
data_dir = ".data/"
val_split = 0.2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
spawrious = get_spawrious_dataset(dataset_name=dataset, root_dir=data_dir)
train_set = spawrious.get_train_dataset()
test_set = spawrious.get_test_dataset()
val_size = int(len(train_set) * val_split)
train_set, val_set = torch.utils.data.random_split(
    train_set, [len(train_set) - val_size, val_size]
)
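The validation split above relies on the two lengths passed to `random_split` summing exactly to the size of the training set. The arithmetic can be checked in isolation with a small sketch; `split_sizes` is a hypothetical helper introduced here for illustration and is not part of the spawrious package.

```python
from typing import Tuple


def split_sizes(num_examples: int, val_split: float) -> Tuple[int, int]:
    """Return (train_size, val_size) using the same rounding as the snippet above."""
    if not 0.0 <= val_split < 1.0:
        raise ValueError("val_split must be in [0, 1)")
    # int() truncates, so any remainder stays in the training portion.
    val_size = int(num_examples * val_split)
    return num_examples - val_size, val_size


train_size, val_size = split_sizes(101, 0.2)
print(train_size, val_size)  # 81 20
```

Because the validation size is truncated rather than rounded, the training portion absorbs any leftover example and the two sizes always add up to the original length.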

Click to download the datasets:

Generate your own data

If you want to generate your own data, or understand how we generated ours, take a look at generate_dataset.py. To run this file, you additionally need to install diffusers and transformers.

Citation

@misc{lynch2023spawrious,
      title={Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases}, 
      author={Aengus Lynch and Gbètondji J-S Dovonon and Jean Kaddour and Ricardo Silva},
      year={2023},
      eprint={2303.05470},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Licensing

This work is licensed under a Creative Commons Attribution 4.0 International License.
