PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

This repository contains data and code for our paper PACS: A Dataset for Physical Audiovisual CommonSense Reasoning.

Sample Datapoints

Setting up the Repository

It is recommended to create an Anaconda environment:

conda create --name PACS python=3.8.11
conda activate PACS
pip install -r requirements.txt

Then, install the version of PyTorch that matches your CUDA version (see the official PyTorch installation instructions). For example:

pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
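
After installing, a quick sanity check along these lines can confirm that the CUDA-enabled build was picked up (the printed versions will differ if you installed for a different CUDA release):

# Optional sanity check: confirm torch, torchvision, and torchaudio import and see the GPU.
import torch
import torchvision
import torchaudio

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))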

Dataset Download

The dataset is available for download here.

Alternatively, if you want to replicate the original download steps, you can run the following commands (this will take a while):

cd dataset/scripts
python3 download.py -data_dir PATH_TO_DATA_STORAGE_HERE
python3 preprocess.py -data_dir PATH_TO_DATA_STORAGE_HERE
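
Once the scripts finish, a rough check such as the sketch below can confirm that files were actually written to the data directory. The file extensions counted here are assumptions; the exact layout is determined by download.py and preprocess.py.

# Rough post-download check (a minimal sketch; the extensions counted are assumptions,
# and the actual layout is produced by download.py / preprocess.py).
import pathlib

data_dir = pathlib.Path("PATH_TO_DATA_STORAGE_HERE")  # same path passed to the scripts above
for pattern in ("*.mp4", "*.wav", "*.json"):
    count = sum(1 for _ in data_dir.rglob(pattern))
    print(f"{pattern}: {count} file(s)")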

Baseline Models

To run baseline models, visit the experiments folder. We have currently benchmarked the following models:

| Model              | With Audio (%) | Without Audio (%) | Δ   |
|--------------------|----------------|-------------------|-----|
| Fusion (I+A+V)     | 51.9 ± 1.1     | -                 | -   |
| Fusion (Q+I)       | -              | 51.2 ± 0.8        | -   |
| Fusion (Q+A)       | 50.9 ± 0.6     | -                 | -   |
| Fusion (Q+V)       | -              | 51.5 ± 0.9        | -   |
| Late Fusion        | 55.0 ± 1.1     | 52.5 ± 1.6        | 2.5 |
| CLIP/AudioCLIP     | 60.0 ± 0.9     | 56.3 ± 0.7        | 3.7 |
| UNITER (L)         | -              | 60.6 ± 2.2        | -   |
| Merlot Reserve (B) | 66.5 ± 1.4     | 64.0 ± 0.9        | 2.6 |
| Merlot Reserve (L) | 70.1 ± 1.0     | 68.4 ± 0.7        | 1.8 |
| Majority           | 50.4           | 50.4              | -   |
| Human              | 96.3 ± 2.1     | 90.5 ± 3.1        | 5.9 |
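
For reference, the sketch below shows one way a PACS datapoint (a question about a pair of objects, each backed by a video and an audio clip) could be wrapped as a PyTorch Dataset. The file layout, annotation filename, and field names ("question", "obj1", "obj2", "label") are hypothetical; the actual data-loading code lives in the experiments folder.

# Minimal sketch of wrapping PACS-style datapoints as a PyTorch Dataset.
# The annotation filename, field names, and file layout below are hypothetical;
# see the experiments folder for the loaders actually used by the baselines.
import json
import pathlib

import torch
import torchaudio
from torch.utils.data import Dataset
from torchvision.io import read_video


class PACSPairs(Dataset):
    def __init__(self, data_dir, annotation_file="annotations.json"):
        self.data_dir = pathlib.Path(data_dir)
        with open(self.data_dir / annotation_file) as f:
            self.samples = json.load(f)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        # Each question compares two objects; each object has a short video clip and an audio track.
        video1, _, _ = read_video(str(self.data_dir / f"{s['obj1']}.mp4"), pts_unit="sec")
        video2, _, _ = read_video(str(self.data_dir / f"{s['obj2']}.mp4"), pts_unit="sec")
        audio1, _ = torchaudio.load(str(self.data_dir / f"{s['obj1']}.wav"))
        audio2, _ = torchaudio.load(str(self.data_dir / f"{s['obj2']}.wav"))
        # Binary label indicating which of the two objects answers the question.
        label = torch.tensor(s["label"], dtype=torch.long)
        return s["question"], (video1, audio1), (video2, audio2), label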

Citation

If you use this repository or our dataset, please consider citing our paper:

@inproceedings{yu2022pacs,
  title={PACS: A Dataset for Physical Audiovisual CommonSense Reasoning},
  author={Yu, Samuel and Wu, Peter and Liang, Paul Pu and Salakhutdinov, Ruslan and Morency, Louis-Philippe},
  booktitle={European Conference on Computer Vision},
  year={2022}
}