CLEVR-Dialog

This repository contains code for the paper:

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
[PDF] [ArXiv] [Code]
Oral Presentation
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019

If you find this code useful, consider citing our work:

@inproceedings{Kottur2019CLEVRDialog,
	title  = {CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog},  
	author = {Kottur, Satwik and Moura, Jos\'e M. F. and Parikh, Devi and   
	          Batra, Dhruv and Rohrbach, Marcus},  
	journal = {arXiv preprint arXiv:1903.03166},
	year   = {2019}  
}

Abstract

Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible, as it requires prohibitively expensive complete annotation of the 'state' of all images and dialogs.

We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling 4.25M question-answer pairs.
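As a quick sanity check on the totals above, the arithmetic can be reproduced in a few lines. The image count is rounded ("about 85k"), so the result is approximate, and the variable names are illustrative only.

```python
# Approximate size of CLEVR-Dialog, using the rounded figures quoted above.
num_images = 85_000        # "about 85k" CLEVR images (rounded)
dialogs_per_image = 5      # dialog instances per image
rounds_per_dialog = 10     # one question-answer pair per round

num_dialogs = num_images * dialogs_per_image
num_qa_pairs = num_dialogs * rounds_per_dialog

print(f"dialogs:  {num_dialogs:,}")
print(f"QA pairs: {num_qa_pairs:,}")
```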

We use CLEVR-Dialog to benchmark the performance of standard visual dialog models, in particular on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models, and it was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog.

This repository generates a version of our diagnostic dataset, CLEVR-Dialog (figure above).

Setup

The code is written in Python 3 and depends on the following packages (the json module is part of the Python standard library and does not need to be installed):

pip install absl-py
pip install tqdm
pip install numpy

Directory Structure

The repository contains the following files:

In addition, the dataset generation code requires the following files:

CLEVR Images

Our dataset is built on CLEVR images, which can be downloaded from here. Extract the images and scene JSON files into the data/ folder. We use only the CLEVR train and val splits, as scene JSON files are unavailable for the test split.
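Before generating dialogs, it can help to confirm that the scene files were extracted where the generation script expects them. The sketch below is a minimal check, not part of this repository: it assumes the data/CLEVR_v1.0/ layout and the CLEVR v1.0 scene format, in which each file holds its scenes under a top-level "scenes" key and each scene lists its "objects". The helper name summarize_scenes is hypothetical.

```python
import json
from pathlib import Path


def summarize_scenes(scene_file):
    """Return (number of scenes, number of objects in the first scene)."""
    with open(scene_file) as f:
        data = json.load(f)
    # Assumption: CLEVR v1.0 scene files keep the scene list under "scenes".
    scenes = data["scenes"]
    first_objects = scenes[0]["objects"] if scenes else []
    return len(scenes), len(first_objects)


if __name__ == "__main__":
    data_root = Path("data/CLEVR_v1.0")  # assumed extraction location
    for split in ("train", "val"):
        scene_file = data_root / "scenes" / f"CLEVR_{split}_scenes.json"
        if scene_file.exists():
            print(split, summarize_scenes(scene_file))
        else:
            print(f"missing: {scene_file}")
```

A "missing" line for either split suggests the archive was extracted to a different folder than the one the generation command points at.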

Generating CLEVR-Dialog Dataset

To generate the dataset, see run_me.sh. The supported flags are documented in generate_dataset.py. An example command is shown below:

DATA_ROOT='data/CLEVR_v1.0/'
python -u generate_dataset.py \
	--scene_path=${DATA_ROOT}"scenes/CLEVR_train_scenes.json" \
	--num_beams=100 \
	--num_workers=1 \
	--save_path=${DATA_ROOT}"clevr_dialog_train_raw.json" \
	--num_images=10
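The same call can be scripted for both splits. The sketch below only assembles the flags shown in the example command; it is not part of the repository. The helper names build_command and generate are hypothetical, and the val output name clevr_dialog_val_raw.json is an assumption made by analogy with the train file name.

```python
import subprocess
import sys


def build_command(data_root, split, num_images=10):
    """Assemble the generate_dataset.py call for one split ("train" or "val")."""
    return [
        sys.executable, "-u", "generate_dataset.py",
        f"--scene_path={data_root}scenes/CLEVR_{split}_scenes.json",
        "--num_beams=100",
        "--num_workers=1",
        # Assumption: output file names follow the train example for every split.
        f"--save_path={data_root}clevr_dialog_{split}_raw.json",
        f"--num_images={num_images}",
    ]


def generate(data_root="data/CLEVR_v1.0/", splits=("train", "val"), dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for split in splits:
        cmd = build_command(data_root, split)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling generate() prints the two commands without running them; generate(dry_run=False) executes them from the repository root and stops at the first failure.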

CLEVR-Dialog Annotations

The generated JSON contains a list of dialogs on CLEVR images with the following fields:

The dataset used in the paper can be downloaded here: train and val splits.
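A quick way to see which fields a generated file actually contains is to load it and list the keys of its first entry. The sketch below assumes only what is stated above, namely that the file holds a top-level JSON list; it does not assume any particular field names. The function name inspect_dialogs is illustrative.

```python
import json


def inspect_dialogs(json_path):
    """Return (number of top-level entries, sorted field names of the first entry)."""
    with open(json_path) as f:
        dialogs = json.load(f)
    if not isinstance(dialogs, list):
        raise ValueError("expected a JSON list at the top level")
    first = dialogs[0] if dialogs else {}
    fields = sorted(first) if isinstance(first, dict) else []
    return len(dialogs), fields
```

For example, inspect_dialogs("data/CLEVR_v1.0/clevr_dialog_train_raw.json") would report on the file written by the example command above.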

Contributors

For any questions, please feel free to contact the above contributor(s).

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree (here).