

Dual Attention Networks for Visual Dialog

PyTorch implementation of the paper:

Dual Attention Networks for Visual Reference Resolution in Visual Dialog <br> Gi-Cheon Kang, Jaeseo Lim, and Byoung-Tak Zhang <br> In EMNLP 2019

<img src="dan_overview.jpg" alt="Overview of Dual Attention Networks" width="90%" align="middle">

If you use this code in your published research, please consider citing:

@inproceedings{kang2019dual,
  title={Dual Attention Networks for Visual Reference Resolution in Visual Dialog},
  author={Kang, Gi-Cheon and Lim, Jaeseo and Zhang, Byoung-Tak},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  pages={2024--2033},
  year={2019}
}

Setup and Dependencies

This starter code is implemented using PyTorch v0.3.1 with CUDA 8 and cuDNN 7. <br> We recommend setting up the environment with Anaconda or Miniconda. <br>

  1. Install the Anaconda or Miniconda distribution based on Python 3.6+ from their download sites.
  2. Clone this repository and create an environment:
git clone https://github.com/gicheonkang/DAN-VisDial
conda create -n dan_visdial python=3.6

# activate the environment and install all dependencies
conda activate dan_visdial
cd DAN-VisDial/
pip install -r requirements.txt

Download Features

  1. We use image features extracted by a Faster R-CNN pre-trained on Visual Genome. Download the image features below, and put each feature file under the $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. You also need the file that maps each image_id to its R-CNN bounding-box index ({SPLIT_NAME}_imgid2idx.pkl), because the number of bounding boxes per image is not fixed (it ranges from 10 to 100).
  2. Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data/glove directory.
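
Because the number of boxes differs per image, a common way to store such features is one flat list of box vectors plus a lookup from image_id to a row range. The sketch below illustrates that idea in plain Python. The `pos_boxes` offset table, the `boxes_for` helper, and the toy image ids are assumptions made for illustration; they are not taken from the actual feature files, whose layout may differ.

```python
import io
import pickle

# Toy data: three hypothetical images with a variable number of boxes each.
boxes_per_image = {101: 3, 205: 1, 309: 2}

flat_features = []  # one feature vector per box, for all images back to back
imgid2idx = {}      # image_id -> row in pos_boxes
pos_boxes = []      # row -> (start, end) offsets into flat_features

for row, (image_id, num_boxes) in enumerate(boxes_per_image.items()):
    start = len(flat_features)
    # 2-d stand-in features; real box features are much larger.
    flat_features.extend([[float(image_id), float(b)] for b in range(num_boxes)])
    imgid2idx[image_id] = row
    pos_boxes.append((start, len(flat_features)))

# Round-trip the mapping through pickle, as a .pkl index file would be.
buffer = io.BytesIO()
pickle.dump(imgid2idx, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)


def boxes_for(image_id):
    """Return the list of box features that belong to one image."""
    start, end = pos_boxes[loaded[image_id]]
    return flat_features[start:end]


print(len(boxes_for(101)), len(boxes_for(205)), len(boxes_for(309)))
```

With this layout, looking up an image costs one dictionary access and one slice, regardless of how many boxes the image has.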

Data preprocessing & Word embedding initialization

# data preprocessing
cd DAN-VisDial/data/
python prepro.py

# Word embedding vector initialization (GloVe)
cd ../utils
python utils.py
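
For readers unfamiliar with the GloVe text format, each line of glove.6B.300d.txt holds a word followed by its 300 space-separated vector components. The following is a minimal sketch of how an embedding table could be initialized from such a file. The function names `load_glove` and `init_embeddings`, the `<PAD>` token, and the random initialization for out-of-vocabulary words are assumptions for illustration, not a description of what utils.py does.

```python
import io
import random


def load_glove(lines, dim):
    """Parse GloVe text lines of the form '<word> <v1> ... <v_dim>'."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def init_embeddings(vocab, glove, dim, seed=0):
    """Build one row per vocabulary word.

    Words found in GloVe copy their vector; all others get small random values.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in glove:
            matrix.append(list(glove[word]))
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


# Tiny stand-in for glove.6B.300d.txt, using 3 dimensions instead of 300.
fake_glove_file = io.StringIO("dog 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
glove = load_glove(fake_glove_file, dim=3)
vocab = ["<PAD>", "dog", "cat", "zebra"]
embeddings = init_embeddings(vocab, glove, dim=3)
print(embeddings[1])
```

For the real file, pass an open file handle and `dim=300` in place of the in-memory stand-in.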


Simple run

python train.py 

Saving model checkpoints

By default, the model saves a checkpoint after every epoch. You can change this with the -save_step option.
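
As an illustration of how a save interval typically behaves, the sketch below lists the epochs at which a checkpoint would be written. It assumes that -save_step is an interval measured in epochs and that epochs are counted from 1; the helper `epochs_to_checkpoint` is hypothetical, and train.py may define the option differently.

```python
def epochs_to_checkpoint(num_epochs, save_step=1):
    """Return the 1-based epochs at which a checkpoint would be written."""
    if save_step < 1:
        raise ValueError("save_step must be >= 1")
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % save_step == 0]


print(epochs_to_checkpoint(12))               # every epoch
print(epochs_to_checkpoint(12, save_step=4))  # every fourth epoch
```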


Logging

The log file checkpoints/start/time/log.txt records the epoch, loss, and learning rate.


Evaluation

A trained model checkpoint can be evaluated as follows:

python evaluate.py -load_path /path/to/.pth -split val

Validation scores can be computed offline. To obtain scores on the test split, however, you have to submit a JSON file to the online evaluation server. You can generate this file with the -save_ranks=True option.
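
To give a rough idea of what such a submission file contains, the sketch below converts candidate scores into 1-based ranks and serializes one entry as JSON. The keys `image_id`, `round_id`, and `ranks` follow the Visual Dialog challenge submission format as we understand it, and real entries rank 100 candidate answers. The helpers `scores_to_ranks` and `build_entry` and the sample values are hypothetical, and the exact output of evaluate.py is not shown here.

```python
import json


def scores_to_ranks(scores):
    """Convert candidate scores to 1-based ranks (rank 1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, candidate in enumerate(order, start=1):
        ranks[candidate] = position
    return ranks


def build_entry(image_id, round_id, scores):
    """Bundle the ranks for one dialog round into a submission entry."""
    return {"image_id": image_id, "round_id": round_id, "ranks": scores_to_ranks(scores)}


# Four candidates instead of 100, to keep the example readable.
entries = [build_entry(image_id=1234, round_id=3, scores=[0.1, 0.7, 0.2, 0.05])]
payload = json.dumps(entries)
print(payload)
```

Each position in `ranks` corresponds to one candidate answer, so the list keeps the original candidate order rather than being sorted.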

Pre-trained model & Results

We provide the pre-trained model reported as the best single model in the paper. <br> To reproduce the results reported in the paper, please run the command below and submit the resulting JSON file to the online evaluation server.

python evaluate.py -load_path /path/to/dan_disc_epoch_12.pth -split test -use_gt False -save_ranks True

Performance on v1.0 test-std (trained on v1.0 train):



MIT License


This work was partly supported by the Korea government (2015-0-00310-SW.StarLab, 2017-0-01772-VTT, 2018-0-00622-RMI, 2019-0-01367-BabyMind, 10060086-RISF, P0006720-GENKO), and the ICT at Seoul National University.