MT-EQA

Multi-Target Embodied Question Answering

Introduction

We present a generalization of EQA: Multi-Target EQA (MT-EQA). Specifically, we study questions that have multiple targets in them, such as "Is the dresser in the bedroom bigger than the oven in the kitchen?", where the agent has to navigate to multiple locations ("dresser in bedroom", "oven in kitchen") and perform comparative reasoning ("dresser" bigger than "oven") before it can answer a question. Such questions require the development of entirely new modules or components in the agent. To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module. The program generator converts the given question into sequential executable sub-programs; the navigator guides the agent to multiple locations pertinent to the navigation-related sub-programs; and the controller learns to select relevant observations along its path. These observations are then fed to the VQA module to predict the answer.

<p align="center"> <img src="http://www.cs.unc.edu/~licheng/images/cvpr19_mteqa.png" width="75%"/> </p>
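To make the data flow between the four modules concrete, here is a minimal toy sketch in Python. It is not the code in this repository: the sub-program names (`nav_room`, `nav_object`, `query`, `compare`), the `SubProgram` class, and the `object_sizes` lookup are all hypothetical stand-ins chosen for illustration. The learned program generator is replaced by a single regular expression, and the learned navigator, controller, and VQA module are replaced by dictionary lookups and a comparison.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SubProgram:
    """One executable step; the op names are hypothetical, not the repo's."""
    op: str
    arg: str


def generate_program(question: str) -> List[SubProgram]:
    """Rule-based stand-in for the learned program generator.

    Only handles the single size-comparison template used in the example.
    """
    pattern = r"Is the (\w+) in the (\w+) bigger than the (\w+) in the (\w+)\?"
    match = re.fullmatch(pattern, question)
    if match is None:
        raise ValueError("this sketch only parses one question template")
    obj1, room1, obj2, room2 = match.groups()
    return [
        SubProgram("nav_room", room1),
        SubProgram("nav_object", obj1),
        SubProgram("query", "size"),
        SubProgram("nav_room", room2),
        SubProgram("nav_object", obj2),
        SubProgram("query", "size"),
        SubProgram("compare", "bigger"),
    ]


def answer(question: str, object_sizes: Dict[Tuple[str, str], float]) -> str:
    """Execute the sub-programs in order and return "yes" or "no".

    object_sizes maps (object, room) to a size and stands in for what the
    agent would actually observe in the environment (an assumption).
    """
    observations: List[float] = []
    room: Optional[str] = None
    target: Optional[Tuple[str, str]] = None
    for step in generate_program(question):
        if step.op == "nav_room":
            room = step.arg                    # navigator: move to the room
        elif step.op == "nav_object":
            target = (step.arg, room)          # navigator: move to the object
        elif step.op == "query":
            # controller: keep the observation relevant to the current target
            observations.append(object_sizes[target])
        elif step.op == "compare":
            first, second = observations       # VQA stand-in: compare sizes
            return "yes" if first > second else "no"
    raise RuntimeError("program ended without a comparison step")


if __name__ == "__main__":
    question = "Is the dresser in the bedroom bigger than the oven in the kitchen?"
    sizes = {("dresser", "bedroom"): 1.8, ("oven", "kitchen"): 0.6}
    print(answer(question, sizes))
```

The point of the sketch is the ordering: navigation sub-programs run first for each target, an observation is selected at each target, and the comparison happens only after both observations have been collected.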

Citation

@inproceedings{yu2019mteqa,
  title={Multi-Target Embodied Question Answering},
  author={Yu, Licheng and Chen, Xinlei and Gkioxari, Georgia and Bansal, Mohit and Berg, Tamara L and Batra, Dhruv},
  booktitle={CVPR},
  year={2019}
}

Data Generation

Go to the eqa_data folder and do the following:

  1. Generate question-answer pairs
  1. Generate graphs, connMaps, and shortest-paths
  1. Install House3D

Imitation Learning for Nav+Ctrl+cVQA

Go to the nav_loc_vqa folder and do the following:

  1. Prepare House Data (conn-maps, graphs, shortest-paths, images, features, etc.)
  1. Train and Eval IL
  1. Evaluate RL-finetuned Model (after checking eqa_nav)

Reinforcement Learning Finetuning for Navigators

Go to the eqa_nav folder and do the following:

  1. Prepare House Data (conn-maps, graphs, shortest-paths, images, features, etc.)
  1. Prepare Navigation Data
  1. Train IL-based room-navigator and object-navigator
  1. Finetune using RL

Contact

This project is maintained by Licheng Yu.

License

BSD