The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

This is the code repository for ORIST (ICCV 2021).

<p align="center"> <img src="orist.png" width="100%"> </p>

Some code in this repo is copied or modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, NVIDIA, and UNITER. The object features are extracted using BUTD, with the expanded object bounding boxes of REVERIE.

Features of the Code

Requirements

A pre-built Docker image with the required environment is available:

docker pull qykshr/ubuntu:orist 
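
A minimal sketch of running the image with this repository mounted inside the container (the mount target and working directory below are illustrative assumptions, not part of the original instructions):

    # Launch the container with GPU access and this repo mounted inside it.
    # The path /workspace/ORIST is an assumption; adjust to your setup.
    docker run --gpus all -it \
        --volume "$(pwd)":/workspace/ORIST \
        --workdir /workspace/ORIST \
        qykshr/ubuntu:orist bash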

Quick Start

  1. Download the processed data and pretrained models:

  2. Build Matterport3D simulator

    Build the OSMesa version using CMake:

    mkdir build && cd build
    cmake -DOSMESA_RENDERING=ON ..
    make
    cd ../
    

    Instructions for building other versions can be found here. (A quick import check for the built simulator is sketched after this list.)

  3. Run inference:

    sh eval_scripts/xxx.sh

  4. Run training:

    sh run_scripts/xxx.sh
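
After building the simulator in step 2, a quick smoke test is to expose the build directory to Python and import the module. This is a minimal sketch, assuming the simulator was built in the `build` directory created above; the exact paths are assumptions, not part of the original instructions:

    # Make the compiled MatterSim module importable
    # (run from the simulator's root directory; the path is an assumption).
    export PYTHONPATH=$PYTHONPATH:$(pwd)/build

    # Smoke test: this exits silently if the OSMesa build succeeded.
    python -c "import MatterSim"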

Citation

If you find this code or data useful for your research, please consider citing:

@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor
               Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}