# The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
This is the repository of ORIST (ICCV 2021).
<p align="center"> <img src="orist.png" width="100%"> </p>

Some code in this repo is copied/modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, Nvidia, and UNITER. The object features are extracted using BUTD, with expanded object bounding boxes from REVERIE.
## Features of the Code
- Distributed data parallel training implemented with PyTorch (a launch sketch follows this list).
- Code optimizations for faster training.
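
A minimal sketch of how such a distributed run is typically launched with PyTorch's launcher; `train.py` and the process count are assumptions for illustration, not the repo's documented entry point (the provided `run_scripts/*.sh` wrap the actual invocation):

```bash
# Launch one worker process per GPU with PyTorch's distributed launcher.
# --nproc_per_node should match the number of visible GPUs;
# train.py is a hypothetical stand-in for the repo's training entry point.
python -m torch.distributed.launch --nproc_per_node=4 train.py
```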
## Requirements
- Install Docker with GPU support (e.g., via the NVIDIA Container Toolkit; many tutorials are available online).
- Pull the docker image:

  ```bash
  docker pull qykshr/ubuntu:orist
  ```
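
Once the image is pulled, a container can be started with GPU access roughly as follows; the `/workspace` mount point is an assumption for illustration, not a path mandated by the image:

```bash
# Start an interactive container with all GPUs visible and the
# current checkout mounted at /workspace (hypothetical mount point).
docker run --gpus all -v $(pwd):/workspace -w /workspace -it qykshr/ubuntu:orist bash
```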
## Quick Start
- Download the processed data and pretrained models:
  - Processed data:
    - For evaluation only:
    - For training:
- Build the Matterport3D simulator.

  Build the OSMesa version using CMake:

  ```bash
  mkdir build && cd build
  cmake -DOSMESA_RENDERING=ON ..
  make
  cd ../
  ```

  Instructions for building other versions can be found in the Matterport3D simulator repository.
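
  After the build completes, the compiled MatterSim bindings need to be importable from Python. A common approach, assuming the standard Matterport3D simulator layout, is:

  ```bash
  # Make the freshly built MatterSim module visible to Python.
  # The build/ path is an assumption based on the CMake steps above.
  export PYTHONPATH=$PYTHONPATH:$(pwd)/build
  ```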
- Run inference:

  ```bash
  sh eval_scripts/xxx.sh
  ```
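
  GPU selection is not documented here; a hedged usage example, assuming the scripts respect `CUDA_VISIBLE_DEVICES`:

  ```bash
  # Hypothetical usage: restrict evaluation to a single GPU.
  # Replace xxx with the actual script name in eval_scripts/.
  CUDA_VISIBLE_DEVICES=0 sh eval_scripts/xxx.sh
  ```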
- Run training:

  ```bash
  sh run_scripts/xxx.sh
  ```
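
  Since training uses distributed data parallel (see Features above), exposing several GPUs is the expected mode; the device list below is an illustrative assumption:

  ```bash
  # Hypothetical usage: run distributed training across four GPUs.
  # Replace xxx with the actual script name in run_scripts/.
  CUDA_VISIBLE_DEVICES=0,1,2,3 sh run_scripts/xxx.sh
  ```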
## Citation
If this code or data is useful for your research, please consider citing:
```bibtex
@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}
```