Awesome Vision-and-Language Navigation

<!-- Vision-and-Language Navigation (VLN) has becoming an important topic. This repo keeps tracking the recent advancements in VLN. --> <!-- Create an issue or email to jgu110@ucsc.edu if you have any suggestions on this repo! -->

This repo keeps track of recent advances in Vision-and-Language Navigation (VLN) research. Please see our ACL 2022 VLN survey paper for the categorization approach and detailed discussions of tasks, methods, and future directions: Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions.

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

Datasets and Benchmarks

<!-- - Visual Navigation for Mobile Robots: A Survey [paper](https://link.springer.com/article/10.1007/s10846-008-9235-4) -->

Initial Instruction

Guidance

Dialog

Evaluation

Here we list papers that introduce new evaluation metrics.

Methods

Representation Learning

Pretraining

Semantic Understanding

Graph Representation

Memory-augmented Model

Auxiliary Tasks

Action Strategy Learning

Reinforcement Learning

Exploration during Navigation

Navigation Planning

Asking for Help

Data-centric Learning

Data Augmentation

<!-- ##### Trajectory-Instruction Augmentation --> <!-- ##### Environment Augmentation -->

Curriculum Learning

Multitask Learning

Instruction Interpretation

Prior Exploration

Related Areas

Using 2D MAPS environments

Using synthetic environments

Visual Navigation

If you find this repo useful for your research, please cite our survey:

@InProceedings{jing2022vln,
      title = {Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions},
      author = {Jing Gu and Eliana Stefani and Qi Wu and Jesse Thomason and Xin Eric Wang},
      booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)},
      year = {2022}
}