# Awesome Vision-Language Navigation
A curated list of research papers in Vision-Language Navigation (VLN), with links to code and project websites where available. You can also find more embodied vision papers in awesome-embodied-vision.
## Contributing
Please feel free to contact me via email (liudq@mail.ustc.edu.cn), open an issue, or submit a pull request.
To add a new paper via pull request:
- Fork the repo and edit README.md.
- Put the new paper at the correct chronological position, in the following format: <br>
  - **Paper Title** <br> *Author(s)* <br> Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)
- Send a pull request. Ideally, I will review the request within a week.
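For instance, a filled-in entry might look like the one below (the title, authors, and links are placeholders, not a real paper):

```markdown
- **An Example Vision-and-Language Navigation Paper** <br> *First Author, Second Author* <br> CVPR, 2024. [[Paper]](https://arxiv.org/abs/xxxx.xxxxx) [[Code]](https://github.com/user/repo) [[Website]](https://example.com)
```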
## Papers
### Tasks:
- **Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments** <br> *Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel* <br> CVPR, 2018. [Paper] [Code] [Website]
- **HoME: a Household Multimodal Environment** <br> *Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville* <br> NIPS Workshop, 2017. [Paper] [Code]
- **Talk the Walk: Navigating New York City through Grounded Dialogue** <br> *Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela* <br> arXiv, 2019. [Paper] [Code]
- **Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments** <br> *Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi* <br> CVPR, 2019. [Paper] [Code] [Website]
- **Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention** <br> *Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan* <br> CVPR, 2019. [Paper] [Code] [Video]
- **Learning To Follow Directions in Street View** <br> *Karl Moritz Hermann, Mateusz Malinowski, Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Raia Hadsell* <br> AAAI, 2020. [Paper] [Website]
- **REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments** <br> *Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel* <br> CVPR, 2020. [Paper]
- **Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation** <br> *Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge* <br> ACL, 2019. [Paper] [Code]
- **Vision-and-Dialog Navigation** <br> *Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer* <br> CoRL, 2019. [Paper] [Website]
- **Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning** <br> *Khanh Nguyen, Hal Daumé III* <br> EMNLP, 2019. [Paper] [Code] [Video]
- **Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory** <br> *Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool* <br> arXiv, 2019. [Paper] [Website]
- **Cross-Lingual Vision-Language Navigation** <br> *An Yan, Xin Wang, Jiangtao Feng, Lei Li, William Yang Wang* <br> arXiv, 2019. [Paper] [Code]
- **Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments** <br> *Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee* <br> arXiv, 2020. [Paper] [Code]
- **Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation** <br> *Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira* <br> ICRA, 2021. [Paper] [Code]
### Roadmap (Chronological Order):
- **Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments** <br> *Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel* <br> CVPR, 2018. [Paper] [Code] [Website]
- **Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation** <br> *Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang* <br> ECCV, 2018. [Paper]
- **Speaker-Follower Models for Vision-and-Language Navigation** <br> *Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell* <br> NeurIPS, 2018. [Paper] [Code] [Website]
- **Shifting the Baseline: Single Modality Performance on Visual Navigation & QA** <br> *Jesse Thomason, Daniel Gordon, Yonatan Bisk* <br> NAACL, 2019. [Paper] [Poster]
- **Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation** <br> *Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang* <br> CVPR, 2019. [Paper]
- **Self-Monitoring Navigation Agent via Auxiliary Progress Estimation** <br> *Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong* <br> ICLR, 2019. [Paper] [Code] [Website]
- **The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation** <br> *Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira* <br> CVPR, 2019. [Paper] [Code] [Website]
- **Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation** <br> *Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa* <br> CVPR, 2019. [Paper] [Code] [Video]
- **Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout** <br> *Hao Tan, Licheng Yu, Mohit Bansal* <br> NAACL, 2019. [Paper] [Code]
- **Multi-modal Discriminative Model for Vision-and-Language Navigation** <br> *Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie* <br> NAACL Workshop, 2019. [Paper]
- **Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation** <br> *Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko* <br> ACL, 2019. [Paper]
- **Chasing Ghosts: Instruction Following as Bayesian State Tracking** <br> *Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee* <br> NeurIPS, 2019. [Paper] [Code] [Video]
- **Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters** <br> *Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara* <br> BMVC, 2019. [Paper] [Code]
- **Transferable Representation Learning in Vision-and-Language Navigation** <br> *Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie* <br> ICCV, 2019. [Paper]
- **Robust Navigation with Language Pretraining and Stochastic Sampling** <br> *Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi* <br> EMNLP, 2019. [Paper] [Code]
- **Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling** <br> *Tsu-Jui Fu, Xin Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang* <br> arXiv, 2019. [Paper]
- **Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation** <br> *Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang* <br> CVPR, 2020. [Paper]
- **Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks** <br> *Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang* <br> CVPR, 2020. [Paper]
- **Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation** <br> *Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara* <br> arXiv, 2019. [Paper] [Code]
- **Just Ask: An Interactive Learning Framework for Vision and Language Navigation** <br> *Ta-Chung Chi, Mihail Eric, Seokhwan Kim, Minmin Shen, Dilek Hakkani-tur* <br> AAAI, 2020. [Paper]
- **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training** <br> *Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao* <br> CVPR, 2020. [Paper] [Code]
- **Multi-View Learning for Vision-and-Language Navigation** <br> *Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith* <br> arXiv, 2020. [Paper]
- **Vision-Dialog Navigation by Exploring Cross-modal Memory** <br> *Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang* <br> CVPR, 2020. [Paper] [Code]
- **Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation** <br> *Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky* <br> arXiv, 2020. [Paper]
- **Sub-Instruction Aware Vision-and-Language Navigation** <br> *Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould* <br> arXiv, 2020. [Paper]
- **Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments** <br> *Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee* <br> ECCV, 2020. [Paper] [Code] [Website]
- **Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling** <br> *Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang* <br> ECCV, 2020. [Paper]
- **Improving Vision-and-Language Navigation with Image-Text Pairs from the Web** <br> *Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra* <br> ECCV, 2020. [Paper]
- **Soft Expert Reward Learning for Vision-and-Language Navigation** <br> *Hu Wang, Qi Wu, Chunhua Shen* <br> ECCV, 2020. [Paper]
- **Active Visual Information Gathering for Vision-Language Navigation** <br> *Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen* <br> ECCV, 2020. [Paper] [Code]
- **Environment-agnostic Multitask Learning for Natural Language Grounded Navigation** <br> *Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi* <br> ECCV, 2020. [Paper]
- **Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation** <br> *Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira* <br> ICRA, 2021. [Paper] [Code] [Website] [Video]