Awesome
<p align="center"> <img width="1000" src="./assets/teaser.png"/> </p>

Paper List in the survey paper
The papers we survey are listed in this file, grouped into the following categories:
- Foundation models used in Robotics. In these papers, the authors apply existing vision and language foundation models, such as LLMs, VLMs, vision foundation models, and text-conditioned image generation models, to robotic modules such as perception, decision making and planning, and action.
- Robotic Foundation Models. In these papers, the authors propose new foundation models for a specific robotic application, such as control via imitation learning or reinforcement learning. We also include general-purpose foundation models, such as GATO and PaLM-E, in this category.
The taxonomy is shown in the figure below:
<p align="center"> <img width="1000" src="./assets/taxonomy.png"/> </p>

We list all the papers surveyed in our paper. The dates are based on the first release date on arXiv. This list will be updated continuously.
NOTE: We only include papers with experiments on real physical robots, in high-fidelity robotic simulation environments, or on real-world robotics datasets.
Foundation models used in Robotics
Perception
- CLIPort CLIPort: What and Where Pathways for Robotic Manipulation, 24 Sep 2021, paper link
- LM-Nav LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, 10 Jul 2022, paper link
- NLMap Open-vocabulary Queryable Scene Representations for Real World Planning, 20 Sep 2022, paper link
- CLIP-Fields CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory, 11 Oct 2022, paper link
- VLMap Visual Language Maps for Robot Navigation, 11 Oct 2022, paper link
- ConceptFusion ConceptFusion: Open-set Multimodal 3D Mapping, 14 Feb 2023, paper link
- WVN Fast Traversability Estimation for Wild Visual Navigation, 15 May 2023, paper link
- HomeRobot HomeRobot: Open-Vocabulary Mobile Manipulation, 20 Jun 2023, paper link
- Act3D Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation, 30 Jun 2023, paper link
- F3RM Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, 27 Jul 2023, paper link
- AnyLoc AnyLoc: Towards Universal Visual Place Recognition, 1 Aug 2023, paper link
- GNFactor GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields, 31 Aug 2023, paper link
- MOSAIC MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception, 15 Sep 2023, paper link
- SpatialVLM SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, 22 Jan 2024, paper link
- OK-Robot OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics, 22 Jan 2024, paper link
- MOKA MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting, 5 Mar 2024, paper link
- DNAct DNAct: Diffusion Guided Multi-Task 3D Policy Learning, 7 Mar 2024, paper link
- GeFF Learning Generalizable Feature Fields for Mobile Manipulation, 12 Mar 2024, paper link
- Octopi Octopi: Object Property Reasoning with Large Tactile-Language Models, 5 Jun 2024, paper link
Task Planning
<!-- - **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents**, 2022 [paper link](https://arxiv.org/pdf/2201.07207.pdf) -->
- Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers, 25 Mar 2022, paper link
- Socratic Models Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 1 Apr 2022, paper link
- SayCan Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 4 Apr 2022, paper link
- Correcting Robot Plans with Natural Language Feedback, 11 Apr 2022, paper link
- Housekeep Housekeep: Tidying Virtual Households using Commonsense Reasoning, 22 May 2022, paper link
- Inner Monologue Inner Monologue: Embodied Reasoning through Planning with Language Models, 12 Jul 2022, paper link
- Code as Policies Code as Policies: Language Model Programs for Embodied Control, 16 Sep 2022, paper link
- ProgPrompt ProgPrompt: Generating Situated Robot Task Plans using Large Language Models, 22 Sep 2022, paper link
- VIMA VIMA: General Robot Manipulation with Multimodal Prompts, 6 Oct 2022, paper link
- LILAC “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy, 6 Jan 2023, paper link
<!-- - **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents**, 2023, [paper link](https://arxiv.org/pdf/2302.01560.pdf) -->
- SceneDiffuser Diffusion-based Generation, Optimization, and Planning in 3D Scenes, 15 Jan 2023, paper link
- ChatGPT for Robotics ChatGPT for Robotics: Design Principles and Model Abilities, 20 Feb 2023, paper link
- Grounded Decoding Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control, 1 Mar 2023, paper link
- TidyBot TidyBot: Personalized Robot Assistance with Large Language Models, 9 May 2023, paper link
- Instruct2Act Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, 18 May 2023, paper link
- KNOWNO Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners, 4 Jul 2023, paper link
- RoCo RoCo: Dialectic Multi-Robot Collaboration with Large Language Models, 10 Jul 2023, paper link
- SayPlan SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, 12 Jul 2023, paper link
- VLP: Video Language Planning, 16 Oct 2023, paper link
- SuSIE SuSIE: Subgoal Synthesis via Image Editing, 2023, paper link
- RoboTool RoboTool: Creative Robot Tool Use with Large Language Models, 23 Oct 2023, project link
- AutoRT AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents, 5 Jan 2024, paper link
- PIVOT PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs, 12 Feb 2024, paper link
- ReKep ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation, 29 Aug 2024, paper link
Action Generation
- SayTap SayTap: Language to Quadrupedal Locomotion, 13 Jun 2023, paper link
- L2R Language to Rewards for Robotic Skill Synthesis, 14 Jun 2023, paper link
- VoxPoser VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, 12 Jul 2023, paper link
- ReasonedExplorer Reasoning about the Unseen for Efficient Outdoor Object Navigation, 18 Sep 2023, paper link
- Eureka Eureka: Human-Level Reward Design via Coding Large Language Models, 19 Oct 2023, paper link
- Generative Expressive Robot Behaviors using Large Language Models, 26 Jan 2024, paper link
- LMPC Learning to Learn Faster from Human Feedback with Language Model Predictive Control, 18 Feb 2024, paper link
- Manipulate-Anything Manipulate-Anything: Automating Real-World Robots using Vision-Language Models, 29 Aug 2024, paper link
Data Generation
- CACTI CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning, 12 Dec 2022, paper link
- ROSIE Scaling Robot Learning with Semantically Imagined Experience, 22 Feb 2023 , paper link
- GenSim GenSim: Generating Robotic Simulation Tasks via Large Language Models, 2 Oct 2023, paper link
- URDFormer URDFormer: Constructing Interactive Realistic Scenes from Real Images via Simulation and Generative Modeling, 20 Oct 2023, paper link
- RoboGen RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, 2 Nov 2023, paper link
- RT-Trajectory RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches, 3 Nov 2023, paper link
Robotic Foundation Models
Single-Purpose
Action Generation
<!-- - **Pre-Trained Language Models for Interactive Decision-Making**, 2022, [paper link](https://arxiv.org/pdf/2202.01771.pdf) -->
- ZeST Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?, 23 Apr 2022, paper link
- Behavior Transformers Behavior Transformers: Cloning k modes with one stone, 22 Jun 2022, paper link
- ATLA Leveraging Language for Accelerated Learning of Tool Manipulation, 27 Jun 2022, paper link
- LATTE LATTE: LAnguage Trajectory TransformEr, 4 Aug 2022, paper link
- Perceiver-Actor Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, 12 Sep 2022, paper link
- MVP Real-World Robot Learning with Masked Visual Pre-training, 6 Oct 2022, paper link
- GNM GNM: A General Navigation Model to Drive Any Robot, 7 Oct 2022, paper link
- Interactive Language Interactive Language: Talking to Robots in Real Time, 12 Oct 2022, paper link
- Conditional Behavior Transformers (C-BeT) From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data, 18 Oct 2022, paper link
- STAP STAP: Sequencing Task-Agnostic Policies, 21 Oct 2022, paper link
- LILA LILA: Language-Informed Latent Actions, 31 Oct 2022, paper link
- DIAL Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models, 21 Nov 2022, paper link
- RT-1 RT-1: Robotics Transformer for Real-World Control at Scale, Dec 2022, paper link
- MOO Open-World Object Manipulation using Pre-Trained Vision-Language Models, 2 Mar 2023, paper link
- VC-1 Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, 31 Mar 2023, paper link
- CoTPC Chain-of-Thought Predictive Control, 3 Apr 2023, paper link
- ARNOLD ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes, 9 Apr 2023, paper link
- Optimus Imitating Task and Motion Planning with Visuomotor Transformers, 25 May 2023, paper link
- RoboCat RoboCat: A self-improving robotic agent, 20 Jun 2023, paper link
- Scaling Up and Distilling Down Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition, 26 Jul 2023, paper link
- ViNT ViNT: A Foundation Model for Visual Navigation, 26 Jun 2023, paper link
- RT-2 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, 28 Jul 2023, paper link
- RoboAgent RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking, 5 Sep 2023, paper link
- RT-X Open X-Embodiment: Robotic Learning Datasets and RT-X Models, 13 Oct 2023, paper link
- Q-Transformer Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, 18 Sep 2023, paper link
- On Bringing Robots Home, 27 Nov 2023, paper link
- Octo Octo: An Open-Source Generalist Robot Policy, 14 Dec 2023, paper link
- VQ-BeT VQ-BeT: Behavior Generation with Latent Actions, 5 Mar 2024, paper link
- OpenVLA OpenVLA: An Open-Source Vision-Language-Action Model, 13 Jun 2024, paper link
- LLaRA LLaRA: Supercharging Robot Learning Data for Vision-Language Policy, 28 Jun 2024, paper link
- ICRT In-Context Imitation Learning via Next-Token Prediction, 28 Aug 2024, paper link
General-Purpose
- GATO A Generalist Agent, 12 May 2022, paper link
- PACT PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training, 22 Sep 2022, paper link
- PaLM-E PaLM-E: An Embodied Multimodal Language Model, 6 Mar 2023, paper link
- LEO An Embodied Generalist Agent in 3D World, 18 Nov 2023, paper link
- RoboPoint RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics, 21 Jun 2024, paper link
- CrossFormer CrossFormer 🦾 Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation, 21 Aug 2024, paper link
- AHA AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation, 1 Oct 2024, paper link
- HOVER HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots, 28 Oct 2024, paper link
- π0 π0: A Vision-Language-Action Flow Model for General Robot Control, 31 Oct 2024, paper link
Related Surveys and repositories
Robotics surveys
- Reinforcement Learning in Robotics: A Survey, 2013, paper link
- A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, 2021, paper link
- How to train your robot with deep reinforcement learning: lessons we have learned, 2021, paper link
Foundation models surveys
- On the Opportunities and Risks of Foundation Models, 2021, paper link
- Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023, paper link
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, 2023, paper link
- Challenges and Applications of Large Language Models, 2023, paper link
- A Survey on Large Language Model based Autonomous Agents, 2023, paper link
Foundation models and robotics
- Awesome-LLM-Robotics repo link
- Foundation Models in Robotics: Applications, Challenges, and the Future, 2023, paper link
- Neural Scaling Laws for Embodied AI, 22 May 2024, paper link
BibTeX
If you find our survey paper helpful, please consider citing it:
@article{hu2023robofm,
author = {Yafei Hu and Quanting Xie and Vidhi Jain and Jonathan Francis and Jay Patrikar and
Nikhil Keetha and Seungchan Kim and Yaqi Xie and Tianyi Zhang and Hao-Shu Fang and Shibo Zhao
and Shayegan Omidshafiei and Dong-Ki Kim and Ali-akbar Agha-mohammadi and Katia Sycara and
Matthew Johnson-Roberson and Dhruv Batra and Xiaolong Wang and Sebastian Scherer and Chen Wang
and Zsolt Kira and Fei Xia and Yonatan Bisk},
title = {Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis},
journal = {arXiv preprint arXiv:2312.08782},
year = {2023},
}
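For convenience, below is a minimal LaTeX usage sketch. The bibliography file name references.bib is only an assumption for illustration; save the BibTeX entry above in whichever bibliography file your project uses.

% cite_example.tex -- illustrative sketch; the bibliography file name is an assumption
\documentclass{article}
\begin{document}
Foundation models for robotics are surveyed in \cite{hu2023robofm}.
\bibliographystyle{plain}
\bibliography{references} % assumes the entry above is stored in references.bib
\end{document}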