# Awesome

<p align="center"> <img width="1000" src="./assets/teaser.png"/> </p>

## Paper List in the survey paper
The papers we survey are listed in this file and grouped into the following categories:
- **Foundation models used in Robotics.** For these papers, the authors apply existing vision and language foundation models, such as LLMs, VLMs, vision foundation models, and text-conditioned image generation models, to robotic modules such as perception, decision making and planning, and action.
- **Robotic Foundation Models.** For these papers, the authors propose new foundation models for specific robotic applications, such as control via imitation learning or reinforcement learning. We also include general-purpose foundation models, such as GATO and PaLM-E, in this category. (A minimal code sketch contrasting the two categories follows this list.)
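To make the distinction concrete, here is a minimal, hypothetical Python sketch. The names `Observation`, `pretrained_llm`, and `robot_fm` are placeholder stubs introduced purely for illustration; they are not APIs from any paper in this list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    rgb: bytes          # camera image
    instruction: str    # natural-language task, e.g. "put the apple in the bowl"


def pretrained_llm(prompt: str) -> List[str]:
    """Stub for any off-the-shelf LLM/VLM; returns a canned plan for this sketch."""
    return ["pick(apple)", "place(bowl)"]


def robot_fm(rgb: bytes, instruction: str) -> List[float]:
    """Stub for a generalist robot policy; returns a dummy 7-DoF action."""
    return [0.0] * 7


# Category 1: existing foundation models applied to robotic modules.
# The pretrained model only handles high-level reasoning (here, task planning);
# low-level skills such as pick/place remain conventional controllers.
def plan_with_pretrained_model(obs: Observation) -> List[str]:
    prompt = f"Break the task '{obs.instruction}' into robot skill calls."
    return pretrained_llm(prompt)


# Category 2: a robotic foundation model trained (e.g., by imitation or
# reinforcement learning on large robot datasets) to map observations and
# language directly to low-level actions.
def act_with_robot_foundation_model(obs: Observation) -> List[float]:
    return robot_fm(obs.rgb, obs.instruction)


if __name__ == "__main__":
    obs = Observation(rgb=b"", instruction="put the apple in the bowl")
    print(plan_with_pretrained_model(obs))        # high-level plan from a pretrained model
    print(act_with_robot_foundation_model(obs))   # low-level action from a robot FM
```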
The taxonomy is shown in the figure below.

<p align="center"> <img width="1000" src="./assets/taxonomy.png"/> </p>

We list all the papers surveyed in our paper below. The dates are based on the first release date on arXiv. This list will be updated regularly.
NOTE: We only include papers with experiments on real physical robots, in high-fidelity robotic simulation environments, or on real robot datasets.
## Foundation models used in Robotics

### Perception
- CLIPORT CLIPORT: What and Where Pathways for Robotic Manipulation, 24 Sep 2021, paper link
- LM-Nav LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, 10 Jul 2022 paper link
- NLMap Open-vocabulary Queryable Scene Representations for Real World Planning, 20 Sep 2022, paper link
- CLIP-Fields CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory, 11 Oct 2022, paper link
- VLMap Visual Language Maps for Robot Navigation, 11 Oct 2022, paper link
- ConceptFusion ConceptFusion: Open-set Multimodal 3D Mapping, 14 Feb 2023, paper link
- WVN Fast Traversability Estimation for Wild Visual Navigation, 15 May 2023, paper link
- HomeRobot HomeRobot: Open-Vocabulary Mobile Manipulation, 20 Jun 2023, paper link
- Act3D Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation, 30 Jun 2023, paper link
- F3RM Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, 27 Jul 2023, paper link
- AnyLoc AnyLoc: Towards Universal Visual Place Recognition, 1 Aug 2023, paper link
- GNFactor GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields, 31 Aug 2023, paper link
- MOSAIC MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception, 15 Sep 2023, paper link
- SpatialVLM SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, 22 Jan 2024, paper link
- OK-Robot OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics, 22 Jan 2024, paper link
- MOKA MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting, 5 Mar 2024, paper link
- DNAct DNAct: Diffusion Guided Multi-Task 3D Policy Learning, 7 Mar 2024, paper link
- GeFF Learning Generalizable Feature Fields for Mobile Manipulation, 12 Mar 2024, paper link
- Octopi Octopi: Object Property Reasoning with Large Tactile-Language Models, 5 Jun 2024, paper link
### Task Planning
<!-- - **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents**, 2022 [paper link](https://arxiv.org/pdf/2201.07207.pdf) -->
- Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers, 25 Mar 2022, paper link
- Socratic Models Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 1 Apr 2022, paper link
- SayCan Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 4 Apr 2022, paper link
- Correcting Robot Plans with Natural Language Feedback, 11 Apr 2022, paper link
- Housekeep Housekeep: Tidying Virtual Households using Commonsense Reasoning, 22 May 2022, paper link
- Inner Monologue Inner Monologue: Embodied Reasoning through Planning with Language Models, 12 Jul 2022, paper link
- Code as Policies Code as Policies: Language Model Programs for Embodied Control, 16 Sep 2022, paper link
- ProgPrompt ProgPrompt: Generating Situated Robot Task Plans using Large Language Models, 22 Sep 2022, paper link
- VIMA VIMA: General Robot Manipulation with Multimodal Prompts, 6 Oct 2022, paper link
- LILAC “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy, 6 Jan 2023, paper link
<!-- - **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents**, 2023, [paper link](https://arxiv.org/pdf/2302.01560.pdf) -->
- SceneDiffuser Diffusion-based Generation, Optimization, and Planning in 3D Scenes, 15 Jan 2023, paper link
- ChatGPT for Robotics ChatGPT for Robotics: Design Principles and Model Abilities, 20 Feb 2023, paper link
- Grounded Decoding Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control, 1 Mar 2023, paper link
- TidyBot TidyBot: Personalized Robot Assistance with Large Language Models, 9 May 2023, paper link
- Instruct2Act Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, 18 May 2023, paper link
- KNOWNO Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners, 4 Jul 2023, paper link
- RoCo RoCo: Dialectic Multi-Robot Collaboration with Large Language Models, 10 Jul 2023, paper link
- SayPlan SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, 12 Jul 2023, paper link
- VLP Video Language Planning, 16 Oct 2023, paper link
- SuSIE SuSIE: Subgoal Synthesis via Image Editing, 2023, paper link
- RoboTool RoboTool: Creative Robot Tool Use with Large Language Models, 23 Oct 2023, project link
- AutoRT AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents, 5 Jan 2024, paper link
- PIVOT PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs, 12 Feb 2024, paper link
- ReKep ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation, 29 Aug 2024, paper link
### Action Generation
- SayTap SayTap: Language to Quadrupedal Locomotion, 13 Jun 2023, paper link
- L2R Language to Rewards for Robotic Skill Synthesis, 14 Jun 2023, paper link
- VoxPoser VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, 12 Jul 2023, paper link
- ReasonedExplorer Reasoning about the Unseen for Efficient Outdoor Object Navigation, 18 Sep 2023, paper link
- Eureka Eureka: Human-Level Reward Design via Coding Large Language Models, 19 Oct 2023, paper link
- Generative Expressive Robot Behaviors using Large Language Models, 26 Jan 2024, paper link
- LMPC Learning to Learn Faster from Human Feedback with Language Model Predictive Control, 18 Feb 2024, paper link
- Manipulate-Anything Manipulate-Anything: Automating Real-World Robots using Vision-Language Models, 29 Aug 2024, paper link
### Data Generation
- CACTI CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning, 12 Dec 2022, paper link
- ROSIE Scaling Robot Learning with Semantically Imagined Experience, 22 Feb 2023 , paper link
- GenSim GenSim: Generating Robotic Simulation Tasks via Large Language Models, 2 Oct 2023, paper link
- URDFormer URDFormer: Constructing Interactive Realistic Scenes from Real Images via Simulation and Generative Modeling, 20 Oct 2023, paper link
- RoboGen RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, 2 Nov 2023, paper link
- RT-Trajectory RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches, 3 Nov 2023, paper link
## Robotic Foundation Models

### Single-Purpose

#### Action Generation
<!-- - **Pre-Trained Language Models for Interactive Decision-Making**, 2022, [paper link](https://arxiv.org/pdf/2202.01771.pdf) -->
- ZeST Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?, 23 Apr 2022, paper link
- Behavior Transformers Behavior Transformers: Cloning k modes with one stone, 22 Jun 2022, paper link
- ATLA Leveraging Language for Accelerated Learning of Tool Manipulation, 27 Jun 2022, paper link
- LATTE LATTE: LAnguage Trajectory TransformEr, 4 Aug 2022, paper link
- Perceiver-Actor Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, 12 Sep 2022, paper link
- MVP Real-World Robot Learning with Masked Visual Pre-training, 6 Oct 2022, paper link
- GNM GNM: A General Navigation Model to Drive Any Robot, 7 Oct 2022, paper link
- Interactive Language Interactive Language: Talking to Robots in Real Time, 12 Oct 2022, paper link
- Conditional Behavior Transformers (C-BeT) From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data, 18 Oct 2022, paper link
- STAP STAP: Sequencing Task-Agnostic Policies, 21 Oct 2022, paper link
- LILA LILA: Language-Informed Latent Actions, 31 Oct 2022, paper link
- DIAL Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models, 21 Nov 2022 , paper link
- RT-1 RT-1: Robotics Transformer for Real-World Control at Scale, Dec 2022, paper link
- MOO Open-World Object Manipulation using Pre-Trained Vision-Language Models, 2 Mar 2023, paper link
- VC-1 Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, 31 Mar 2023, paper link
- CoTPC Chain-of-Thought Predictive Control, 3 Apr 2023, paper link
- ARNOLD ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes, 9 Apr 2023, paper link
- Optimus Imitating Task and Motion Planning with Visuomotor Transformers, 25 May 2023, paper link
- RoboCat RoboCat: A self-improving robotic agent, 20 Jun 2023, paper link
- Scaling Up and Distilling Down Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition, 26 Jul 2023, paper link
- ViNT ViNT: A Foundation Model for Visual Navigation, 26 Jun 2023, paper link
- RT-2 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, 28 Jul 2023, paper link
- RoboAgent RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking, 5 Sep 2023, paper link
- RT-X Open X-Embodiment: Robotic Learning Datasets and RT-X Models, 13 Oct 2023, paper link
- Q-Transformer Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, 18 Sep 2023, paper link
- On Bringing Robots Home, 27 Nov 2023, paper link
- Octo Octo: An Open-Source Generalist Robot Policy, 14 Dec 2023, paper link
- VQ-BeT VQ-BeT: Behavior Generation with Latent Actions, 5 Mar 2024, paper link
- OpenVLA OpenVLA: An Open-Source Vision-Language-Action Model, 13 Jun 2024, paper link
- LLaRA LLaRA: Supercharging Robot Learning Data for Vision-Language Policy, 28 Jun 2024, paper link
- ICRT In-Context Imitation Learning via Next-Token Prediction, 28 Aug 2024, paper link
### General-Purpose
- GATO A Generalist Agent, 12 May 2022, paper link
- PACT PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training, 22 Sep 2022, paper link
- PaLM-E PaLM-E: An Embodied Multimodal Language Model, 6 Mar 2023, paper link
- LEO An Embodied Generalist Agent in 3D World, 18 Nov 2023, paper link
- RoboPoint RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics, 21 Jun 2024, paper link
- CrossFormer CrossFormer 🦾 Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation, 21 Aug 2024, paper link
- AHA AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation, 1 Oct 2024, paper link
## Related Surveys and Repositories

### Robotics surveys
- Reinforcement Learning in Robotics: A Survey, 2013, paper link
- A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, 2021, paper link
- How to train your robot with deep reinforcement learning: lessons we have learned, 2021, paper link
### Foundation model surveys
- On the Opportunities and Risks of Foundation Models, 2021, paper link
- Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023, paper link
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, 2023, paper link
- Challenges and Applications of Large Language Models, 2023, paper link
- A Survey on Large Language Model based Autonomous Agents, 2023, paper link
### Foundation models and robotics
- Awesome-LLM-Robotics repo link
- Foundation Models in Robotics: Applications, Challenges, and the Future, 2023, paper link
- Neural Scaling Laws for Embodied AI, 22 May 2024, paper link
## BibTeX
If you find our survey paper helpful, please consider citing us:
@article{hu2023robofm,
  author  = {Yafei Hu and Quanting Xie and Vidhi Jain and Jonathan Francis and Jay Patrikar and
             Nikhil Keetha and Seungchan Kim and Yaqi Xie and Tianyi Zhang and Hao-Shu Fang and Shibo Zhao
             and Shayegan Omidshafiei and Dong-Ki Kim and Ali-akbar Agha-mohammadi and Katia Sycara and
             Matthew Johnson-Roberson and Dhruv Batra and Xiaolong Wang and Sebastian Scherer and Chen Wang
             and Zsolt Kira and Fei Xia and Yonatan Bisk},
  title   = {Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis},
  journal = {arXiv preprint arXiv:2312.08782},
  year    = {2023},
}