Awesome-LLM-Robotics
This repo contains a curated list of papers using Large Language/Multi-Modal Models for Robotics/RL. Template from awesome-Implicit-NeRF-Robotics. <br>
Please feel free to send me pull requests or email to add papers! <br>
If you find this repository useful, please consider citing and STARing this list. Feel free to share this list with others!
Overview
Surveys
- "A Superalignment Framework in Autonomous Driving with Large Language Models", arXiv, Jun 2024, [Paper]
- "Neural Scaling Laws for Embodied AI", arXiv, May 2024. [Paper]
- "Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis", arXiv, Dec 2023. [Paper] [Paper List] [Website]
- "Language-conditioned Learning for Robotic Manipulation: A Survey", arXiv, Dec 2023, [Paper]
- "Foundation Models in Robotics: Applications, Challenges, and the Future", arXiv, Dec 2023, [Paper] [Paper List]
- "Robot Learning in the Era of Foundation Models: A Survey", arXiv, Nov 2023, [Paper]
- "The Development of LLMs for Embodied Navigation", arXiv, Nov 2023, [Paper]
Reasoning
- AHA: "AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation", arXiv, Oct 2024. [Paper] [Website]
- ReKep: "ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation", arXiv, Sep 2024. [Paper] [Code] [Website]
- Octopi: "Octopi: Object Property Reasoning with Large Tactile-Language Models", Robotics: Science and Systems (RSS), Jun 2024. [Paper] [Code] [Website]
- CLEAR: "Language, Camera, Autonomy! Prompt-engineered Robot Control for Rapidly Evolving Deployment", ACM/IEEE International Conference on Human-Robot Interaction (HRI), Mar 2024. [Paper] [Code]
- MoMa-LLM: "Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation", arXiv, Mar 2024. [Paper] [Code] [Website]
- AutoRT: "Embodied Foundation Models for Large Scale Orchestration of Robotic Agents", arXiv, Jan 2024. [Paper] [Website]
- LEO: "An Embodied Generalist Agent in 3D World", arXiv, Nov 2023. [Paper] [Code] [Website]
- LLM-State: "LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model", arXiv, Nov 2023. [Paper]
- Robogen: "A generative and self-guided robotic agent that endlessly proposes and masters new skills", arXiv, Nov 2023. [Paper] [Code] [Website]
- SayPlan: "Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning", Conference on Robot Learning (CoRL), Nov 2023. [Paper] [Website]
- LLaRP: "Large Language Models as Generalizable Policies for Embodied Tasks", arXiv, Oct 2023. [Paper] [Website]
- RT-X: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models", arXiv, Oct 2023. [Paper] [Website]
- RT-2: "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control", arXiv, July 2023. [Paper] [Website]
- Instruct2Act: "Mapping Multi-modality Instructions to Robotic Actions with Large Language Model", arXiv, May 2023. [Paper] [Pytorch Code]
- TidyBot: "Personalized Robot Assistance with Large Language Models", arXiv, May 2023. [Paper] [Pytorch Code] [Website]
- Generative Agents: "Generative Agents: Interactive Simulacra of Human Behavior", arXiv, Apr 2023. [Paper] [Code]
- Matcha: "Chat with the Environment: Interactive Multimodal Perception using Large Language Models", IROS, Mar 2023. [Paper] [Github] [Website]
- PaLM-E: "PaLM-E: An Embodied Multimodal Language Model", arXiv, Mar 2023, [Paper] [Webpage]
- "Large Language Models as Zero-Shot Human Models for Human-Robot Interaction", arXiv, Mar 2023. [Paper]
- CortexBench: "Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?", arXiv, Mar 2023. [Paper]
- "Translating Natural Language to Planning Goals with Large-Language Models", arXiv, Feb 2023. [Paper]
- RT-1: "RT-1: Robotics Transformer for Real-World Control at Scale", arXiv, Dec 2022. [Paper] [GitHub] [Website]
- "PDDL Planning with Pretrained Large Language Models", NeurIPS, Oct 2022. [Paper] [Github]
- ProgPrompt: "Generating Situated Robot Task Plans using Large Language Models", arXiv, Sept 2022. [Paper] [Github] [Website]
- Code-As-Policies: "Code as Policies: Language Model Programs for Embodied Control", arXiv, Sept 2022. [Paper] [Colab] [Website]
- PIGLeT: "PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World", ACL, Jun 2021. [Paper] [Pytorch Code] [Website]
- Say-Can: "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances", arXiv, Apr 2022. [Paper] [Colab] [Website]
- Socratic: "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", arXiv, Apr 2022. [Paper] [Pytorch Code] [Website]
Planning
- LABOR Agent: "Large Language Models for Orchestrating Bimanual Robots", Humanoids, Nov 2024. [Paper] [Website] [Code]
- SELP: "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", arXiv, Sept 2024. [Paper]
- Wonderful Team: "Solving Robotics Problems in Zero-Shot with Vision-Language Models", arXiv, Jul 2024. [Paper] [Code] [Website]
- "Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models", arXiv, Jul 2024. [Paper]
- FLTRNN: "FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models", ICRA, May 2024. [Paper] [Code] [Website]
- LLM-Personalize: "LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots", arXiv, Apr 2024. [Paper] [Website] [Code]
- LLM3: "LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning", IROS, Mar 2024. [Paper] [Code]
- BTGenBot: "BTGenBot: Behavior Tree Generation for Robotic Tasks with Lightweight LLMs", arXiv, Mar 2024. [Paper] [Github]
- Attentive Support: "To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions", arXiv, Mar 2024. [Paper] [Website] [Code]
- Beyond Text: "Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues", arXiv, Feb 2024. [Paper]
- SayCanPay: "SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge", AAAI, Jan 2024. [Paper] [Code] [Website]
- ViLa: "Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning", arXiv, Sep 2023, [Paper] [Website]
- CoPAL: "Corrective Planning of Robot Actions with Large Language Models", ICRA, Oct 2023. [Paper] [Website] [Code]
- LGMCTS: "LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement", arXiv, Sep 2023. [Paper]
- Prompt2Walk: "Prompt a Robot to Walk with Large Language Models", arXiv, Sep 2023, [Paper] [Website]
- DoReMi: "Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment", arXiv, July 2023, [Paper] [Website]
- Co-LLM-Agents: "Building Cooperative Embodied Agents Modularly with Large Language Models", arXiv, Jul 2023. [Paper] [Code] [Website]
- LLM-Reward: "Language to Rewards for Robotic Skill Synthesis", arXiv, Jun 2023. [Paper] [Website]
- LLM-BRAIn: "LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model", arXiv, May 2023. [Paper]
- GLAM: "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning", arXiv, May 2023. [Paper] [Pytorch Code]
- LLM-MCTS: "Large Language Models as Commonsense Knowledge for Large-Scale Task Planning", arXiv, May 2023. [Paper]
- AlphaBlock: "AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation", arxiv, May 2023. [Paper]
- LLM+P:"LLM+P: Empowering Large Language Models with Optimal Planning Proficiency", arXiv, Apr 2023, [Paper] [Code]
- ChatGPT-Prompts: "ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application", arXiv, Apr 2023, [Paper] [Code/Prompts]
- ReAct: "ReAct: Synergizing Reasoning and Acting in Language Models", ICLR, Apr 2023. [Paper] [Github] [Website]
- LLM-Brain: "LLM as A Robotic Brain: Unifying Egocentric Memory and Control", arXiv, Apr 2023. [Paper]
- "Foundation Models for Decision Making: Problems, Methods, and Opportunities", arXiv, Mar 2023, [Paper]
- LLM-planner: "LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- Text2Motion: "Text2Motion: From Natural Language Instructions to Feasible Plans", arXiv, Mar 2023, [Paper] [Website]
- GD: "Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control", arXiv, Mar 2023. [Paper] [Website]
- PromptCraft: "ChatGPT for Robotics: Design Principles and Model Abilities", Blog, Feb 2023, [Paper] [Website]
- "Reward Design with Language Models", ICML, Feb 2023. [Paper] [Pytorch Code]
- "Planning with Large Language Models via Corrective Re-prompting", arXiv, Nov 2022. [Paper]
- Don't Copy the Teacher: "Don’t Copy the Teacher: Data and Model Challenges in Embodied Dialogue", EMNLP, Oct 2022. [Paper] [Website]
- COWP: "Robot Task Planning and Situation Handling in Open Worlds", arXiv, Oct 2022. [Paper] [Pytorch Code] [Website]
- LM-Nav: "Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action", arXiv, July 2022. [Paper] [Pytorch Code] [Website]
- InnerMonologue: "Inner Monologue: Embodied Reasoning through Planning with Language Models", arXiv, July 2022. [Paper] [Website]
- Housekeep: "Housekeep: Tidying Virtual Households using Commonsense Reasoning", arXiv, May 2022. [Paper] [Pytorch Code] [Website]
- FILM: "FILM: Following Instructions in Language with Modular Methods", ICLR, Apr 2022. [Paper] [Code] [Website]
- MOO: "Open-World Object Manipulation using Pre-Trained Vision-Language Models", arXiv, Mar 2023. [Paper] [Website]
- LID: "Pre-Trained Language Models for Interactive Decision-Making", arXiv, Feb 2022. [Paper] [Pytorch Code] [Website]
- "Collaborating with language models for embodied reasoning", NeurIPS, Feb 2022. [Paper]
- ZSP: "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents", ICML, Jan 2022. [Paper] [Pytorch Code] [Website]
- CALM: "Keep CALM and Explore: Language Models for Action Generation in Text-based Games", arXiv, Oct 2020. [Paper] [Pytorch Code]
- "Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions", arXiV, Oct 2020, [Paper]
Manipulation
- A3VLM: "A3VLM: Actionable Articulation-Aware Vision Language Model", CoRL, Nov 2024. [Paper] [PyTorch Code]
- Manipulate-Anything: "Manipulate-Anything: Automating Real-World Robots using Vision-Language Models", CoRL, Nov 2024. [Paper] [Website]
- RobiButler: "RobiButler: Remote Multimodal Interactions with Household Robot Assistant", arXiv, Sept 2024. [Paper] [Website]
- SKT: "SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation", arXiv, Sept 2024. [Paper] [Website]
- UniAff: "UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models", arXiv, Sept 2024. [Paper] [Website]
- Plan-Seq-Learn:"Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks", ICLR, May 2024. [Paper], [PyTorch Code] [Website]
- ExploRLLM:"ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models", arXiv, Mar 2024. [Paper] [Website]
- ManipVQA:"ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models", IROS, Mar 2024, [Paper] [PyTorch Code]
- BOSS: "Bootstrap Your Own Skills: Learning to Solve New Tasks with LLM Guidance", CoRL, Nov 2023. [Paper] [Website]
- Lafite-RL: "Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models", CoRL Workshop, Nov 2023. [Paper]
- Octopus:"Octopus: Embodied Vision-Language Programmer from Environmental Feedback", arXiv, Oct 2023, [Paper] [PyTorch Code] [Website]
- Text2Reward: "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning", arXiv, Sep 2023, [Paper] [Website]
- PhysObjects: "Physically Grounded Vision-Language Models for Robotic Manipulation", arXiv, Sept 2023. [Paper]
- VoxPoser: "VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models", arXiv, July 2023. [Paper] [Website]
- Scalingup: "Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition", arXiv, July 2023. [Paper] [Code] [Website]
- LIV:"LIV: Language-Image Representations and Rewards for Robotic Control", arXiv, Jun 2023, [Paper] [Pytorch Code] [Website]
- "Language Instructed Reinforcement Learning for Human-AI Coordination", arXiv, Jun 2023. [Paper]
- RoboCat: "RoboCat: A self-improving robotic agent", arXiv, Jun 2023. [Paper] [Website]
- SPRINT: "SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling", arXiv, Jun 2023. [Paper] [Website]
- Grasp Anything: "Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots", arXiv, Jun 2023. [Paper]
- LLM-GROP:"Task and Motion Planning with Large Language Models for Object Rearrangement", arXiv, May 2023. [Paper] [Website]
- VOYAGER:"VOYAGER: An Open-Ended Embodied Agent with Large Language Models", arXiv, May 2023. [Paper] [Pytorch Code] [Website]
- TIP: "Multimodal Procedural Planning via Dual Text-Image Prompting", arXiv, May 2023, [Paper]
- ProgramPort: "Programmatically Grounded, Compositionally Generalizable Robotic Manipulation", ICLR, Apr 2023, [Paper] [Website](https://progport.github.io/)
- VLaMP: "Pretrained Language Models as Visual Planners for Human Assistance", arXiv, Apr 2023, [Paper]
- "Towards a Unified Agent with Foundation Models", ICLR, Apr 2023. [Paper]
- CoTPC:"Chain-of-Thought Predictive Control", arXiv, Apr 2023, [Paper] [Code]
- Plan4MC:"Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- ELLM:"Guiding Pretraining in Reinforcement Learning with Large Language Models", arXiv, Feb 2023. [Paper]
- DEPS:"Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents", arXiv, Feb 2023. [Paper] [Pytorch Code]
- LILAC:"No, to the Right – Online Language Corrections for Robotic Manipulation via Shared Autonomy", arXiv, Jan 2023, [Paper] [Pytorch Code]
- DIAL:"Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models", arXiv, Nov 2022, [Paper] [Website]
- Gato: "A Generalist Agent", TMLR, Nov 2022. [Paper] [Website]
- NLMap:"Open-vocabulary Queryable Scene Representations for Real World Planning", arXiv, Sep 2022, [Paper] [Website]
- R3M:"R3M: A Universal Visual Representation for Robot Manipulation", arXiv, Mar 2022, [Paper] [Pytorch Code] [Website]
- CLIP-Fields:"CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory", arXiv, Oct 2022, [Paper] [PyTorch Code] [Website]
- VIMA:"VIMA: General Robot Manipulation with Multimodal Prompts", arXiv, Oct 2022, [Paper] [Pytorch Code] [Website]
- Perceiver-Actor:"A Multi-Task Transformer for Robotic Manipulation", CoRL, Sep 2022. [Paper] [Pytorch Code] [Website]
- LaTTe: "LaTTe: Language Trajectory TransformEr", arXiv, Aug 2022. [Paper] [TensorFlow Code] [Website]
- Robots Enact Malignant Stereotypes: "Robots Enact Malignant Stereotypes", FAccT, Jun 2022. [Paper] [Pytorch Code] [Website] [Washington Post] [Wired] (code access on request)
- ATLA: "Leveraging Language for Accelerated Learning of Tool Manipulation", CoRL, Jun 2022. [Paper]
- ZeST: "Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?", L4DC, Apr 2022. [Paper]
- LSE-NGU: "Semantic Exploration from Language Abstractions and Pretrained Representations", arXiv, Apr 2022. [Paper]
- MetaMorph: "MetaMorph: Learning Universal Controllers with Transformers", arXiv, Mar 2022. [Paper]
- Embodied-CLIP: "Simple but Effective: CLIP Embeddings for Embodied AI", CVPR, Nov 2021. [Paper] [Pytorch Code]
- CLIPort: "CLIPort: What and Where Pathways for Robotic Manipulation", CoRL, Sept 2021. [Paper] [Pytorch Code] [Website]
Instructions and Navigation
- GSON: "GSON: A Group-based Social Navigation Framework with Large Multimodal Model", arXiv, Sept 2024. [Paper]
- Navid: "NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation", arXiv, Mar 2024. [Paper] [Website]
- OVSG: "Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs", CoRL, Nov 2023. [Paper] [Code] [Website]
- VLMaps: "Visual Language Maps for Robot Navigation", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- "Interactive Language: Talking to Robots in Real Time", arXiv, Oct 2022 [Paper] [Website]
- NLMap:"Open-vocabulary Queryable Scene Representations for Real World Planning", arXiv, Sep 2022, [Paper] [Website]
- ADAPT: "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts", CVPR, May 2022. [Paper]
- "The Unsurprising Effectiveness of Pre-Trained Vision Models for Control", ICML, Mar 2022. [Paper] [Pytorch Code] [Website]
- CoW: "CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration", arXiv, Mar 2022. [Paper]
- Recurrent VLN-BERT: "A Recurrent Vision-and-Language BERT for Navigation", CVPR, Jun 2021 [Paper] [Pytorch Code]
- VLN-BERT: "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web", ECCV, Apr 2020 [Paper] [Pytorch Code]
Simulation Frameworks
- ManiSkill3: "ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI", arXiv, Oct 2024. [Paper] [Code] [Website]
- GENESIS: "A generative world for general-purpose robotics & embodied AI learning.", arXiv, Nov 2023. [Code]
- ARNOLD: "ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes", ICCV, Apr 2023. [Paper] [Code] [Website]
- OmniGibson: "OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine", CoRL, 2022. [Paper] [Code]
- MineDojo: "MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge", arXiv, Jun 2022. [Paper] [Code] [Website] [Open Database]
- Habitat 2.0: "Habitat 2.0: Training Home Assistants to Rearrange their Habitat", NeurIPS, Dec 2021. [Paper] [Code] [Website]
- BEHAVIOR: "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments", CoRL, Nov 2021. [Paper] [Code] [Website]
- iGibson 1.0: "iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes", IROS, Sep 2021. [Paper] [Code] [Website]
- ALFRED: "ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks", CVPR, Jun 2020. [Paper] [Code] [Website]
- BabyAI: "BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning", ICLR, May 2019. [Paper](https://arxiv.org/abs/1810.08272) [Code]
Safety, Risks, Red Teaming, and Adversarial Testing
- "LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions", arXiv, Jun 2024. [Paper]
- "Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics", arXiv, Feb 2024. [Paper]
- "Robots Enact Malignant Stereotypes", FAccT, Jun 2022. [arXiv] [DOI] [Code] [Website]
Citation
If you find this repository useful, please consider citing this list:
@misc{kira2022llmroboticspaperslist,
  title = {Awesome-LLM-Robotics},
  author = {Zsolt Kira},
  journal = {GitHub repository},
  url = {https://github.com/GT-RIPL/Awesome-LLM-Robotics},
  year = {2022},
}