Awesome-LLM-Robotics
This repo contains a curated list of papers using Large Language/Multi-Modal Models for Robotics/RL. Template from awesome-Implicit-NeRF-Robotics. <br>
Please feel free to send me pull requests or email to add papers! <br>
If you find this repository useful, please consider citing and STARing this list. Feel free to share this list with others!
Overview
Surveys
- "A Superalignment Framework in Autonomous Driving with Large Language Models", arXiv, Jun 2024, [Paper]
- "Neural Scaling Laws for Embodied AI", arXiv, May 2024. [Paper]
- "Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis", arXiv, Dec 2023. [Paper] [Paper List] [Website]
- "Language-conditioned Learning for Robotic Manipulation: A Survey", arXiv, Dec 2023, [Paper]
- "Foundation Models in Robotics: Applications, Challenges, and the Future", arXiv, Dec 2023, [Paper] [Paper List]
- "Robot Learning in the Era of Foundation Models: A Survey", arXiv, Nov 2023, [Paper]
- "The Development of LLMs for Embodied Navigation", arXiv, Nov 2023, [Paper]
Reasoning
- AHA: "AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation", arXiv, Oct 2024. [Paper] [Website]
- ReKep: "ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation", arXiv, Sep 2024. [Paper] [Code] [Website]
- Octopi: "Octopi: Object Property Reasoning with Large Tactile-Language Models", Robotics: Science and Systems (RSS), Jun 2024. [Paper] [Code] [Website]
- CLEAR: "Language, Camera, Autonomy! Prompt-engineered Robot Control for Rapidly Evolving Deployment", ACM/IEEE International Conference on Human-Robot Interaction (HRI), Mar 2024. [Paper] [Code]
- MoMa-LLM: "Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation", arXiv, Mar 2024. [Paper] [Code] [Website]
- AutoRT: "Embodied Foundation Models for Large Scale Orchestration of Robotic Agents", arXiv, Jan 2024. [Paper] [Website]
- LEO: "An Embodied Generalist Agent in 3D World", arXiv, Nov 2023. [Paper] [Code] [Website]
- LLM-State: "LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model", arXiv, Nov 2023. [Paper]
- Robogen: "A generative and self-guided robotic agent that endlessly proposes and masters new skills", arXiv, Nov 2023. [Paper] [Code] [Website]
- SayPlan: "Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning", Conference on Robot Learning (CoRL), Nov 2023. [Paper] [Website]
- LLaRP: "Large Language Models as Generalizable Policies for Embodied Tasks", arXiv, Oct 2023. [Paper] [Website]
- RT-X: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models", arXiv, Oct 2023. [Paper] [Website]
- RT-2: "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control", arXiv, July 2023. [Paper] [Website]
- Instruct2Act: "Mapping Multi-modality Instructions to Robotic Actions with Large Language Model", arXiv, May 2023. [Paper] [Pytorch Code]
- TidyBot: "Personalized Robot Assistance with Large Language Models", arXiv, May 2023. [Paper] [Pytorch Code] [Website]
- Generative Agents: "Generative Agents: Interactive Simulacra of Human Behavior", arXiv, Apr 2023. [Paper] [Code]
- Matcha: "Chat with the Environment: Interactive Multimodal Perception using Large Language Models", IROS, Mar 2023. [Paper] [Github] [Website]
- PaLM-E: "PaLM-E: An Embodied Multimodal Language Model", arXiv, Mar 2023, [Paper] [Webpage]
- "Large Language Models as Zero-Shot Human Models for Human-Robot Interaction", arXiv, Mar 2023. [Paper]
- CortexBench: "Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?", arXiv, Mar 2023. [Paper]
- "Translating Natural Language to Planning Goals with Large-Language Models", arXiv, Feb 2023. [Paper]
- RT-1: "RT-1: Robotics Transformer for Real-World Control at Scale", arXiv, Dec 2022. [Paper] [GitHub] [Website]
- "PDDL Planning with Pretrained Large Language Models", NeurIPS, Oct 2022. [Paper] [Github]
- ProgPrompt: "Generating Situated Robot Task Plans using Large Language Models", arXiv, Sept 2022. [Paper] [Github] [Website]
- Code-As-Policies: "Code as Policies: Language Model Programs for Embodied Control", arXiv, Sept 2022. [Paper] [Colab] [Website]
- PIGLeT: "PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World", ACL, Jun 2021. [Paper] [Pytorch Code] [Website]
- Say-Can: "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances", arXiv, Apr 2022. [Paper] [Colab] [Website]
- Socratic: "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", arXiv, Apr 2022. [Paper] [Pytorch Code] [Website]
Planning
- LABOR Agent: "Large Language Models for Orchestrating Bimanual Robots", Humanoids, Nov 2024. [Paper] [Website] [Code]
- SELP: "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", arXiv, Sept 2024. [Paper]
- Wonderful Team: "Solving Robotics Problems in Zero-Shot with Vision-Language Models", arXiv, Jul 2024. [Paper] [Code] [Website]
- "Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models", arXiv, Jul 2024. [Paper]
- FLTRNN: "FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models", ICRA, May 2024. [Paper] [Code] [Website]
- LLM-Personalize: "LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots", arXiv, Apr 2024. [Paper] [Website] [Code]
- LLM3: "LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning", IROS, Mar 2024. [Paper] [Code]
- BTGenBot: "BTGenBot: Behavior Tree Generation for Robotic Tasks with Lightweight LLMs", arXiv, Mar 2024. [Paper] [Github]
- Attentive Support: "To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions", arXiv, Mar 2024. [Paper] [Website] [Code]
- Beyond Text: "Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues", arXiv, Feb 2024. [Paper]
- SayCanPay: "SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge", AAAI, Jan 2024. [Paper] [Code] [Website]
- ViLa: "Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning", arXiv, Sep 2023, [Paper] [Website]
- CoPAL: "Corrective Planning of Robot Actions with Large Language Models", ICRA, Oct 2023. [Paper] [Website] [Code]
- LGMCTS: "LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement", arXiv, Sep 2023. [Paper]
- Prompt2Walk: "Prompt a Robot to Walk with Large Language Models", arXiv, Sep 2023, [Paper] [Website]
- DoReMi: "Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment", arXiv, July 2023, [Paper] [Website]
- Co-LLM-Agents: "Building Cooperative Embodied Agents Modularly with Large Language Models", arXiv, Jul 2023. [Paper] [Code] [Website]
- LLM-Reward: "Language to Rewards for Robotic Skill Synthesis", arXiv, Jun 2023. [Paper] [Website]
- LLM-BRAIn: "LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model", arXiv, May 2023. [Paper]
- GLAM: "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning", arXiv, May 2023. [Paper] [Pytorch Code]
- LLM-MCTS: "Large Language Models as Commonsense Knowledge for Large-Scale Task Planning", arXiv, May 2023. [Paper]
- AlphaBlock: "AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation", arxiv, May 2023. [Paper]
- LLM+P:"LLM+P: Empowering Large Language Models with Optimal Planning Proficiency", arXiv, Apr 2023, [Paper] [Code]
- ChatGPT-Prompts: "ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application", arXiv, Apr 2023, [Paper] [Code/Prompts]
- ReAct: "ReAct: Synergizing Reasoning and Acting in Language Models", ICLR, Apr 2023. [Paper] [Github] [Website]
- LLM-Brain: "LLM as A Robotic Brain: Unifying Egocentric Memory and Control", arXiv, Apr 2023. [Paper]
- "Foundation Models for Decision Making: Problems, Methods, and Opportunities", arXiv, Mar 2023, [Paper]
- LLM-planner: "LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- Text2Motion: "Text2Motion: From Natural Language Instructions to Feasible Plans", arXiv, Mar 2023, [Paper] [Website]
- GD: "Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control", arXiv, Mar 2023. [Paper] [Website]
- PromptCraft: "ChatGPT for Robotics: Design Principles and Model Abilities", Blog, Feb 2023, [Paper] [Website]
- "Reward Design with Language Models", ICML, Feb 2023. [Paper] [Pytorch Code]
- "Planning with Large Language Models via Corrective Re-prompting", arXiv, Nov 2022. [Paper]
- Don't Copy the Teacher: "Don’t Copy the Teacher: Data and Model Challenges in Embodied Dialogue", EMNLP, Oct 2022. [Paper] [Website]
- COWP: "Robot Task Planning and Situation Handling in Open Worlds", arXiv, Oct 2022. [Paper] [Pytorch Code] [Website]
- LM-Nav: "Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action", arXiv, July 2022. [Paper] [Pytorch Code] [Website]
- InnerMonologue: "Inner Monologue: Embodied Reasoning through Planning with Language Models", arXiv, July 2022. [Paper] [Website]
- Housekeep: "Housekeep: Tidying Virtual Households using Commonsense Reasoning", arXiv, May 2022. [Paper] [Pytorch Code] [Website]
- FILM: "FILM: Following Instructions in Language with Modular Methods", ICLR, Apr 2022. [Paper] [Code] [Website]
- MOO: "Open-World Object Manipulation using Pre-Trained Vision-Language Models", arXiv, Mar 2023. [Paper] [Website]
- LID: "Pre-Trained Language Models for Interactive Decision-Making", arXiv, Feb 2022. [Paper] [Pytorch Code] [Website]
- "Collaborating with language models for embodied reasoning", NeurIPS, Feb 2022. [Paper]
- ZSP: "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents", ICML, Jan 2022. [Paper] [Pytorch Code] [Website]
- CALM: "Keep CALM and Explore: Language Models for Action Generation in Text-based Games", arXiv, Oct 2020. [Paper] [Pytorch Code]
- "Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions", arXiV, Oct 2020, [Paper]
Manipulation
- A3VLM: "A3VLM: Actionable Articulation-Aware Vision Language Model", CoRL, Nov 2024. [Paper] [PyTorch Code]
- Manipulate-Anything: "Manipulate-Anything: Automating Real-World Robots using Vision-Language Models", CoRL, Nov 2024. [Paper] [Website]
- RobiButler: "RobiButler: Remote Multimodal Interactions with Household Robot Assistant", arXiv, Sept 2024. [Paper] [Website]
- SKT: "SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation", arXiv, Sept 2024. [Paper] [Website]
- UniAff: "UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models", arXiv, Sept 2024. [Paper] [Website]
- Plan-Seq-Learn:"Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks", ICLR, May 2024. [Paper], [PyTorch Code] [Website]
- ExploRLLM:"ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models", arXiv, Mar 2024. [Paper] [Website]
- ManipVQA:"ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models", IROS, Mar 2024, [Paper] [PyTorch Code]
- BOSS: "Bootstrap Your Own Skills: Learning to Solve New Tasks with LLM Guidance", CoRL, Nov 2023. [Paper] [Website]
- Lafite-RL: "Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models", CoRL Workshop, Nov 2023. [Paper]
- Octopus:"Octopus: Embodied Vision-Language Programmer from Environmental Feedback", arXiv, Oct 2023, [Paper] [PyTorch Code] [Website]
- Text2Reward: "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning", arXiv, Sep 2023, [Paper] [Website]
- PhysObjects: "Physically Grounded Vision-Language Models for Robotic Manipulation", arXiv, Sept 2023. [Paper]
- VoxPoser: "VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models", arXiv, July 2023. [Paper] [Website]
- Scalingup: "Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition", arXiv, July 2023. [Paper] [Code] [Website]
- LIV:"LIV: Language-Image Representations and Rewards for Robotic Control", arXiv, Jun 2023, [Paper] [Pytorch Code] [Website]
- "Language Instructed Reinforcement Learning for Human-AI Coordination", arXiv, Jun 2023. [Paper]
- RoboCat: "RoboCat: A self-improving robotic agent", arXiv, Jun 2023. [Paper] [Website]
- SPRINT: "SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling", arXiv, Jun 2023. [Paper] [Website]
- Grasp Anything: "Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots", arXiv, Jun 2023. [Paper]
- LLM-GROP:"Task and Motion Planning with Large Language Models for Object Rearrangement", arXiv, May 2023. [Paper] [Website]
- VOYAGER:"VOYAGER: An Open-Ended Embodied Agent with Large Language Models", arXiv, May 2023. [Paper] [Pytorch Code] [Website]
- TIP: "Multimodal Procedural Planning via Dual Text-Image Prompting", arXiv, May 2023, [Paper]
- ProgramPort: "Programmatically Grounded, Compositionally Generalizable Robotic Manipulation", ICLR, Apr 2023, [Paper] [Website](https://progport.github.io/)
- VLaMP: "Pretrained Language Models as Visual Planners for Human Assistance", arXiv, Apr 2023, [Paper]
- "Towards a Unified Agent with Foundation Models", ICLR, Apr 2023. [Paper]
- CoTPC:"Chain-of-Thought Predictive Control", arXiv, Apr 2023, [Paper] [Code]
- Plan4MC:"Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- ELLM:"Guiding Pretraining in Reinforcement Learning with Large Language Models", arXiv, Feb 2023. [Paper]
- DEPS:"Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents", arXiv, Feb 2023. [Paper] [Pytorch Code]
- LILAC:"No, to the Right – Online Language Corrections for Robotic Manipulation via Shared Autonomy", arXiv, Jan 2023, [Paper] [Pytorch Code]
- DIAL:"Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models", arXiv, Nov 2022, [Paper] [Website]
- Gato: "A Generalist Agent", TMLR, Nov 2022. [Paper] [Website]
- NLMap:"Open-vocabulary Queryable Scene Representations for Real World Planning", arXiv, Sep 2022, [Paper] [Website]
- R3M:"R3M: A Universal Visual Representation for Robot Manipulation", arXiv, Mar 2022, [Paper] [Pytorch Code] [Website]
- CLIP-Fields:"CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory", arXiv, Oct 2022, [Paper] [PyTorch Code] [Website]
- VIMA:"VIMA: General Robot Manipulation with Multimodal Prompts", arXiv, Oct 2022, [Paper] [Pytorch Code] [Website]
- Perceiver-Actor:"A Multi-Task Transformer for Robotic Manipulation", CoRL, Sep 2022. [Paper] [Pytorch Code] [Website]
- LaTTe: "LaTTe: Language Trajectory TransformEr", arXiv, Aug 2022. [Paper] [TensorFlow Code] [Website]
- Robots Enact Malignant Stereotypes: "Robots Enact Malignant Stereotypes", FAccT, Jun 2022. [Paper] [Pytorch Code] [Website] [Washington Post] [Wired] (code access on request)
- ATLA: "Leveraging Language for Accelerated Learning of Tool Manipulation", CoRL, Jun 2022. [Paper]
- ZeST: "Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?", L4DC, Apr 2022. [Paper]
- LSE-NGU: "Semantic Exploration from Language Abstractions and Pretrained Representations", arXiv, Apr 2022. [Paper]
- MetaMorph: "MetaMorph: Learning Universal Controllers with Transformers", arXiv, Mar 2022. [Paper]
- Embodied-CLIP: "Simple but Effective: CLIP Embeddings for Embodied AI", CVPR, Nov 2021. [Paper] [Pytorch Code]
- CLIPort: "CLIPort: What and Where Pathways for Robotic Manipulation", CoRL, Sept 2021. [Paper] [Pytorch Code] [Website]
Instructions and Navigation
- GSON: "GSON: A Group-based Social Navigation Framework with Large Multimodal Model", arXiv, Sept 2024. [Paper]
- Navid: "NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation", arXiv, Mar 2024. [Paper] [Website]
- OVSG: "Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs", CoRL, Nov 2023. [Paper] [Code] [Website]
- VLMaps: "Visual Language Maps for Robot Navigation", arXiv, Mar 2023. [Paper] [Pytorch Code] [Website]
- "Interactive Language: Talking to Robots in Real Time", arXiv, Oct 2022 [Paper] [Website]
- NLMap:"Open-vocabulary Queryable Scene Representations for Real World Planning", arXiv, Sep 2022, [Paper] [Website]
- ADAPT: "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts", CVPR, May 2022. [Paper]
- "The Unsurprising Effectiveness of Pre-Trained Vision Models for Control", ICML, Mar 2022. [Paper] [Pytorch Code] [Website]
- CoW: "CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration", arXiv, Mar 2022. [Paper]
- Recurrent VLN-BERT: "A Recurrent Vision-and-Language BERT for Navigation", CVPR, Jun 2021 [Paper] [Pytorch Code]
- VLN-BERT: "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web", ECCV, Apr 2020 [Paper] [Pytorch Code]
Simulation Frameworks
- ManiSkill3: "ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI", arXiv, Oct 2024. [Paper] [Code] [Website]
- GENESIS: "A generative world for general-purpose robotics & embodied AI learning.", arXiv, Nov 2023. [Code]
- ARNOLD: "ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes", ICCV, Apr 2023. [Paper] [Code] [Website]
- OmniGibson: "OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine", CoRL, 2022. [Paper] [Code]
- MineDojo: "MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge", arXiv, Jun 2022. [Paper] [Code] [Website] [Open Database]
- Habitat 2.0: "Habitat 2.0: Training Home Assistants to Rearrange their Habitat", NeurIPS, Dec 2021. [Paper] [Code] [Website]
- BEHAVIOR: "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments", CoRL, Nov 2021. [Paper] [Code] [Website]
- iGibson 1.0: "iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes", IROS, Sep 2021. [Paper] [Code] [Website]
- ALFRED: "ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks", CVPR, Jun 2020. [Paper] [Code] [Website]
- BabyAI: "BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning", ICLR, May 2019. [Paper](https://arxiv.org/abs/1810.08272) [Code]
Safety, Risks, Red Teaming, and Adversarial Testing
- "LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions", arXiv, Jun 2024. [Paper]
- "Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics", arXiv, Feb 2024. [Paper]
- "Robots Enact Malignant Stereotypes", FAccT, Jun 2022. [arXiv] [DOI] [Code] [Website]
Citation
If you find this repository useful, please consider citing this list:
@misc{kira2022llmroboticspaperslist,
  title = {Awesome-LLM-Robotics},
  author = {Zsolt Kira},
  journal = {GitHub repository},
  url = {https://github.com/GT-RIPL/Awesome-LLM-Robotics},
  year = {2022},
}