Awesome
Awesome-Papers-Autonomous-Agent
This is a collection of recent papers focusing on autonomous agent. Here is how Wikipedia defines Agent:
In artificial intelligence, an intelligent agent is an agent acting in an intelligent manner; It perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge. An intelligent agent may be simple or complex: A thermostator other control systemis considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.
Thus, the key of an agent is that it can <i>achieve goals</i>, <i>acquire knowledge</i> and <i>continually improve</i>. The traditional agents in RL research will not be considered in this collection. Though LLM-based agents have caught people's eyes in recent research, RL-based agents also take their special position. Specifically, this repo is interested in two types of agent: RL-based agent and LLM-based agent.
Note that this paper list is under active maintaince. Free free to open an issue if you found any missed papers that fit the topic.
Update history
- 2024/01/31: Add a special list for surveys on autonomous agent.
- 2023/12/08: Add papers accepted by ICML'23 and ICLR'23 :rocket:
- 2023/11/08: Add papers accepted by NeurIPS'23. Add related links (project page or github) to these accepted papers :tada:
- 2023/10/25: Classify all papers based on their research topics. Check ToC for the standard of classification :clap:
- 2023/10/18: Release first version of collection, including papers submitted to ICLR 2024 :rocket:
Table of Contents
<!-- END doctoc generated TOC please keep comment here to allow auto update -->Surveys
- A Survey on Large Language Model based Autonomous Agents
- The Rise and Potential of Large Language Model Based Agents: A Survey
RL-based agent
Instruction following
- [NeurIPS'23] Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation
- [NeurIPS'23] Guide Your Agent with Adaptive Multimodal Rewards [project]
- Compositional Instruction Following with Language Models and Reinforcement Learning
- RT-1: Robotics Transformer for Real-World Control at Scale [blog]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [blog]
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models [blog]
- [NeurIPS'23] Guide Your Agent with Adaptive Multimodal Rewards [project]
- LEO: An Embodied Generalist Agent in 3D World [project]
Build agent based on World model
- [ICLR'23 Oral] Transformers are Sample-Efficient World Models [code]
- Learning to Model the World with Language
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning
Language as knowledge
- Learning with Language Inference and Tips for Continual Reinforcement Learning
- Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes
- Language Reward Modulation for Pretraining Reinforcement Learning
LLM as a tool
- [NeurIPS'23] Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
- [ICLR'23] Reward Design with Language Models [code]
- [ICML'23] RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents [Poster]
- [ICML'23] Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [Project][Code]
- [ICML'23] Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning
- Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning
- Language to Rewards for Robotic Skill Synthesis
- Eureka: Human-Level Reward Design via Coding Large Language Models
- STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Generalization across tasks
Continual learning
- ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning
- Online Continual Learning for Interactive Instruction Following Agents
- [NeurIPS'23] A Definition of Continual Reinforcement Learning
Combine RL and LLM
- [NeurIPS'23] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
- RoboGPT : An intelligent agent of making embodied long-term decisions for daily instruction tasks
- Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym
- RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds
Transformer-based policy
- [NeurIPS'23] Cross-Episodic Curriculum for Transformer Agents. [project]
Trajectory to language
- [NeurIPS'23] State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
- [NeurIPS'23] Semantic HELM: A Human-Readable Memory for Reinforcement Learning
- [ICML'23] Distilling Internet-Scale Vision-Language Models into Embodied Agents
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation
Trajectory predication
Others
- Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain
- A Competition Winning Deep Reinforcement Learning Agent in microRTS
- Aligning Agents like Large Language Models
LLM-based agent
Multimodal
- [ICML'23] PaLM-E: An Embodied Multimodal Language Model
- Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models
- You Only Look at Screens: Multimodal Chain-of-Action Agents
- Learning Embodied Vision-Language Programming From Instruction, Exploration, and Environmental Feedback
- An Embodied Generalist Agent in 3D World
- JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
Train LLM for generalization & adaptation
- FireAct: Toward Language Agent Finetuning
- Adapting LLM Agents Through Communication
- AgentTuning: Enabling Generalized Agent Abilities for LLMs
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Task-specific designing
- [NeurIPS'23] Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
- [NeurIPS'23] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [Github]
- Rethinking the Buyer’s Inspection Paradox in Information Markets with Language Agents
- A Language-Agent Approach to Formal Theorem-Proving
- Agent Instructs Large Language Models to be General Zero-Shot Reasoners
- Ghost in the Minecraft: Hierarchical Agents for Minecraft via Large Language Models with Text-based Knowledge and Memory
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research
- Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale
- Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Multi-agent (e.g., society, coperation)
- CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
- Building Cooperative Embodied Agents Modularly with Large Language Models
- OKR-Agent: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- AutoAgents: A Framework for Automatic Agent Generation
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View
- REX: Rapid Exploration and eXploitation for AI agents
- Emergence of Social Norms in Large Language Model-based Agent Societies
Experimental analysis
- Identifying the Risks of LM Agents with an LM-Emulated Sandbox
- Evaluating Multi-Agent Coordination Abilities in Large Language Models
- Large Language Models as Gaming Agents
- Benchmarking Large Language Models as AI Research Agents
- Adaptive Environmental Modeling for Task-Oriented Language Agents
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Benchmark & Dataset
- [ACL'24] A Controllable World of Apps and People for Benchmarking Interactive Coding Agents [website][blog]
- [ICLR'23] Task Ambiguity in Humans and Language Models [code]
- SmartPlay : A Benchmark for LLMs as Intelligent Agents
- AgentBench: Evaluating LLMs as Agents
- Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game
- Evaluating Large Language Models at Evaluating Instruction Following
- CivRealm: A Learning and Reasoning Odyssey for Decision-Making Agents
Applications
- Lyfe Agents: generative agents for low-cost real-time social interactions
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Algorithm design
- [ICLR'23 Oral] ReAct: Synergizing Reasoning and Acting in Language Models [code]
- [NeurIPS'23] AdaPlanner: Adaptive Planning from Feedback with Language Models [github]
- Prospector: Improving LLM Agents with Self-Asking and Trajectory Ranking
- Formally Specifying the High-Level Behavior of LLM-Based Agents
- Cumulative Reasoning With Large Language Models
Combined with RL
- [NeurIPS'23] Reflexion: language agents with verbal reinforcement learning [code]
- Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game