A Survey on LLM-based Autonomous Agents

Growth Trend

Autonomous agents are designed to achieve specific objectives through self-guided instructions. With the emergence and growth of large language models (LLMs), there is a growing trend in utilizing LLMs as fundamental controllers for these autonomous agents. While previous studies in this field have achieved remarkable successes, they remain independent proposals with little effort devoted to a systematic analysis. To bridge this gap, we conduct a comprehensive survey study, focusing on the construction, application, and evaluation of LLM-based autonomous agents. In particular, we first explore the essential components of an AI agent, including a profile module, a memory module, a planning module, and an action module. We further investigate the application of LLM-based autonomous agents in the domains of natural sciences, social sciences, and engineering. Subsequently, we delve into a discussion of the evaluation strategies employed in this field, encompassing both subjective and objective methods. Our survey aims to serve as a resource for researchers and practitioners, providing insights, related references, and continuous updates on this exciting and rapidly evolving field.

This is the first released and published survey paper in the field of LLM-based autonomous agents.

Paper link: A Survey on Large Language Model based Autonomous Agents

Update Records

🤖 Construction of LLM-based Autonomous Agent

Architecture Design

<table> <tr> <td rowspan='2' align='center'>Model</td> <td rowspan='2' align='center'>Profile</td> <td colspan='2' align='center'>Memory</td> <td rowspan='2' align='center'>Planning</td> <td rowspan='2' align='center'>Action</td> <td rowspan='2' align='center'>Capability Acquisition (CA)</td> <td rowspan='2' align='center'>Paper</td> <td rowspan='2' align='center'>Code</td> </tr> <tr> <td align='center'>Operation</td> <td align='center'>Structure</td> </tr> <tr> <td align='center'>WebGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2112.09332">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>SayCan</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/o feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2204.01691">Paper</a></td> <td align='center'><a href="https://say-can.github.io/">Code</a></td> </tr> <tr> <td align='center'>MRKL</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/o feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2205.00445">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Inner Monologue</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2207.05608">Paper</a></td> <td align='center'><a href="https://innermonologue.github.io/">Code</a></td> </tr> <tr> <td align='center'>Social Simulacra</td> <td align='center'>GPT-Generated</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td>
<td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2208.04024">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ReAct</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2210.03629">Paper</a></td> <td align='center'><a href="https://github.com/ysymyth/ReAct">Code</a></td> </tr> <tr> <td align='center'>LLM Planner</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>Environment feedback</td> <td align='center'><a href="https://arxiv.org/abs/2212.04088">Paper</a></td> <td align='center'><a href="https://dki-lab.github.io/LLM-Planner">Code</a></td> </tr> <tr> <td align='center'>MALLM</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2301.04589">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>aiflows</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.01285">Paper</a></td> <td align='center'><a href="https://github.com/epfl-dlab/aiflows">Code</a></td> </tr> <tr> <td align='center'>DEPS</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2302.01560">Paper</a></td> <td align='center'><a 
href="https://github.com/CraftJarvis/MC-Planner">Code</a></td> </tr> <tr> <td align='center'>Toolformer</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/o feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2302.04761">Paper</a></td> <td align='center'><a href="https://github.com/lucidrains/toolformer-pytorch">Code</a></td> </tr> <tr> <td align='center'>Reflexion</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2303.11366">Paper</a></td> <td align='center'><a href="https://github.com/noahshinn024/reflexion">Code</a></td> </tr> <tr> <td align='center'>CAMEL</td> <td align='center'>Handcrafting & GPT-Generated</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2303.17760">Paper</a></td> <td align='center'><a href="https://github.com/camel-ai/camel">Code</a></td> </tr> <tr> <td align='center'>API-Bank</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2304.08244">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Chameleon</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/o feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.09842">Paper</a></td> <td align='center'><a
href="https://chameleon-llm.github.io/">Code</a></td> </tr> <tr> <td align='center'>ViperGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2303.08128">Paper</a></td> <td align='center'><a href="https://github.com/cvlab-columbia/viper">Code</a></td> </tr> <tr> <td align='center'>HuggingGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Unified</td> <td align='center'>w/o feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2303.17580">Paper</a></td> <td align='center'><a href="https://huggingface.co/">Code</a></td> </tr> <tr> <td align='center'>Generative Agents</td> <td align='center'>Handcrafting</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.03442">Paper</a></td> <td align='center'><a href="https://github.com/joonspk-research/generative_agents">Code</a></td> </tr> <tr> <td align='center'>LLM+P</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/o feedback</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.11477">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ChemCrow</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.05376">Paper</a></td> <td align='center'><a href="https://github.com/ur-whitelab/chemcrow-public">Code</a></td> </tr> <tr> <td align='center'>OpenAGI</td> <td 
align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2304.04370">Paper</a></td> <td align='center'><a href="https://github.com/agiresearch/OpenAGI/blob/main/README.md">Code</a></td> </tr> <tr> <td align='center'>AutoGPT</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'>-</td> <td align='center'><a href="https://github.com/Significant-Gravitas/Auto-GPT">Code</a></td> </tr> <tr> <td align='center'>SCM</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.13343">Paper</a></td> <td align='center'><a href="https://github.com/wbbeyourself/scm4llms">Code</a></td> </tr> <tr> <td align='center'>Socially Alignment</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>Example</td> <td align='center'><a href="https://arxiv.org/abs/2305.16960">Paper</a></td> <td align='center'><a href="https://github.com/agi-templar/Stable-Alignment">Code</a></td> </tr> <tr> <td align='center'>GITM</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2305.17144">Paper</a></td> <td align='center'><a href="https://github.com/OpenGVLab/GITM">Code</a></td> </tr> <tr> <td align='center'>Voyager</td> <td align='center'>-</td> <td 
align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2305.16291">Paper</a></td> <td align='center'><a href="https://github.com/MineDojo/Voyager">Code</a></td> </tr> <tr> <td align='center'>Introspective Tips</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2305.11598">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>RET-LLM</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2305.14322">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ChatDB</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2306.03901">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>S3</td> <td align='center'>Dataset alignment</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2307.14984">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ChatDev</td> <td align='center'>Handcrafting</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/o fine-tuning</td> <td 
align='center'><a href="https://arxiv.org/abs/2307.07924">Paper</a></td> <td align='center'><a href="https://github.com/OpenBMB/ChatDev">Code</a></td> </tr> <tr> <td align='center'>ToolLLM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2307.16789">Paper</a></td> <td align='center'><a href="https://github.com/OpenBMB/ToolBench">Code</a></td> </tr> <tr> <td align='center'>MemoryBank</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>-</td> <td align='center'>w/o tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.10250">Paper</a></td> <td align='center'><a href="https://github.com/zhongwanjun/MemoryBank-SiliconFriend">Code</a></td> </tr> <tr> <td align='center'>MetaGPT</td> <td align='center'>Handcrafting</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.00352">Paper</a></td> <td align='center'><a href="https://github.com/geekan/MetaGPT">Code</a></td> </tr> <tr> <td align='center'>L2MAC</td> <td align='center'>Handcrafting</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2310.02003">Paper</a></td> <td align='center'><a href="https://github.com/samholt/l2mac">Code</a></td> </tr> <tr> <td align='center'>LEO</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/o tools</td> <td align='center'>w/ fine-tuning</td> <td 
align='center'><a href="https://arxiv.org/abs/2311.12871">Paper</a></td> <td align='center'><a href="https://embodied-generalist.github.io">Code</a></td> </tr> <tr> <td align='center'>JARVIS-1</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/o fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2311.05997">Paper</a></td> <td align='center'><a href="https://github.com/CraftJarvis/JARVIS-1">Code</a></td> </tr> <tr> <td align='center'>CLOVA</td> <td align='center'>-</td> <td align='center'>Read/Write/<br>Reflection</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2312.10908">Paper</a></td> <td align='center'><a href="https://clova-tool.github.io/">Code</a></td> </tr> <tr> <td align='center'>LearnAct</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>w/ fine-tuning</td> <td align='center'><a href="https://arxiv.org/abs/2402.15809">Paper</a></td> <td align='center'><a href="https://github.com/zhao-ht/LearnAct">Code</a></td> </tr> <tr> <td align='center'>AgentSquare</td> <td align='center'>-</td> <td align='center'>Read/Write</td> <td align='center'>Hybrid</td> <td align='center'>w/ feedback</td> <td align='center'>w/ tools</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2410.06153">Paper</a></td> <td align='center'><a href="https://github.com/tsinghua-fib-lab/agentsquare">Code</a></td> </tr> </table>
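
The columns of the table above (Profile, Memory, Planning, Action) can be read as the parts of a single agent loop. The following is a minimal sketch of that loop, not the implementation of any listed system: the `Memory` and `Agent` classes, the `toy_planner` function, and the `upper` tool are all hypothetical names introduced for illustration, and the planner is a plain Python function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (tool name, tool argument); hypothetical format


@dataclass
class Memory:
    """Toy memory covering the Read / Write / Reflection operations."""
    records: List[str] = field(default_factory=list)

    def write(self, item: str) -> None:
        self.records.append(item)

    def read(self, query: str) -> List[str]:
        # Return records that share at least one word with the query.
        words = set(query.lower().split())
        return [r for r in self.records if words & set(r.lower().split())]

    def reflect(self) -> str:
        # Stand-in for LLM-driven reflection: just report how much was stored.
        return f"{len(self.records)} observations recorded"


@dataclass
class Agent:
    profile: str                                         # role description; only stored here
    planner: Callable[[str, List[str]], Optional[Step]]  # stand-in for an LLM planner
    tools: Dict[str, Callable[[str], str]]               # "w/ tools" action space
    memory: Memory = field(default_factory=Memory)

    def run(self, task: str, max_steps: int = 5) -> List[str]:
        outputs: List[str] = []
        for _ in range(max_steps):
            # Planning "w/ feedback": the planner sees all earlier observations.
            step = self.planner(task, list(self.memory.records))
            if step is None:
                break
            name, arg = step
            observation = self.tools[name](arg)  # action through a tool
            self.memory.write(observation)       # memory write
            outputs.append(observation)
        return outputs


def toy_planner(task: str, history: List[str]) -> Optional[Step]:
    # Assumption: call the "upper" tool once, then stop when feedback exists.
    return None if history else ("upper", task)


agent = Agent(profile="helpful assistant", planner=toy_planner,
              tools={"upper": str.upper})
result = agent.run("find papers")
```

In this sketch a "w/o feedback" planner would ignore the `history` argument and a "w/o tools" agent would have an empty `tools` dictionary; capability acquisition (fine-tuning or not) concerns how the planner itself is obtained and is outside the loop.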

Applications of LLM-based Autonomous Agent

<table> <tr> <td align='center'>Title</td> <td align='center'>Social Science </td> <td align='center'>Natural Science </td> <td align='center'>Engineering</td> <td align='center'>Paper</td> <td align='center'>Code</td> </tr> <tr> <td align='center'>Drori et al.</td> <td align='center'>-</td> <td align='center'>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2112.15594">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>SayCan</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/pdf/2204.01691">Paper</a></td> <td align='center'><a href="https://say-can.github.io/">Code</a></td> </tr> <tr> <td align='center'>Inner monologue</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/pdf/2207.05608">Paper</a></td> <td align='center'><a href="https://innermonologue.github.io/">Code</a></td> </tr> <tr> <td align='center'>Language-Planners</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2201.07207">Paper</a></td> <td align='center'><a href="https://github.com/huangwl18/language-planner">Code</a></td> </tr> <tr> <td align='center'>Social Simulacra</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2208.04024">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>TE</td> <td align='center'>Psychology </td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2208.10264">Paper</a></td> <td align='center'><a href="https://github.com/gatiaher/using-large-language-models-to-replicate-human-subject-studies">Code</a></td> </tr> <tr> <td align='center'>Out of 
One</td> <td align='center'>Political Science and Economy</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/pdf/2209.06899">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>LIBRO</td> <td align='center'>CS&SE</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/pdf/2209.11515">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Blind Judgement</td> <td align='center'>Jurisprudence</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/pdf/2301.05327.pdf">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Horton</td> <td align='center'>Political Science and Economy</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2301.07543">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>DECKARD</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2301.12050">Paper</a></td> <td align='center'><a href="https://github.com/DeckardAgent/deckard">Code</a></td> </tr> <tr> <td align='center'>Planner-Actor-Reporter</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2302.00763">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>DEPS</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2302.01560">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>RCI</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2303.17491">Paper</a></td> <td align='center'><a 
href="https://github.com/posgnu/rci-agent">Code</a></td> </tr> <tr> <td align='center'>Generative Agents</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.03442">Paper</a></td> <td align='center'><a href="https://github.com/joonspk-research/generative_agents">Code</a></td> </tr> <tr> <td align='center'>SCG</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/pdf/2304.07590">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>IGLU</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Civil Engineering</td> <td align='center'><a href="https://arxiv.org/abs/2304.10750">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>IELLM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Industrial Automation</td> <td align='center'><a href="https://arxiv.org/abs/2304.14354">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ChemCrow</td> <td align='center'>-</td> <td align='center'>Document and Data Management;<br>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.05376">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Boiko et al.</td> <td align='center'>-</td> <td align='center'>Document and Data Management;<br>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.05332">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>GPT4IA</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Industrial Automation</td> <td align='center'><a href="https://arxiv.org/abs/2304.14721">Paper</a></td> <td align='center'><a
href="https://github.com/YuchenXia/GPT4IndustrialAutomation">Code</a></td> </tr> <tr> <td align='center'>Self-collaboration</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2304.07590">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>E2WM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2305.10626">Paper</a></td> <td align='center'><a href="https://github.com/szxiangjn/world-model-for-language-model">Code</a></td> </tr> <tr> <td align='center'>Akata et al.</td> <td align='center'>Psychology </td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.16867">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Ziems et al.</td> <td align='center'>Psychology;<br>Political Science and Economy;<br>Research Assistant </td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.03514">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>AgentVerse</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.10848">Paper</a></td> <td align='center'><a href="https://github.com/OpenBMB/AgentVerse">Code</a></td> </tr> <tr> <td align='center'>SmolModels</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'>-</td> <td align='center'><a href="https://github.com/smol-ai/developer">Code</a></td> </tr> <tr> <td align='center'>TidyBot</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2305.05658">Paper</a></td> <td align='center'><a href="https://github.com/jimmyyhwu/tidybot">Code</a></td>
</tr> <tr> <td align='center'>PET</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2305.02412">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Voyager</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2305.16291">Paper</a></td> <td align='center'><a href="https://github.com/MineDojo/Voyager">Code</a></td> </tr> <tr> <td align='center'>GITM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2305.17144">Paper</a></td> <td align='center'><a href="https://github.com/OpenGVLab/GITM">Code</a></td> </tr> <tr> <td align='center'>NLSOM</td> <td align='center'>-</td> <td align='center'>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.17066">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>LLM4RL</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2306.03604">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>GPT Engineer</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'>-</td> <td align='center'><a href="https://github.com/AntonOsika/gpt-engineer">Code</a></td> </tr> <tr> <td align='center'>Grossman et al.</td> <td align='center'>-</td> <td align='center'>Experiment Assistant;<br>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://www.science.org/doi/full/10.1126/science.adi1778">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>SQL-PALM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td 
align='center'><a href="https://arxiv.org/abs/2306.00739">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>REMEMBER</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2306.07929">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>DemoGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'>-</td> <td align='center'><a href="https://github.com/melih-unsal/DemoGPT">Code</a></td> </tr> <tr> <td align='center'>Chatlaw</td> <td align='center'>Jurisprudence</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2306.16092">Paper</a></td> <td align='center'><a href="https://github.com/PKU-YuanGroup/ChatLaw">Code</a></td> </tr> <tr> <td align='center'>RestGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2306.06624">Paper</a></td> <td align='center'><a href="https://restgpt.github.io">Code</a></td> </tr> <tr> <td align='center'>Dialogue shaping</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2307.15833">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>TaPA</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2307.01848">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Ma et al.</td> <td align='center'>Psychology </td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.15810">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Math Agents</td> <td align='center'>-</td> <td align='center'>Science Education</td> <td 
align='center'>-</td> <td align='center'><a href="https://arxiv.org/pdf/2307.02502">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>SocialAI School</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.07871">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Unified Agent</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2307.09668">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Williams et al.</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.04986">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Li et al.</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.10337">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>S3</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.14984">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Dialogue Shaping</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/pdf/2307.15833.pdf">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>RoCo</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2307.04738">Paper</a></td> <td align='center'><a href="https://project-roco.github.io/">Code</a></td> </tr> <tr> <td align='center'>Sayplan</td> <td align='center'>-</td> <td align='center'>-</td> <td
align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2307.06135">Paper</a></td> <td align='center'><a href="https://sayplan.github.io/">Code</a></td> </tr> <tr> <td align='center'>aiflows</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.01285">Paper</a></td> <td align='center'><a href="https://github.com/epfl-dlab/aiflows">Code</a></td> </tr> <tr> <td align='center'>ToolLLM</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2307.16789">Paper</a></td> <td align='center'><a href="https://github.com/OpenBMB/ToolBench">Code</a></td> </tr> <tr> <td align='center'>ChatDev</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2307.07924">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Chao et al.</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.13304v1">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>AgentSims</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.04026">Paper</a></td> <td align='center'><a href="https://github.com/py499372727/AgentSims">Code</a></td> </tr> <tr> <td align='center'>ChatMOF</td> <td align='center'>-</td> <td align='center'>Document and Data Management;<br>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/pdf/2308.01423">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>MetaGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a
href="https://arxiv.org/abs/2308.00352">Paper</a></td> <td align='center'><a href="https://github.com/geekan/MetaGPT">Code</a></td> </tr> <tr> <td align='center'>L2MAC</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2310.02003">Paper</a></td> <td align='center'><a href="https://github.com/samholt/l2mac">Code</a></td> </tr> <tr> <td align='center'>CodeHelp</td> <td align='center'>-</td> <td align='center'>Science Education</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.06921">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>AutoGen</td> <td align='center'>-</td> <td align='center'>Science Education</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.08155">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>RAH</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.09904">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>DB-GPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.05481">Paper</a></td> <td align='center'><a href="https://github.com/TsinghuaDatabaseGroup/DB-GPT">Code</a></td> </tr> <tr> <td align='center'>RecMind</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.14296">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ChatEDA</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.10204">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>InteRecAgent</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td 
align='center'><a href="https://arxiv.org/abs/2308.16505">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>PentestGPT</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.06782">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>CodeHelp</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2308.06921">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ProAgent</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2308.11339">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>MindAgent</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2309.09971">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>LEO</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2311.12871">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>JARVIS-1</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'><a href="https://arxiv.org/abs/2311.05997">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>CLOVA</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/abs/2312.10908">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>AgentTrust</td> <td align='center'>-</td> <td align='center'>Social Simulation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2402.04559">Paper</a></td> <td 
align='center'><a href="https://www.camel-ai.org/research/agent-trust">Code</a></td> </tr> <tr> <td align='center'>embodied-agents</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>Robotics & Embodied AI</td> <td align='center'>-</td> <td align='center'><a href="https://github.com/mbodiai/embodied-agents">Code</a></td> </tr> <tr> <td align='center'>AgentOccam</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'>CS&SE</td> <td align='center'><a href="https://arxiv.org/pdf/2410.13825">Paper</a></td> <td align='center'>-</td> </tr> </table>

📊 Evaluation on LLM-based Autonomous Agent

<table> <tr> <td align='center'>Model</td> <td align='center'>Subjective </td> <td align='center'>Objective </td> <td align='center'>Benchmark</td> <td align='center'>Paper</td> <td align='center'>Code</td> </tr> <tr> <td align='center'>WebShop</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Multi-task Evaluation </td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2207.01206">Paper</a></td> <td align='center'><a href="https://github.com/princeton-nlp/webshop">Code</a></td> </tr> <tr> <td align='center'>Social Simulacra</td> <td align='center'>Human Annotation</td> <td align='center'>Social Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2208.04024">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>TE</td> <td align='center'>-</td> <td align='center'>Social Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2208.10264">Paper</a></td> <td align='center'><a href="https://github.com/GatiAher/UsingLarge-Language-Models-to-Replicate-Human-Subject-Studies">Code</a></td> </tr> <tr> <td align='center'>LIBRO</td> <td align='center'>-</td> <td align='center'>Software Testing</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2209.11515">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>ReAct</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2210.03629">Paper</a></td> <td align='center'><a href="https://github.com/ysymyth/ReAct">Code</a></td> </tr> <tr> <td align='center'>Out of One, Many</td> <td align='center'>Turing Test</td> <td align='center'>Social Evaluation;<br> Multi-task Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2209.06899">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td 
align='center'>DEPS</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2302.01560">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Jalil et al.</td> <td align='center'>-</td> <td align='center'>Software Testing</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2302.03287">Paper</a></td> <td align='center'><a href="https://github.com/sajedjalil/ChatGPT-Software-Testing-Stud">Code</a></td> </tr> <tr> <td align='center'>Reflexion</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Multi-task Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2303.11366">Paper</a></td> <td align='center'><a href="https://github.com/noahshinn024/reflexion">Code</a></td> </tr> <tr> <td align='center'>IGLU</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2304.10750">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Generative Agents</td> <td align='center'>Human Annotation;<br>Turing Test</td> <td align='center'>-</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2304.03442">Paper</a></td> <td align='center'><a href="https://github.com/joonspk-research/generative_agents">Code</a></td> </tr> <tr> <td align='center'>ToolBench</td> <td align='center'>Human Annotation</td> <td align='center'>Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2307.16789">Paper</a></td> <td align='center'><a href="https://github.com/OpenBMB/ToolBench">Code</a></td> </tr> <tr> <td align='center'>GITM</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a 
href="https://arxiv.org/abs/2305.17144">Paper</a></td> <td align='center'><a href="https://github.com/OpenGVLab/GITM">Code</a></td> </tr> <tr> <td align='center'>Two-Failures</td> <td align='center'>-</td> <td align='center'>Multi-task Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2305.14279">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Voyager</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2305.16291">Paper</a></td> <td align='center'><a href="https://github.com/MineDojo/Voyager">Code</a></td> </tr> <tr> <td align='center'>SocKET</td> <td align='center'>-</td> <td align='center'>Social Evaluation;<br>Multi-task Evaluation </td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2305.14938">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>Mobile-Env</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2305.08144">Paper</a></td> <td align='center'><a href="https://github.com/X-LANCE/Mobile-Env">Code</a></td> </tr> <tr> <td align='center'>Clembench</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2305.13455">Paper</a></td> <td align='center'><a href="https://github.com/clp-research/clembench">Code</a></td> </tr> <tr> <td align='center'>Mind2Web</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2306.06070">Paper</a></td> <td align='center'><a href="https://github.com/OSU-NLP-Group/Mind2Web">Code</a></td> </tr> <tr> <td 
align='center'>Dialop</td> <td align='center'>-</td> <td align='center'>Social Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2305.20076">Paper</a></td> <td align='center'><a href="https://github.com/jlin816/dialop">Code</a></td> </tr> <tr> <td align='center'>Feldt et al.</td> <td align='center'>-</td> <td align='center'>Software Testing</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2306.05152">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>CO-LLM</td> <td align='center'>Human Annotation</td> <td align='center'>Environment Simulation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2307.02485">Paper</a></td> <td align='center'><a href="https://vis-www.cs.umass.edu/Co-LLM-Agents/">Code</a></td> </tr> <tr> <td align='center'>Tachikuma</td> <td align='center'>Human Annotation</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2307.12573">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>WebArena</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2307.13854">Paper</a></td> <td align='center'><a href="https://github.com/web-arena-x/webarena">Code</a></td> </tr> <tr> <td align='center'>RocoBench</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br> Social Evaluation;<br> Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2307.04738">Paper</a></td> <td align='center'><a href="https://project-roco.github.io/">Code</a></td> </tr> <tr> <td align='center'>AgentSims</td> <td align='center'>-</td> <td align='center'>Social Evaluation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2308.04026">Paper</a></td> <td 
align='center'><a href="https://github.com/py499372727/AgentSims">Code</a></td> </tr> <tr> <td align='center'>AgentBench</td> <td align='center'>-</td> <td align='center'>Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2308.03688">Paper</a></td> <td align='center'><a href="https://github.com/thudm/agentbench">Code</a></td> </tr> <tr> <td align='center'>BOLAA</td> <td align='center'>-</td> <td align='center'>Environment Simulation;<br>Multi-task Evaluation;<br>Software Testing</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2308.05960">Paper</a></td> <td align='center'><a href="https://github.com/salesforce/BOLAA">Code</a></td> </tr> <tr> <td align='center'>Gentopia</td> <td align='center'>-</td> <td align='center'>Isolated Reasoning;<br> Multi-task Evaluation </td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2308.04030">Paper</a></td> <td align='center'><a href="https://github.com/Gentopia-AI/Gentopia">Code</a></td> </tr> <tr> <td align='center'>EmotionBench</td> <td align='center'>Human Annotation</td> <td align='center'>-</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2308.03656">Paper</a></td> <td align='center'><a href="https://github.com/CUHK-ARISE/EmotionBench">Code</a></td> </tr> <tr> <td align='center'>PTB</td> <td align='center'>-</td> <td align='center'>Software Testing </td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2308.06782">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>MintBench</td> <td align='center'>-</td> <td align='center'>Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2309.10691">Paper</a></td> <td align='center'><a href="https://github.com/xingyaoww/mint-bench">Code</a></td> </tr> <tr> <td align='center'>MindAgent</td> <td align='center'>-</td> 
<td align='center'>Environment Simulation;<br>Multi-task Evaluation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2309.09971">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>JARVIS-1</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>-</td> <td align='center'><a href="https://arxiv.org/abs/2311.05997">Paper</a></td> <td align='center'>-</td> </tr> <tr> <td align='center'>TimeChara</td> <td align='center'>GPT Annotation</td> <td align='center'>-</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2405.18027">Paper</a></td> <td align='center'><a href="https://ahnjaewoo.github.io/timechara/">Code</a></td> </tr> <tr> <td align='center'>AppWorld</td> <td align='center'>-</td> <td align='center'>Environment Simulation</td> <td align='center'>&check;</td> <td align='center'><a href="https://arxiv.org/abs/2407.18901">Paper</a></td> <td align='center'><a href="https://github.com/stonybrooknlp/appworld/">Code</a></td> </tr> </table> <hr>

๐ŸŒ More Comprehensive Summarization

We maintain an interactive table that covers a more comprehensive set of papers on LLM-based agents. It includes details such as tags, authors, and publication dates, and lets you sort, filter, and find the papers that interest you. Complete Table

👨‍👨‍👧‍👦 Maintainers

📚 Citation

If you find this survey useful, please cite our paper:

@misc{wang2023survey,
      title={A Survey on Large Language Model based Autonomous Agents}, 
      author={Lei Wang and Chen Ma and Xueyang Feng and Zeyu Zhang and Hao Yang and Jingsen Zhang and Zhiyuan Chen and Jiakai Tang and Xu Chen and Yankai Lin and Wayne Xin Zhao and Zhewei Wei and Ji-Rong Wen},
      year={2023},
      eprint={2308.11432},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

💪 How to Contribute

If you have a paper or are aware of relevant research that should be incorporated, please contribute via pull requests, issues, email, or other suitable methods.

🫡 Acknowledgement

We thank the following people for their valuable suggestions and contributions to this survey:

📧 Contact Us

If you have any questions or suggestions, please contact us via: