🤖 Awesome-Embodied-Agent-with-LLMs
This is a curated list of research on "Embodied AI or Agents with Large Language Models", maintained by haonan.
<img src="https://user-images.githubusercontent.com/7837172/44953557-0fb54e80-aec9-11e8-9d38-2388bc70c5c5.png" width=15% align="right" />
Watch this repository for the latest updates, and feel free to raise a pull request if you find interesting papers!
News🔥
[2024/08/01] Created a new board on social agents and role-playing. 🧑🧑🧒🧒 <br> [2024/06/28] Created a new board on agent self-evolution research. 🤖 <br> [2024/06/07] Add Mobile-Agent-v2, a mobile device operation assistant with effective navigation via multi-agent collaboration. 🚀 <br> [2024/05/13] Add "Learning Interactive Real-World Simulators", an Outstanding Paper Award winner at ICLR 2024 🥇.<br> [2024/04/24] Add "A Survey on Self-Evolution of Large Language Models", a systematic survey on self-evolution in LLMs! 💥<br> [2024/04/16] Add some CVPR 2024 papers. <br> [2024/04/15] Add MetaGPT, accepted for oral presentation (top 1.2%) at ICLR 2024, ranking #1 in the LLM-based Agent category. 🚀 <br> [2024/03/13] Add CRADLE, an interesting paper exploring an LLM-based agent in Red Dead Redemption II! 🎮
Table of Contents 🍃
- Survey
- Social Agent
- Self-Evolving Agents
- Advanced Agent Applications
- LLMs with RL or World Model
- Planning and Manipulation or Pretraining
- Multi-Agent Learning and Coordination
- Vision and Language Navigation
- Detection
- 3D Grounding
- Interactive Embodied Learning
- Rearrangement
- Benchmark
- Simulator
- Others
Trend and Imagination of LLM-based Embodied Agent
<p align="center"> <img src="trend.png" width="54%"> <img src="Genshin.jpg" width="43%"> <span><b>Figure 1. Trend of Embodied Agent with LLMs.<sup>[1]</sup></b></span> <span><b>Figure 2. An envisioned agent society.<sup>[2]</sup></b></span> </p>
Methods
Survey
-
A Survey on Vision-Language-Action Models for Embodied AI [arXiv 2024.03]<br> The Chinese University of Hong Kong, Huawei Noah’s Ark Lab
-
Large Multimodal Agents: A Survey [arXiv 2024.02] [Github]<br> Junlin Xie<sup>♣♡</sup> Zhihong Chen<sup>♣♡</sup> Ruifei Zhang<sup>♣♡</sup> Xiang Wan<sup>♣</sup> Guanbin Li<sup>♠</sup><br> <sup>♡</sup>The Chinese University of Hong Kong, Shenzhen <sup>♣</sup>Shenzhen Research Institute of Big Data, <sup>♠</sup>Sun Yat-sen University
-
A Survey on Self-Evolution of Large Language Models [arXiv 2024.01]<br> Key Lab of HCST (PKU), MOE; School of Computer Science, Peking University, Alibaba Group, Nanyang Technological University
-
Agent AI: Surveying the Horizons of Multimodal Interaction [arXiv 2024.01]<br> Stanford University, Microsoft Research, Redmond, University of California, Los Angeles, University of Washington, Microsoft Gaming
-
Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents [arXiv 2023.11]<br> Shanghai Jiao Tong University, Amazon Web Services, Yale University
-
The Rise and Potential of Large Language Model Based Agents: A Survey [arXiv 2023.09]<br> Fudan NLP Group, miHoYo Inc
-
A Survey on LLM-based Autonomous Agents [arXiv 2023.08] <br> Gaoling School of Artificial Intelligence, Renmin University of China
Social Agent
Self-Evolving Agents
-
AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments [arXiv 2024.06] [Github] [Project page] <br> Fudan NLP Lab & Fudan Vision and Learning Lab
-
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models [arXiv 2024.06] [Github]<br> Fangzhi Xu<sup>♢♡</sup>, Qiushi Sun<sup>2,♡</sup>, Kanzhi Cheng<sup>1</sup>, Jun Liu<sup>♢</sup>, Yu Qiao<sup>♡</sup>, Zhiyong Wu<sup>♡</sup> <br> <sup>♢</sup>Xi’an Jiaotong University, <sup>♡</sup>Shanghai Artificial Intelligence Laboratory, <sup>1</sup>The University of Hong Kong, <sup>2</sup>Nanjing University
-
Symbolic Learning Enables Self-Evolving Agents [arXiv 2024.06] [Github]<br> Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang<br> AIWaves Inc.
Advanced Agent Applications
-
[Embodied-agents] [Github] <br> Seamlessly integrate state-of-the-art transformer models into robotics stacks.
-
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [arXiv 2024] [Github]<br> Junyang Wang<sup>1</sup>, Haiyang Xu<sup>2</sup>, Haitao Jia<sup>1</sup>, Xi Zhang<sup>2</sup>, Ming Yan<sup>2</sup>, Weizhou Shen<sup>2</sup>, Ji Zhang<sup>2</sup>, Fei Huang<sup>2</sup>, Jitao Sang<sup>1</sup><br> <sup>1</sup>Beijing Jiaotong University <sup>2</sup>Alibaba Group
-
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family [ICLR 2024 Workshop LLM Agents] [Github]<br> Junyang Wang<sup>1</sup>, Haiyang Xu<sup>2</sup>, Jiabo Ye<sup>2</sup>, Ming Yan<sup>2</sup>, Weizhou Shen<sup>2</sup>, Ji Zhang<sup>2</sup>, Fei Huang<sup>2</sup>, Jitao Sang<sup>1</sup><br> <sup>1</sup>Beijing Jiaotong University <sup>2</sup>Alibaba Group
-
[Machinascript-for-robots] [Github] <br> Build LLM-powered robots in your garage with MachinaScript For Robots!
-
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model [CVPR 2024] [Github] <br> Lirui Zhao<sup>1,2</sup> Yue Yang<sup>2,4</sup> Kaipeng Zhang<sup>2</sup> Wenqi Shao<sup>2</sup>, Yuxin Zhang<sup>1</sup>, Yu Qiao<sup>2</sup>, Ping Luo<sup>2,3</sup> Rongrong Ji<sup>1</sup><br> <sup>1</sup>Xiamen University, <sup>2</sup>OpenGVLab, Shanghai AI Laboratory <sup>3</sup>The University of Hong Kong, <sup>4</sup>Shanghai Jiao Tong University
-
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework [ICLR 2024 (oral)]<br> DeepWisdom, AI Initiative, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong, Shenzhen, Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI
-
AppAgent: Multimodal Agents as Smartphone Users [Project page] [Github] <br> Chi Zhang∗, Zhao Yang∗, Jiaxuan Liu∗, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu†<br> Tencent
LLMs with RL or World Model
-
KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts [NeurIPS 2024] [Project Page]<br> Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu<br> <sup>1</sup>Nanjing University, <sup>2</sup>Polixir.ai
-
Learning Interactive Real-World Simulators [ICLR 2024 (Outstanding Papers)] [Project Page]<br> Sherry Yang<sup>1,2</sup>, Yilun Du<sup>3</sup>, Kamyar Ghasemipour<sup>2</sup>, Jonathan Tompson<sup>2</sup>, Leslie Kaelbling<sup>3</sup>, Dale Schuurmans<sup>2</sup>, Pieter Abbeel<sup>1</sup><br> <sup>1</sup>UC Berkeley, <sup>2</sup>Google DeepMind, <sup>3</sup>MIT
-
Robust agents learn causal world models [ICLR 2024]<br> Jonathan Richens*, Tom Everitt <br> Google DeepMind
-
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld [CVPR 2024] [Github]<br> Yijun Yang<sup>1,5,4</sup>, Tianyi Zhou<sup>2</sup>, Kanxue Li<sup>3</sup>, Dapeng Tao<sup>3</sup>, Lvsong Li<sup>4</sup>, Li Shen<sup>4</sup>, Xiaodong He<sup>4</sup>, Jing Jiang<sup>5</sup>, Yuhui Shi<sup>1</sup><br> <sup>1</sup>Southern University of Science and Technology, <sup>2</sup>University of Maryland, College Park, <sup>3</sup>Yunnan University, <sup>4</sup>JD Explore Academy, <sup>5</sup>University of Technology Sydney
-
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning [NeurIPS 2023] [Project Page][Github]<br> Lin Guan<sup>1</sup>, Karthik Valmeekam<sup>1</sup>, Sarath Sreedharan<sup>2</sup>, Subbarao Kambhampati<sup>1</sup><br> <sup>1</sup>School of Computing & AI, Arizona State University, Tempe, AZ, <sup>2</sup>Department of Computer Science, Colorado State University, Fort Collins, CO
-
Eureka: Human-Level Reward Design via Coding Large Language Models [NeurIPS 2023 Workshop ALOE Spotlight] [Project page] [Github] <br> Jason Ma<sup>1,2</sup>, William Liang<sup>2</sup>, Guanzhi Wang<sup>1,3</sup>, De-An Huang<sup>1</sup>, Osbert Bastani<sup>2</sup>, Dinesh Jayaraman<sup>2</sup>, Yuke Zhu<sup>1,4</sup>, Linxi "Jim" Fan<sup>1</sup>, Anima Anandkumar<sup>1,3</sup><br> <sup>1</sup>NVIDIA; <sup>2</sup>UPenn; <sup>3</sup>Caltech; <sup>4</sup>UT Austin
-
RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds [arXiv 2023] <br>
-
Can Language Agents Be Alternatives to PPO? A Preliminary Empirical Study on OpenAI Gym [arXiv 2023] <br>
-
RoboGPT: An intelligent agent of making embodied long-term decisions for daily instruction tasks [arXiv 2023] <br>
-
Aligning Agents like Large Language Models [arXiv 2023] <br>
-
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents [ICLR 2024 spotlight] <br>
-
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models [arXiv 2023] <br>
-
Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning [ICLR 2024 spotlight] <br>
-
Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning [arXiv 2023] <br>
-
Online Continual Learning for Interactive Instruction Following Agents [ICLR 2024] <br>
-
ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning [arXiv 2023] <br>
-
Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes [arXiv 2023] <br>
-
Learning to Model the World with Language [arXiv 2023] <br>
-
MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning [ICLR 2024] <br>
-
Language Reward Modulation for Pretraining Reinforcement Learning [arXiv 2023] [Github]<br> Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel<br> <sup>1</sup>UC Berkeley
-
Guiding Pretraining in Reinforcement Learning with Large Language Models [ICML 2023] <br> Yuqing Du<sup>1*</sup>, Olivia Watkins<sup>1*</sup>, Zihan Wang<sup>2</sup>, Cédric Colas<sup>3,4</sup>, Trevor Darrell<sup>1</sup>, Pieter Abbeel<sup>1</sup>, Abhishek Gupta<sup>2</sup>, Jacob Andreas<sup>3</sup><br> <sup>1</sup>Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA <sup>2</sup>University of Washington, Seattle <sup>3</sup>Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory <sup>4</sup>Inria, Flowers Laboratory
Planning and Manipulation or Pretraining
-
Voyager: An Open-Ended Embodied Agent with Large Language Models [NeurIPS 2023 Workshop ALOE Spotlight] [Project page] [Github] <br> Guanzhi Wang<sup>1,2</sup>, Yuqi Xie<sup>3</sup>, Yunfan Jiang<sup>4</sup>, Ajay Mandlekar<sup>1</sup>, Chaowei Xiao<sup>1,5</sup>, Yuke Zhu<sup>1,3</sup>, Linxi Fan<sup>1</sup>, Anima Anandkumar<sup>1,2</sup><br> <sup>1</sup>NVIDIA, <sup>2</sup>Caltech, <sup>3</sup>UT Austin, <sup>4</sup>Stanford, <sup>5</sup>UW Madison
-
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [ACL 2024][Github] <br> Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
-
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives [ACL 2024] <br> Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu
-
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control [arXiv 2024] [Project Page] <br> Enshen Zhou<sup>1,2</sup> Yiran Qin<sup>1,3</sup> Zhenfei Yin<sup>1,4</sup> Yuzhou Huang<sup>3</sup> Ruimao Zhang<sup>3</sup> Lu Sheng<sup>2</sup> Yu Qiao<sup>1</sup> Jing Shao<sup>1</sup><br> <sup>1</sup>Shanghai Artificial Intelligence Laboratory, <sup>2</sup>Beihang University, <sup>3</sup>The Chinese University of Hong Kong, Shenzhen, <sup>4</sup>The University of Sydney
-
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception [CVPR 2024] [Project Page] <br> Yiran Qin<sup>1,2</sup> Enshen Zhou<sup>1,3</sup> Qichang Liu<sup>1,4</sup> Zhenfei Yin<sup>1,5</sup> Lu Sheng<sup>3</sup> Ruimao Zhang<sup>2</sup> Yu Qiao<sup>1</sup> Jing Shao<sup>1</sup><br> <sup>1</sup>Shanghai Artificial Intelligence Laboratory, <sup>2</sup>The Chinese University of Hong Kong, Shenzhen, <sup>3</sup>Beihang University, <sup>4</sup>Tsinghua University, <sup>5</sup>The University of Sydney
-
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation [CVPR 2024] <br> Zeyuan Yang<sup>1</sup>, Jiageng Liu, Peihao Chen<sup>2</sup>, Anoop Cherian<sup>3</sup>, Tim Marks, Jonathan Le Roux<sup>4</sup>, Chuang Gan<sup>5</sup><br> <sup>1</sup>Tsinghua University, <sup>2</sup>South China University of Technology, <sup>3</sup>Mitsubishi Electric Research Labs (MERL), <sup>4</sup>Mitsubishi Electric Research Labs, <sup>5</sup>MIT-IBM Watson AI Lab
-
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study [arXiv 2024] [Project Page] [Code] <br> Weihao Tan<sup>2</sup>, Ziluo Ding<sup>1</sup>, Wentao Zhang<sup>2</sup>, Boyu Li<sup>1</sup>, Bohan Zhou<sup>3</sup>, Junpeng Yue<sup>3</sup>, Haochong Xia<sup>2</sup>, Jiechuan Jiang<sup>3</sup>, Longtao Zheng<sup>2</sup>, Xinrun Xu<sup>1</sup>, Yifei Bi<sup>1</sup>, Pengjie Gu<sup>2</sup>, et al.<br> <sup>1</sup>Beijing Academy of Artificial Intelligence (BAAI), China; <sup>2</sup>Nanyang Technological University, Singapore; <sup>3</sup>School of Computer Science, Peking University, China
-
See and Think: Embodied Agent in Virtual Environment [arXiv 2023] <br> Zhonghan Zhao<sup>1*</sup>, Wenhao Chai<sup>2*</sup>, Xuan Wang<sup>1*</sup>, Li Boyi<sup>1</sup>, Shengyu Hao<sup>1</sup>, Shidong Cao<sup>1</sup>, Tian Ye<sup>3</sup>, Jenq-Neng Hwang<sup>2</sup>, Gaoang Wang<sup>1</sup><br> <sup>1</sup>Zhejiang University <sup>2</sup>University of Washington <sup>3</sup>Hong Kong University of Science and Technology (GZ)
-
Agent Instructs Large Language Models to be General Zero-Shot Reasoners [arXiv 2023] <br> Nicholas Crispino<sup>1</sup>, Kyle Montgomery<sup>1</sup>, Fankun Zeng<sup>1</sup>, Dawn Song<sup>2</sup>, Chenguang Wang<sup>1</sup><br> <sup>1</sup>Washington University in St. Louis, <sup>2</sup>UC Berkeley
-
JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models [NeurIPS 2023] [Project Page] <br> Zihao Wang<sup>1,2</sup> Shaofei Cai<sup>1,2</sup> Anji Liu<sup>3</sup> Yonggang Jin<sup>4</sup> Jinbing Hou<sup>4</sup> Bowei Zhang<sup>5</sup> Haowei Lin<sup>1,2</sup> Zhaofeng He<sup>4</sup> Zilong Zheng<sup>6</sup> Yaodong Yang<sup>1</sup> Xiaojian Ma<sup>6†</sup> Yitao Liang<sup>1†</sup><br> <sup>1</sup>Institute for Artificial Intelligence, Peking University, <sup>2</sup>School of Intelligence Science and Technology, Peking University, <sup>3</sup>Computer Science Department, University of California, Los Angeles, <sup>4</sup>Beijing University of Posts and Telecommunications, <sup>5</sup>School of Electronics Engineering and Computer Science, Peking University, <sup>6</sup>Beijing Institute for General Artificial Intelligence (BIGAI)
-
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents [NeurIPS 2023]<br> Zihao Wang<sup>1,2</sup> Shaofei Cai<sup>1,2</sup> Guanzhou Chen<sup>3</sup> Anji Liu<sup>4</sup> Xiaojian Ma<sup>4</sup> Yitao Liang<sup>1,5†</sup><br> <sup>1</sup>Institute for Artificial Intelligence, Peking University, <sup>2</sup>School of Intelligence Science and Technology, Peking University, <sup>3</sup>School of Computer Science, Beijing University of Posts and Telecommunications, <sup>4</sup>Computer Science Department, University of California, Los Angeles, <sup>5</sup>Beijing Institute for General Artificial Intelligence (BIGAI)
-
CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society [NeurIPS 2023] [Github] [Project page]<br> Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem<br> <sup>1</sup>King Abdullah University of Science and Technology (KAUST)
-
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [arXiv 2022] [Github] [Project page] <br> Wenlong Huang<sup>1</sup>, Pieter Abbeel<sup>1</sup>, Deepak Pathak<sup>2</sup>, Igor Mordatch<sup>3</sup><br> <sup>1</sup>UC Berkeley, <sup>2</sup>Carnegie Mellon University, <sup>3</sup>Google
-
FILM: Following Instructions in Language with Modular Methods [ICLR 2022] [Github] [Project page] <br> So Yeon Min<sup>1</sup>, Devendra Singh Chaplot<sup>2</sup>, Pradeep Ravikumar<sup>1</sup>, Yonatan Bisk<sup>1</sup>, Ruslan Salakhutdinov<sup>1</sup><br> <sup>1</sup>Carnegie Mellon University <sup>2</sup>Facebook AI Research
-
Embodied Task Planning with Large Language Models [arXiv 2023] [Github] [Project page] [Demo] [Huggingface Model] <br> Zhenyu Wu<sup>1</sup>, Ziwei Wang<sup>2,3</sup>, Xiuwei Xu<sup>2,3</sup>, Jiwen Lu<sup>2,3</sup>, Haibin Yan<sup>1*</sup><br> <sup>1</sup>School of Automation, Beijing University of Posts and Telecommunications, <sup>2</sup>Department of Automation, Tsinghua University, <sup>3</sup>Beijing National Research Center for Information Science and Technology
-
SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning [arXiv 2023] <br> Yue Wu<sup>1,4*</sup>, Shrimai Prabhumoye<sup>2</sup>, So Yeon Min<sup>1</sup>, Yonatan Bisk<sup>1</sup>, Ruslan Salakhutdinov<sup>1</sup>, Amos Azaria<sup>3</sup>, Tom Mitchell<sup>1</sup>, Yuanzhi Li<sup>1,4</sup><br> <sup>1</sup>Carnegie Mellon University, <sup>2</sup>NVIDIA, <sup>3</sup>Ariel University, <sup>4</sup>Microsoft Research
-
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning [CVPR 2022 (Oral)] [Project page] [Github] <br> Santhosh Kumar Ramakrishnan<sup>1,2</sup>, Devendra Singh Chaplot<sup>1</sup>, Ziad Al-Halah<sup>2</sup> Jitendra Malik<sup>1,3</sup>, Kristen Grauman<sup>1,2</sup><br> <sup>1</sup>Facebook AI Research, <sup>2</sup>UT Austin, <sup>3</sup>UC Berkeley
-
Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [ICLR 2023] [Project page] [Github] <br> Kuo-Hao Zeng<sup>1</sup>, Luca Weihs<sup>2</sup>, Roozbeh Mottaghi<sup>1</sup>, Ali Farhadi<sup>1</sup><br> <sup>1</sup>Paul G. Allen School of Computer Science & Engineering, University of Washington, <sup>2</sup>PRIOR @ Allen Institute for AI
-
Modeling Dynamic Environments with Scene Graph Memory [ICML 2023] <br> Andrey Kurenkov<sup>1</sup>, Michael Lingelbach<sup>1</sup>, Tanmay Agarwal<sup>1</sup>, Emily Jin<sup>1</sup>, Chengshu Li<sup>1</sup>, Ruohan Zhang<sup>1</sup>, Li Fei-Fei<sup>1</sup>, Jiajun Wu<sup>1</sup>, Silvio Savarese<sup>2</sup>, Roberto Martín-Martín<sup>3</sup><br> <sup>1</sup>Department of Computer Science, Stanford University <sup>2</sup>Salesforce AI Research <sup>3</sup>Department of Computer Science, University of Texas at Austin
-
Reasoning with Language Model is Planning with World Model [arXiv 2023] <br> Shibo Hao<sup>∗♣</sup>, Yi Gu<sup>∗♣</sup>, Haodi Ma<sup>♢</sup>, Joshua Jiahua Hong<sup>♣</sup>, Zhen Wang<sup>♣ ♠</sup>, Daisy Zhe Wang<sup>♢</sup>, Zhiting Hu<sup>♣</sup><br> <sup>♣</sup>UC San Diego, <sup>♢</sup>University of Florida, <sup>♠</sup>Mohamed bin Zayed University of Artificial Intelligence
-
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [arXiv 2022]<br> Robotics at Google, Everyday Robots
-
Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling [ICML 2023]<br> Kolby Nottingham<sup>1</sup> Prithviraj Ammanabrolu<sup>2</sup> Alane Suhr<sup>2</sup> Yejin Choi<sup>3,2</sup> Hannaneh Hajishirzi<sup>3,2</sup> Sameer Singh<sup>1,2</sup> Roy Fox<sup>1</sup><br> <sup>1</sup>Department of Computer Science, University of California Irvine <sup>2</sup>Allen Institute for Artificial Intelligence <sup>3</sup>Paul G. Allen School of Computer Science
-
Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents [ICCV 2023]<br> Byeonghwi Kim Jinyeon Kim Yuyeong Kim<sup>1,*</sup> Cheolhong Min Jonghyun Choi<sup>†</sup><br> Yonsei University <sup>1</sup>Gwangju Institute of Science and Technology
-
Inner Monologue: Embodied Reasoning through Planning with Language Models [CoRL 2022] [Project page]<br> Robotics at Google
-
Language Models Meet World Models: Embodied Experiences Enhance Language Models [arXiv 2023] [Twitter]<br> Jiannan Xiang<sup>∗♠</sup>, Tianhua Tao<sup>∗♠</sup>, Yi Gu<sup>♠</sup>, Tianmin Shu<sup>♢</sup>, Zirui Wang<sup>♠</sup>, Zichao Yang<sup>♡</sup>, Zhiting Hu<sup>♠</sup><br> <sup>♠</sup>UC San Diego, <sup>♣</sup>UIUC, <sup>♢</sup>MIT, <sup>♡</sup>Carnegie Mellon University
-
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [arXiv 2023] [Video]<br> Chuhao Jin<sup>1*</sup>, Wenhui Tan<sup>1*</sup>, Jiange Yang<sup>2*</sup>, Bei Liu<sup>3†</sup>, Ruihua Song<sup>1</sup>, Limin Wang<sup>2</sup>, Jianlong Fu<sup>3†</sup><br> <sup>1</sup>Renmin University of China, <sup>2</sup>Nanjing University, <sup>3</sup>Microsoft Research
-
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [CoRL 2021] [Project page] [Poster]<br> Valts Blukis<sup>1,2</sup>, Chris Paxton<sup>1</sup>, Dieter Fox<sup>1,3</sup>, Animesh Garg<sup>1,4</sup>, Yoav Artzi<sup>2</sup><br> <sup>1</sup>NVIDIA <sup>2</sup>Cornell University <sup>3</sup>University of Washington <sup>4</sup>University of Toronto, Vector Institute
-
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models [ICCV 2023] [Project page] [Github]<br> Chan Hee Song<sup>1</sup>, Jiaman Wu<sup>1</sup>, Clayton Washington<sup>1</sup>, Brian M. Sadler<sup>2</sup>, Wei-Lun Chao<sup>1</sup>, Yu Su<sup>1</sup><br> <sup>1</sup>The Ohio State University, <sup>2</sup>DEVCOM ARL
-
Code as Policies: Language Model Programs for Embodied Control [arXiv 2023] [Project page] [Github] [Blog] [Colab]<br> Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng<br> Robotics at Google
-
3D-LLM: Injecting the 3D World into Large Language Models [arXiv 2023] <br> <sup>1</sup>Yining Hong, <sup>2</sup>Haoyu Zhen, <sup>3</sup>Peihao Chen, <sup>4</sup>Shuhong Zheng, <sup>5</sup>Yilun Du, <sup>6</sup>Zhenfang Chen, <sup>6,7</sup>Chuang Gan <br> <sup>1</sup>UCLA <sup>2</sup> SJTU <sup>3</sup> SCUT <sup>4</sup> UIUC <sup>5</sup> MIT <sup>6</sup>MIT-IBM Watson AI Lab <sup>7</sup> Umass Amherst
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [arXiv 2023] [Project page] [Online Demo]<br> Wenlong Huang<sup>1</sup>, Chen Wang<sup>1</sup>, Ruohan Zhang<sup>1</sup>, Yunzhu Li<sup>1,2</sup>, Jiajun Wu<sup>1</sup>, Li Fei-Fei<sup>1</sup> <br> <sup>1</sup>Stanford University <sup>2</sup>University of Illinois Urbana-Champaign
-
PaLM-E: An Embodied Multimodal Language Model [ICML 2023] [Project page]<br> <sup>1</sup>Robotics at Google <sup>2</sup>TU Berlin <sup>3</sup>Google Research
-
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning [arXiv 2023] <br> Zirui Zhao, Wee Sun Lee, David Hsu <br> School of Computing, National University of Singapore
-
An Embodied Generalist Agent in 3D World [ICML 2024] <br> Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang<br> Beijing Institute for General Artificial Intelligence (BIGAI)
Multi-Agent Learning and Coordination
-
Building Cooperative Embodied Agents Modularly with Large Language Models [ICLR 2024] [Project page] [Github]<br> Hongxin Zhang<sup>1*</sup>, Weihua Du<sup>2*</sup>, Jiaming Shan<sup>3</sup>, Qinhong Zhou<sup>1</sup>, Yilun Du<sup>4</sup>, Joshua B. Tenenbaum<sup>4</sup>, Tianmin Shu<sup>4</sup>, Chuang Gan<sup>1,5</sup><br> <sup>1</sup>University of Massachusetts Amherst, <sup>2</sup>Tsinghua University, <sup>3</sup>Shanghai Jiao Tong University, <sup>4</sup>MIT, <sup>5</sup>MIT-IBM Watson AI Lab
-
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars [arXiv 2023]<br> Wenyue Hua<sup>1*</sup>, Lizhou Fan<sup>2*</sup>, Lingyao Li<sup>2</sup>, Kai Mei<sup>1</sup>, Jianchao Ji<sup>1</sup>, Yingqiang Ge<sup>1</sup>, Libby Hemphill<sup>2</sup>, Yongfeng Zhang<sup>1</sup><br> <sup>1</sup>Rutgers University, <sup>2</sup>University of Michigan
-
MindAgent: Emergent Gaming Interaction [arXiv 2023]<br> Ran Gong<sup>*1†</sup> Qiuyuan Huang<sup>*2‡</sup> Xiaojian Ma<sup>*1</sup> Hoi Vo<sup>3</sup> Zane Durante<sup>†4</sup> Yusuke Noda<sup>3</sup> Zilong Zheng<sup>5</sup> Song-Chun Zhu<sup>1,5,6,7,8</sup> Demetri Terzopoulos<sup>1</sup> Li Fei-Fei<sup>4</sup> Jianfeng Gao<sup>2</sup><br><sup>1</sup>UCLA; <sup>2</sup>Microsoft Research, Redmond; <sup>3</sup>Xbox Team, Microsoft; <sup>4</sup>Stanford; <sup>5</sup>BIGAI; <sup>6</sup>PKU; <sup>7</sup>THU; <sup>8</sup>UCLA
-
Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [ICML 2023]<br> Jigang Kim<sup>*1,2</sup> Daesol Cho<sup>*1,2</sup> H. Jin Kim<sup>1,3</sup><br> <sup>1</sup>Seoul National University, <sup>2</sup>Artificial Intelligence Institute of Seoul National University (AIIS), <sup>3</sup>Automation and Systems Research Institute (ASRI).<br> Note: This paper mainly focuses on reinforcement learning for Embodied AI.
-
Adaptive Coordination in Social Embodied Rearrangement [ICML 2023]<br> Andrew Szot<sup>1,2</sup> Unnat Jain<sup>1</sup> Dhruv Batra<sup>1,2</sup> Zsolt Kira<sup>2</sup> Ruta Desai<sup>1</sup> Akshara Rai<sup>1</sup><br> <sup>1</sup>Meta AI <sup>2</sup>Georgia Institute of Technology.
Vision and Language Navigation
-
IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience [arXiv 2023] <br> Joanne Truong<sup>1,2</sup>, April Zitkovich<sup>1</sup>, Sonia Chernova<sup>2</sup>, Dhruv Batra<sup>2,3</sup>, Tingnan Zhang<sup>1</sup>, Jie Tan<sup>1</sup>, Wenhao Yu<sup>1</sup><br> <sup>1</sup>Robotics at Google <sup>2</sup>Georgia Institute of Technology <sup>3</sup>Meta AI
-
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation [ICML 2023] <br> Kaiwen Zhou<sup>1</sup>, Kaizhi Zheng<sup>1</sup>, Connor Pryor<sup>1</sup>, Yilin Shen<sup>2</sup>, Hongxia Jin<sup>2</sup>, Lise Getoor<sup>1</sup>, Xin Eric Wang<sup>1</sup><br> <sup>1</sup>University of California, Santa Cruz <sup>2</sup>Samsung Research America.
-
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [arXiv 2023] <br> Gengze Zhou<sup>1</sup> Yicong Hong<sup>2</sup> Qi Wu<sup>1</sup> <br> <sup>1</sup>The University of Adelaide <sup>2</sup>The Australian National University
-
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [arXiv 2023] [Github] <br>
Siyuan Huang<sup>1,2</sup> Zhengkai Jiang<sup>4</sup> Hao Dong<sup>3</sup> Yu Qiao<sup>2</sup> Peng Gao<sup>2</sup> Hongsheng Li<sup>5</sup> <br> <sup>1</sup>Shanghai Jiao Tong University, <sup>2</sup>Shanghai AI Laboratory, <sup>3</sup>CFCS, School of CS, PKU, <sup>4</sup>University of Chinese Academy of Sciences, <sup>5</sup>The Chinese University of Hong Kong
Detection
- DetGPT: Detect What You Need via Reasoning [arXiv 2023] <br> Renjie Pi<sup>1∗</sup> Jiahui Gao<sup>2*</sup> Shizhe Diao<sup>1∗</sup> Rui Pan<sup>1</sup> Hanze Dong<sup>1</sup> Jipeng Zhang<sup>1</sup> Lewei Yao<sup>1</sup> Jianhua Han<sup>3</sup> Hang Xu<sup>2</sup> Lingpeng Kong<sup>2</sup> Tong Zhang<sup>1</sup> <br> <sup>1</sup>The Hong Kong University of Science and Technology <sup>2</sup>The University of Hong Kong <sup>3</sup>Shanghai Jiao Tong University
3D Grounding
-
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent [arXiv 2023] <br> Jianing Yang<sup>1</sup>, Xuweiyi Chen<sup>1</sup>, Shengyi Qian<sup>1</sup>, Nikhil Madaan, Madhavan Iyengar<sup>1</sup>, David F. Fouhey<sup>1,2</sup>, Joyce Chai<sup>1</sup><br> <sup>1</sup>University of Michigan, <sup>2</sup>New York University
-
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment [ICCV 2023] <br> Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li<br> Beijing Institute for General Artificial Intelligence (BIGAI)
Interactive Embodied Learning
-
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning [ICML 2023] <br> Thomas Carta<sup>1*</sup>, Clément Romac<sup>1,2</sup>, Thomas Wolf<sup>2</sup>, Sylvain Lamprier<sup>3</sup>, Olivier Sigaud<sup>4</sup>, Pierre-Yves Oudeyer<sup>1</sup><br> <sup>1</sup>Inria (Flowers), University of Bordeaux, <sup>2</sup>Hugging Face, <sup>3</sup>Univ Angers, LERIA, SFR MATHSTIC, F-49000, <sup>4</sup>Sorbonne University, ISIR
-
Learning Affordance Landscapes for Interaction Exploration in 3D Environments [NeurIPS 2020] [Project page] <br> Tushar Nagarajan, Kristen Grauman<br> UT Austin and Facebook AI Research
-
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception [CVPR 2019 (oral)] [Slides]<br> Erik Wijmans<sup>1†</sup>, Samyak Datta<sup>1</sup>, Oleksandr Maksymets<sup>2†</sup>, Abhishek Das<sup>1</sup>, Georgia Gkioxari<sup>2</sup>, Stefan Lee<sup>1</sup>, Irfan Essa<sup>1</sup>, Devi Parikh<sup>1,2</sup>, Dhruv Batra<sup>1,2</sup> <br> <sup>1</sup>Georgia Institute of Technology, <sup>2</sup>Facebook AI Research
-
Multi-Target Embodied Question Answering [CVPR 2019] <br> Licheng Yu<sup>1</sup>, Xinlei Chen<sup>3</sup>, Georgia Gkioxari<sup>3</sup>, Mohit Bansal<sup>1</sup>, Tamara L. Berg<sup>1,3</sup>, Dhruv Batra<sup>2,3</sup><br> <sup>1</sup>University of North Carolina at Chapel Hill <sup>2</sup>Georgia Tech <sup>3</sup>Facebook AI
-
Neural Modular Control for Embodied Question Answering [CoRL 2018 (Spotlight)] [Project page] [Github]<br> Abhishek Das<sup>1</sup>,Georgia Gkioxari<sup>2</sup>, Stefan Lee<sup>1</sup>, Devi Parikh<sup>1,2</sup>, Dhruv Batra<sup>1,2</sup><br> <sup>1</sup>Georgia Institute of Technology <sup>2</sup>Facebook AI Research
-
Embodied Question Answering [CVPR 2018 (oral)] [Project page] [Github]<br> Abhishek Das<sup>1</sup>, Samyak Datta<sup>1</sup>, Georgia Gkioxari<sup>2</sup>, Stefan Lee<sup>1</sup>, Devi Parikh<sup>2,1</sup>, Dhruv Batra<sup>2</sup> <br> <sup>1</sup>Georgia Institute of Technology, <sup>2</sup>Facebook AI Research
Rearrangement
- A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search [ICLR 2023] <br> <sup>1</sup>Brandon Trabucco, <sup>2</sup>Gunnar A Sigurdsson, <sup>2</sup>Robinson Piramuthu, <sup>2,3</sup>Gaurav S. Sukhatme, <sup>1</sup>Ruslan Salakhutdinov<br> <sup>1</sup>CMU, <sup>2</sup>Amazon Alexa AI, <sup>3</sup>University of Southern California
Benchmark
-
SmartPlay: A Benchmark for LLMs as Intelligent Agents [ICLR 2024] [Github] <br> Yue Wu<sup>1,2</sup>, Xuan Tang<sup>1</sup>, Tom Mitchell<sup>1</sup>, Yuanzhi Li<sup>1,2</sup><br> <sup>1</sup>Carnegie Mellon University, <sup>2</sup>Microsoft Research
-
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [arXiv 2023] [Project page] [Github] <br> Yufei Wang<sup>1</sup>, Zhou Xian<sup>1</sup>, Feng Chen<sup>2</sup>, Tsun-Hsuan Wang<sup>3</sup>, Yian Wang<sup>4</sup>, Katerina Fragkiadaki<sup>1</sup>, Zackory Erickson<sup>1</sup>, David Held<sup>1</sup>, Chuang Gan<sup>4,5</sup> <br> <sup>1</sup>CMU, <sup>2</sup>Tsinghua IIIS, <sup>3</sup>MIT CSAIL, <sup>4</sup>UMass Amherst, <sup>5</sup>MIT-IBM AI Lab
-
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning [ICLR 2021] [Project page] [Github] <br> Mohit Shridhar<sup>†</sup> Xingdi Yuan<sup>♡</sup> Marc-Alexandre Côté<sup>♡</sup> Yonatan Bisk<sup>‡</sup> Adam Trischler<sup>♡</sup> Matthew Hausknecht<sup>♣</sup><br> <sup>†</sup>University of Washington <sup>♡</sup>Microsoft Research, Montréal <sup>‡</sup>Carnegie Mellon University <sup>♣</sup>Microsoft Research
-
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks [CVPR 2020] [Project page] [Github] <br> Mohit Shridhar<sup>1</sup> Jesse Thomason<sup>1</sup> Daniel Gordon<sup>1</sup> Yonatan Bisk<sup>1,2,3</sup> Winson Han<sup>3</sup> Roozbeh Mottaghi<sup>1,3</sup> Luke Zettlemoyer<sup>1</sup> Dieter Fox<sup>1,4</sup><br> <sup>1</sup>Paul G. Allen School of Computer Sci. & Eng., Univ. of Washington, <sup>2</sup>Language Technologies Institute @ Carnegie Mellon University, <sup>3</sup>Allen Institute for AI, <sup>4</sup>NVIDIA<br>
-
VIMA: Robot Manipulation with Multimodal Prompts [ICML 2023] [Project page] [Github] [VIMA-Bench] <br> Yunfan Jiang<sup>1</sup> Agrim Gupta<sup>1†</sup> Zichen Zhang<sup>2†</sup> Guanzhi Wang<sup>3,4†</sup> Yongqiang Dou<sup>5</sup> Yanjun Chen<sup>1</sup> Li Fei-Fei<sup>1</sup> Anima Anandkumar<sup>3,4</sup> Yuke Zhu<sup>3,6‡</sup> Linxi Fan<sup>3‡</sup><br>
-
SQA3D: Situated Question Answering in 3D Scenes [ICLR 2023] [Project page] [Slides] [Github]<br> Xiaojian Ma<sup>2</sup> Silong Yong<sup>1,3*</sup> Zilong Zheng<sup>1</sup> Qing Li<sup>1</sup> Yitao Liang<sup>1,4</sup> Song-Chun Zhu<sup>1,2,3,4</sup> Siyuan Huang<sup>1</sup><br> <sup>1</sup>Beijing Institute for General Artificial Intelligence (BIGAI) <sup>2</sup>UCLA <sup>3</sup>Tsinghua University <sup>4</sup>Peking University
-
IQA: Visual Question Answering in Interactive Environments [CVPR 2018] [Github] [Demo video (YouTube)]<br> Daniel Gordon<sup>1</sup> Aniruddha Kembhavi<sup>2</sup> Mohammad Rastegari<sup>2,4</sup> Joseph Redmon<sup>1</sup> Dieter Fox<sup>1,3</sup> Ali Farhadi<sup>1,2</sup> <br> <sup>1</sup>Paul G. Allen School of Computer Science, University of Washington <sup>2</sup>Allen Institute for Artificial Intelligence <sup>3</sup>Nvidia <sup>4</sup>Xnor.ai
-
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments [ICCV 2021] [Project page] [Github]<br> Difei Gao<sup>1,2</sup>, Ruiping Wang<sup>1,2,3</sup>, Ziyi Bai<sup>1,2</sup>, Xilin Chen<sup>1</sup>, <br> <sup>1</sup>Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, <sup>2</sup>University of Chinese Academy of Sciences, <sup>3</sup>Beijing Academy of Artificial Intelligence
Simulator
-
LEGENT: Open Platform for Embodied Agents [ACL 2024] [Project page] [Github]<br> Tsinghua University<br>
-
AI2-THOR: An Interactive 3D Environment for Visual AI [arXiv 2022] [Project page] [Github]<br> Allen Institute for AI, University of Washington, Stanford University, Carnegie Mellon University<br>
-
iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [IROS 2021] [Project page] [Github]<br> Bokui Shen*, Fei Xia* et al.<br>
-
Habitat: A Platform for Embodied AI Research [ICCV 2019] [Project page] [Habitat-Sim] [Habitat-Lab] [Habitat Challenge]<br> Facebook AI Research, Facebook Reality Labs, Georgia Institute of Technology, Simon Fraser University, Intel Labs, UC Berkeley<br>
-
Habitat 2.0: Training Home Assistants to Rearrange their Habitat [NeurIPS 2021] [Project page]<br> Facebook AI Research, Georgia Tech, Intel Research, Simon Fraser University, UC Berkeley
Others
-
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models [ICLR 2023] <br> Google Research, Brain Team
-
ReAct: Synergizing Reasoning and Acting in Language Models [ICLR 2023] <br> Shunyu Yao<sup>1∗</sup>, Jeffrey Zhao<sup>2</sup>, Dian Yu<sup>2</sup>, Nan Du<sup>2</sup>, Izhak Shafran<sup>2</sup>, Karthik Narasimhan<sup>1</sup>, Yuan Cao<sup>2</sup> <br> <sup>1</sup>Department of Computer Science, Princeton University, <sup>2</sup>Google Research, Brain Team
-
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [arXiv 2023] <br> Virginia Tech, Microsoft
-
Graph of Thoughts: Solving Elaborate Problems with Large Language Models [arXiv 2023] <br> ETH Zurich, Cledar, Warsaw University of Technology
-
Tree of Thoughts: Deliberate Problem Solving with Large Language Models [arXiv 2023] <br> Shunyu Yao<sup>1</sup>, Dian Yu<sup>2</sup>, Jeffrey Zhao<sup>2</sup>, Izhak Shafran<sup>2</sup>, Thomas L. Griffiths<sup>1</sup>, Yuan Cao<sup>2</sup>, Karthik Narasimhan<sup>1</sup> <br> <sup>1</sup>Princeton University, <sup>2</sup>Google DeepMind
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [NeurIPS 2022] <br> Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou<br> Google Research, Brain Team
-
MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [NeurIPS 2022] [Github] [Project page] [Knowledge Base] <br> Linxi Fan<sup>1</sup>, Guanzhi Wang<sup>2∗</sup>, Yunfan Jiang<sup>3*</sup>, Ajay Mandlekar<sup>1</sup>, Yuncong Yang<sup>4</sup>, Haoyi Zhu<sup>5</sup>, Andrew Tang<sup>4</sup>, De-An Huang<sup>1</sup>, Yuke Zhu<sup>1,6†</sup>, Anima Anandkumar<sup>1,2†</sup><br> <sup>1</sup>NVIDIA, <sup>2</sup>Caltech, <sup>3</sup>Stanford, <sup>4</sup>Columbia, <sup>5</sup>SJTU, <sup>6</sup>UT Austin
-
Distilling Internet-Scale Vision-Language Models into Embodied Agents [ICML 2023] <br> Theodore Sumers<sup>1∗</sup> Kenneth Marino<sup>2</sup> Arun Ahuja<sup>2</sup> Rob Fergus<sup>2</sup> Ishita Dasgupta<sup>2</sup> <br>
-
LISA: Reasoning Segmentation via Large Language Model [arXiv 2023] [Github] [Huggingface Models] [Dataset] [Online Demo] <br>
Xin Lai<sup>1</sup> Zhuotao Tian<sup>2</sup> Yukang Chen<sup>1</sup> Yanwei Li<sup>1</sup> Yuhui Yuan<sup>3</sup> Shu Liu<sup>2</sup> Jiaya Jia<sup>1,2</sup> <br> <sup>1</sup>The Chinese University of Hong Kong <sup>2</sup>SmartMore <sup>3</sup>MSRA<br>
Acknowledgements
[1] Trend pic from this repo.<br> [2] Figure from this paper: The Rise and Potential of Large Language Model Based Agents: A Survey.