<!--Autonomous Agents --> <!-- Copyright (C) Teemu Maatta. @misc{MaattaAutonomousAgents2023, author = {Teemu Maatta}, title = {Autonomous Agents}, year = {2023}, howpublished = {\url{https://github.com/tmgthb/Autonomous-Agents}}, note = {Accessed: YYYY-MM-DD} } --> <div id="topofthepage"> </div> <div align="center">

</div> <p align="center"> <img height="100" src="https://github.com/tmgthb/Autonomous-Agents/blob/main/Autonomous_agent_logo.png" alt="Autonomous Agents"> </p> <div align="center">

Autonomous Agents

Autonomous Agents research papers. Updated daily. See also the Resources section.

</div>
<div id="researchpapers" align="center">

Research papers

Reverse chronological order (newest first).

</div>

18th of November 2024

Generative World Explorer


OASIS: Open Agents Social Interaction Simulations on One Million Agents


TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World


A Code Knowledge Graph-Enhanced System for LLM-Based Fuzz Driver Generation


Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment


LLM-IE: A Python Package for Generative Information Extraction with Large Language Models


16th of November 2024

Developer Challenges on Large Language Models: A Study of Stack Overflow and OpenAI Developer Forum Posts


FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models


IntentGPT: Few-shot Intent Discovery with Large Language Models


15th of November 2024

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems


12th of November 2024

RedCode: Risky Code Execution and Generation Benchmark for Code Agents


World Models: The Safety Perspective


BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks


LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models


From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents


Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach


11th of November 2024

Mr.Steve: Instruction-Following Agents in Minecraft with What-Where-When Memory


Using Generative AI and Multi-Agents to Provide Automatic Feedback


Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy


Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving


A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs


10th of November 2024

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents


9th of November 2024

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization


From References to Insights: Collaborative Knowledge Minigraph Agents for Automating Scholarly Literature Review


8th of November 2024

The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent


Game-theoretic LLM: Agent Workflow for Negotiation Games


7th of November 2024

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations


GUI Agents with Foundation Models: A Comprehensive Survey


CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models


CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation


6th of November 2024

AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making


MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue


From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning


5th of November 2024

SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction


AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution


1st of November 2024

DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems


31st of October 2024

Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks


30th of October 2024

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents


Aligning Audio-Visual Joint Representations with an Agentic Workflow


29th of October 2024

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction


28th of October 2024

Asynchronous Tool Usage for Real-Time Agents

25th of October 2024

Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models


VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs


Designing LLM-Agents with Personalities: A Psychometric Approach


FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning


AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs


EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data


Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models


AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios


24th of October 2024

Unbounded: A Generative Infinite Game of Character Life Simulation


OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning


PDL: A Declarative Prompt Programming Language


From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution


PRACT: Optimizing Principled Reasoning and Acting of LLM Agent


23rd of October 2024

GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration


MiniFed: Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting


Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models


An Intelligent Agentic System for Complex Image Restoration Problems


ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents


Navigate Complex Physical Worlds via Geometrically Constrained LLM


21st of October 2024

Long Term Memory: The Foundation of AI Self-Evolution


Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers


20th of October 2024

Redefining Proactivity for Information Seeking Dialogue


18th of October 2024

Teaching Models to Balance Resisting and Accepting Persuasion


Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning


AI can help humans find common ground in democratic deliberation


17th of October 2024

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation


Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents


AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents


AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning


Harnessing Webpage UIs for Text-Rich Visual Understanding


Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems


A Comparative Study on Reasoning Patterns of OpenAI's o1 Model


MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling


Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning


Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence


RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents


16th of October 2024

Revealing the Barriers of Language Agents in Planning


Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models


Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios


JudgeBench: A Benchmark for Evaluating LLM-based Judges


SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling


Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving


Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning


MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration


Aegis: An Advanced LLM-Based Multi-Agent for Intelligent Functional Safety Engineering


15th of October 2024

G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks


AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data


Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs


14th of October 2024

AFlow: Automating Agentic Workflow Generation


10th of October 2024

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining


Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System


DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory


Mars: Situated Inductive Reasoning in an Open-World Environment


Benchmarking Agentic Workflow Generation


9th of October 2024

AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories


Smart Audit System Empowered by LLM


Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making


I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy


Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology


MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses


Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach


ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents


Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA


8th of October 2024

AgentSquare: Automatic LLM Agent Search in Modular Design Space


7th of October 2024

LLMs Are In-Context Reinforcement Learners


Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents


Grounding Partially-Defined Events in Multimodal Data


GLEE: A Unified Framework and Benchmark for Language-based Economic Environments


26th of September 2024

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment


Control Industrial Automation System with Large Language Models


Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective


25th of September 2024

Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents


A Roadmap for Embodied and Social Grounding in LLMs


Plurals: A System for Guiding LLMs Via Simulated Social Ensembles


Language Grounded Multi-agent Communication for Ad-hoc Teamwork


24th of September 2024

MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents


23rd of September 2024

ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning


20th of September 2024

RRM: Robust Reward Model Training Mitigates Reward Hacking


ChainBuddy: An AI Agent System for Generating LLM Pipelines


Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts


ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources


19th of September 2024

Training Language Models to Self-Correct via Reinforcement Learning


AutoVerus: Automated Proof Generation for Rust Code


17th of September 2024

LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents


NVLM: Open Frontier-Class Multimodal LLMs


P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task


EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization


16th of September 2024

Instigating Cooperation among LLM Agents Using Adaptive Information Modulation


Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots


Central Answer Modeling for an Embodied Multi-LLM System


15th of September 2024

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation


14th of September 2024

PeriGuru: A Peripheral Robotic Mobile App Operation Assistant based on GUI Image Understanding and Prompting with LLM


Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models


13th of September 2024

Agents in Software Engineering: Survey, Landscape, and Vision


12th of September 2024

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale


11th of September 2024

Agent Workflow Memory


10th of September 2024

Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System


9th of September 2024

SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning

8th of September 2024

Self-Reflection in LLM Agents: Effects on Problem-Solving Performance


5th of September 2024

Game On: Towards Language Models as RL Experimenters


From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents


xLAM: A Family of Large Action Models to Empower AI Agent Systems


4th of September 2024

Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments


Configurable Foundation Models: Building LLMs from a Modular Perspective


Large Language Model-Based Agents for Software Engineering: A Survey


MoA is All You Need: Building LLM Research Team using Mixture of Agents


3rd of September 2024

Empirical evidence of Large Language Model's influence on human spoken communication


AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction


Focus Agent: LLM-Powered Virtual Focus Group


2nd of September 2024

The Compressor-Retriever Architecture for Language Model OS


1st of September 2024

Self-evolving Agents with reflective and memory-augmented abilities


LanguaShrink: Reducing Token Overhead with Psycholinguistics


30th of August 2024

Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios


29th of August 2024

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling


CogVLM2: Visual Language Models for Image and Video Understanding


28th of August 2024

A Survey on Evaluation of Multimodal Large Language Models


WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration


AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems


Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions


BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems


Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games


FlowAct: A Proactive Multimodal Human-robot Interaction System with Continuous Flow of Perception and Modular Action Sub-systems


Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations: A Tool-Chaining Problem-Solving Framework with Attributable Reflection


Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations


LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models


Persuasion Games using Large Language Models


EPO: Hierarchical LLM Agents with Environment Preference Optimization


27th of August 2024

Generative Verifiers: Reward Modeling as Next-Token Prediction


AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems


HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling


TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering


26th of August 2024

Foundation Models for Music: A Survey


AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework


SWE-bench-java: A GitHub Issue Resolving Benchmark for Java


23rd of August 2024

LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction


Intelligent OPC Engineer Assistant for Semiconductor Manufacturing


22nd of August 2024

MEDCO: Medical Education Copilots Based on A Multi-Agent Framework


Graph Retrieval Augmented Trustworthiness Reasoning


MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents


Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards


SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web



Can LLMs Understand Social Norms in Autonomous Driving Games?


21st of August 2024

Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models


Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design


LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites


DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework


Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards


Cause-Aware Empathetic Response Generation via Chain-of-Thought Fine-Tuning


20th of August 2024

FLAME: Learning to Navigate with Multimodal LLM in Urban Environments


Athena: Safe Autonomous Agents with Verbal Contrastive Learning


Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search


MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding


19th of August 2024

MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems


GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making


18th of August 2024

Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval


HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model


16th of August 2024

EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics


15th of August 2024

Automated Design of Agentic Systems


13th of August 2024

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents



12th of August 2024

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery


9th of August 2024

Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement


8th of August 2024

Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate


RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents


7th of August 2024

Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions


Forecasting Live Chat Intent from Browsing History


CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases


6th of August 2024

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters


5th of August 2024

ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems


SpecRover: Code Intent Extraction via LLMs


LLM Agents Improve Semantic Code Search


3rd of August 2024

The Drama Machine: Simulating Character Development with LLM Agents


2nd of August 2024

Coalitions of Large Language Models Increase the Robustness of AI Agents


1st of August 2024

OmniParser for Pure Vision Based GUI Agent


AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation


31st of July 2024

Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries


28th of July 2024

Solving Robotics Problems in Zero-Shot with Vision-Language Models


26th of July 2024

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents


25th of July 2024

PersonaGym: Evaluating Persona Agents and LLMs


Recursive Introspection: Teaching Language Model Agents How to Self-Improve


24th of July 2024

Reinforced Prompt Personalization for Recommendation with Large Language Models


AI-Gadget Kit: Integrating Swarm User Interfaces with LLM-driven Agents for Rich Tabletop Game Applications


3D Question Answering for City Scene Understanding


23rd of July 2024

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent


AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game


OpenDevin: An Open Platform for AI Software Developers as Generalist Agents


PyBench: Evaluating LLM Agent on various real-world coding tasks


Artificial Agency and Large Language Models

LawLuo: A Chinese Law Firm Co-run by LLM Agents


22nd of July 2024

TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON


Odyssey: Empowering Agents with Open-World Skills


19th of July 2024

System-1.x: Learning to Balance Fast and Slow Planning with Language Models


The Vision of Autonomic Computing: Can LLMs Make It a Reality?


18th of July 2024

Prover-Verifier Games improve legibility of LLM outputs


12th of July 2024

PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents


Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments


Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation


11th of July 2024

GTA: A Benchmark for General Tool Agents


Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence


Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents


GPT-4 is judged more human than humans in displaced and inverted Turing tests


Beyond Instruction Following: Evaluating Rule Following of Large Language Models


Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency


Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility


10th of July 2024

WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment


9th of July 2024

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models


Vision language models are blind


5th of July 2024

On scalable oversight with weak LLMs judging strong LLMs


When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions


Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games


3rd of July 2024

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control


Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory


2nd of July 2024

GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning


MMedAgent: Learning to Use Medical Tools with Multi-modal Agent


1st of July 2024

Agentless: Demystifying LLM-based Software Engineering Agents


28th of June 2024

LLM Critics Help Catch LLM Bugs


BMW Agents -- A Framework For Task Automation Through Multi-agent Collaboration


Scaling Synthetic Data Creation with 1,000,000,000 Personas


27th of June 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?


Tools Fail: Detecting Silent Errors in Faulty Tools


Simulating Classroom Education with LLM-Empowered Agents


UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models


Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data


LayoutCopilot: An LLM-powered Multi-agent Collaborative Framework for Interactive Analog Layout Design


Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction


26th of June 2024

Symbolic Learning Enables Self-Evolving Agents


MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution


Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models


Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship


Mental Modeling of Reinforcement Learning Agents by Language Models


AI-native Memory: A Pathway from LLMs Towards AGI


Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation


LLCoach: Generating Robot Soccer Plans using Multi-Role Large Language Models


Octo-planner: On-device Language Model for Planner-Action Agents


25th of June 2024

Human-Object Interaction from Human-Level Instructions


24th of June 2024

RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale


21st of June 2024


GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians


ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models


Autonomous Agents for Collaborative Task under Information Asymmetry


FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents


Direct Multi-Turn Preference Optimization for Language Agents


Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework


DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection


Behaviour Distillation


Uni-Mol2: Exploring Molecular Pretraining Model at Scale


From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking


20th of June 2024

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning


GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models


LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors


Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory


EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms


Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics


Can LLMs Learn by Teaching? A Preliminary Study


MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate


19th of June 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs


AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding


SpatialBot: Precise Spatial Understanding with Vision Language Models


LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application


18th of June 2024

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction


Problem-Solving in Language Model Networks


Ask-before-Plan: Proactive Language Agents for Real-World Planning


AgentReview: Exploring Peer Review Dynamics with LLM Agents


Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents


CodeNav: Beyond tool-use to using real-world codebases with LLM agents


P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts


MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL


Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics


The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions


VoCo-LLaMA: Towards Vision Compression with Large Language Models


17th of June 2024

MASAI: Modular Architecture for Software-engineering AI Agents


Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging


Input Conditioned Graph Generation for Language Agents


Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs


GUICourse: From General Vision Language Models to Versatile GUI Agents


CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents


Embodied Question Answering via Multi-LLM Systems


14th of June 2024

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning


VideoGUI: A Benchmark for GUI Automation from Instructional Videos


Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning


TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners


Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting


Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation


RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model


SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding


First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models


HIRO: Hierarchical Information Retrieval Optimization


DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning


4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities


13th of June 2024

StreamBench: Towards Benchmarking Continuous Improvement of Language Agents


Multi-Agent Software Development through Cross-Team Collaboration


RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs


When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search


Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis


12th of June 2024

MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents


A Dialogue Game for Eliciting Balanced Collaboration


Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey


Can Large Language Models Understand Spatial Audio?


11th of June 2024

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B


DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs


RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents


World Models with Hints of Large Language Models for Goal Achieving


DCA-Bench: A Benchmark for Dataset Curation Agents


A Synthetic Dataset for Personal Attribute Inference


Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees


10th of June 2024

FinVerse: An Autonomous Agent System for Versatile Financial Analysis


9th of June 2024

A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components


A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning


Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security


7th of June 2024

Mixture-of-Agents Enhances Large Language Model Capabilities


SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals


Language Guided Skill Discovery


6th of June 2024

Open-Endedness is Essential for Artificial Superhuman Intelligence


On the Effects of Data Scale on Computer Control Agents


Aligning Agents like Large Language Models


AgentGym: Evolving Large Language Model-based Agents across Diverse Environments


5th of June 2024

The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games


DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences


4th of June 2024

Chain of Agents: Large Language Models Collaborating on Long-Context Tasks


CoNav: A Benchmark for Human-Centered Collaborative Navigation


MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset


3rd of June 2024

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model


2nd of June 2024

A Survey of Useful LLM Evaluation


Teams of LLM Agents can Exploit Zero-Day Vulnerabilities


31st of May 2024

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales


LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models


30th of May 2024

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization


Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory


Parrot: Efficient Serving of LLM-based Applications with Semantic Variable


Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions


From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems


Large Language Models Can Self-Improve At Web Agent Tasks


CausalQuest: Collecting Natural Causal Questions for AI Agents


Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf


29th of May 2024

Artificial Intelligence Index Report 2024


STAT: Shrinking Transformers After Training


Adaptive In-conversation Team Building for Language Model Agents


Contextual Position Encoding: Learning to Count What's Important


28th of May 2024

Faithful Logical Reasoning via Symbolic Chain-of-Thought


A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models


27th of May 2024

An Introduction to Vision-Language Modeling


24th of May 2024

Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent


9th of May 2024

Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning


Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents

Air Gap: Protecting Privacy-Conscious Conversational Agents


Truthful Aggregation of LLMs with an Application to Online Advertising


7th of May 2024

NeurDB: An AI-powered Autonomous Data System


Iterative Experience Refinement of Software-Developing Agents


Unveiling Disparities in Web Task Handling Between Human and Web Agent


Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation


Verified Neural Compressed Sensing


Policy Learning with a Language Bottleneck


6th of May 2024

Advancing Multimodal Medical Capabilities of Gemini


AlphaMath Almost Zero: Process Supervision without Process


Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity


Enhancing Q-Learning with Large Language Model Heuristics


Large Language Models (LLMs) as Agents for Augmented Democracy


Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer


Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions


Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration


Self-Improving Customer Review Response Generation Based on LLMs


Select to Perfect: Imitating desired behavior from large multi-agent data


When LLMs Meet Cybersecurity: A Systematic Literature Review


FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering


Language-Image Models with 3D Understanding


Thoughtful Things: Building Human-Centric Smart Devices with Small Language Models


Organizing a Society of Language Models: Structures and Mechanisms for Enhanced Collective Intelligence


Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness


OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs


5th of May 2024

Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents


Graphical user interface agents optimization for visual instruction grounding using multi-modal artificial intelligence systems


AppAgent v2: Advanced Agent for Flexible Mobile Interactions


Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation


Visual grounding for desktop graphical user interfaces


3rd of May 2024

Automating the Enterprise with Foundation Models


Neuromorphic Correlates of Artificial Consciousness


What matters when building vision-language models?


CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation


Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning


Comparative Analysis of Retrieval Systems in the Real World


2nd of May 2024

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks


FLAME: Factuality-Aware Alignment for Large Language Models


Generative Active Learning for the Search of Small-molecule Protein Binders


Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts


OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning


CACTUS: Chemistry Agent Connecting Tool-Usage to Science


Creative Problem Solving in Large Language and Vision Models -- What Would it Take?


CoS: Enhancing Personalization and Mitigating Bias with Context Steering


1st of May 2024

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning


ULLER: A Unified Language for Learning and Reasoning


GOLD: Geometry Problem Solver with Natural Language Description


Social Life Simulation for Non-Cognitive Skills Learning


Can a Hallucinating Model help in Reducing Human "Hallucination"?


"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time


Characterising the Creative Process in Humans and Large Language Models


29th of April 2024

Capabilities of Gemini Models in Medicine


Reinforcement Learning Problem Solving with Large Language Models


HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models


28th of April 2024

From Persona to Personalization: A Survey on Role-Playing Language Agents


Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant


26th of April 2024

Unveiling Thoughts: A Review of Advancements in EEG Brain Signal Decoding into Text


24th of April 2024

Retrieval Head Mechanistically Explains Long-Context Factuality


23rd of April 2024

Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering


Rethinking LLM Memorization through the Lens of Adversarial Compression


Evaluating Tool-Augmented Agents in Remote Sensing Platforms


22nd of April 2024

A Survey on Self-Evolution of Large Language Models


21st of April 2024

A Survey on the Memory Mechanism of Large Language Model based Agents


Accelerating Medical Knowledge Discovery through Automated Knowledge Graph Generation and Enrichment


19th of April 2024

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation


[Let's Think Dot by Dot: Hidden Computation in Transformer Language Models](https://arxiv.org/abs/2404.15758)


SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models


18th of April 2024

Aligning Language Models to Explicitly Handle Ambiguity


mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture


17th of April 2024

Many-Shot In-Context Learning


The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey


AgentKit: Flow Engineering with Graphs, not Coding


Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent


Open-Ended Wargames with Large Language Models


16th of April 2024

Self-playing Adversarial Language Game Enhances LLM Reasoning


Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V


Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards


VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time


SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation


N-Agent Ad Hoc Teamwork


Emergent intelligence of buckling-driven elasto-active structures


HLAT: High-quality Large Language Model Pre-trained on AWS Trainium


Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases


White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency


Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models


Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration


What is Meant by AGI? On the Definition of Artificial General Intelligence


Private Attribute Inference from Images with Vision-Language Models


CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity


Chinchilla Scaling: A replication attempt


TEL'M: Test and Evaluation of Language Models


Deceiving to Enlighten: Coaxing LLMs to Self-Reflection for Enhanced Bias Detection and Mitigation


Model-based Offline Quantum Reinforcement Learning


AIGeN: An Adversarial Approach for Instruction Generation in VLN


Language Model Cascades: Token-level uncertainty and beyond


EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning


How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior


Rethinking Software Engineering in the Foundation Model Era: From Task-Driven AI Copilots to Goal-Driven AI Pair Programmers


Vision-and-Language Navigation via Causal Learning


Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy


HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights


Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms


Social Choice for AI Alignment: Dealing with Diverse Human Feedback


Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration


Future Language Modeling from Temporal Document History


Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs


Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning


Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering


SparseDM: Toward Sparse Efficient Diffusion Models


Advancing Long-Term Multi-Energy Load Forecasting with Patchformer: A Patch and Transformer-Based Approach


DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion


When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm


Self-Supervised Visual Preference Alignment


Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning


Generative Text Steganography with Large Language Model


EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence


Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay


Question Difficulty Ranking for Multiple-Choice Reading Comprehension


Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units


MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents


LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model


Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study


Automating REST API Postman Test Cases Using LLM


Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering


MEEL: Multi-Modal Event Evolution Learning


Find The Gap: Knowledge Base Reasoning For Visual Question Answering


15th of April 2024

Memory Sharing for Large Language Model based Agents


Reimagining Self-Adaptation in the Age of Large Language Models


Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR


ChatShop: Interactive Information Seeking with Language Agents


TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition


LLMorpheus: Mutation Testing using Large Language Models


A Survey on Deep Learning for Theorem Proving


Progressive Knowledge Graph Completion


Synergising Human-like Responses and Machine Intelligence for Planning in Disaster Response


HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation


Action Model Learning with Guarantees


Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda


MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion


Monte Carlo Search Algorithms Discovering Monte Carlo Tree Search Exploration Terms


Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance


Handling Reward Misspecification in the Presence of Expectation Mismatch


Generating Games via LLMs: An Investigation with Video Game Description Language


MMInA: Benchmarking Multihop Multimodal Internet Agents


Evolving Interpretable Visual Classifiers with Large Language Models


Compression Represents Intelligence Linearly


Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection


Foundational Challenges in Assuring Alignment and Safety of Large Language Models


Is Table Retrieval a Solved Problem? Join-Aware Multi-Table Retrieval


Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL


Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video


KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models


Effective Reinforcement Learning Based on Structural Information Principles


Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model


Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning


Are Large Language Models Reliable Argument Quality Annotators?


LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models


Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration


Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation


All-in-one simulation-based inference


Efficient and accurate neural field reconstruction using resistive memory


A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions


Building Semantic Communication System via Molecules: An End-to-End Training Approach


σ-GPTs: A New Approach to Autoregressive Models


Characterization and Mitigation of Insufficiencies in Automated Driving Systems


Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning


State Space Model for New-Generation Network Alternative to Transformers: A Survey


PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI


Exploring Text-to-Motion Generation with Human Preference


The 8th AI City Challenge


RankCLIP: Ranking-Consistent Language-Image Pretraining


Tasks People Prompt: A Taxonomy of LLM Downstream Tasks in Software Verification and Falsification Approaches


14th of April 2024

Self-Selected Attention Span for Accelerating Large Language Model Inference


TransformerFAM: Feedback attention is working memory


Interactive Generative AI Agents for Satellite Networks through a Mixture of Experts Transmission


Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation


LLeMpower: Understanding Disparities in the Control and Access of Large Language Models


Towards Practical Tool Usage for Continually Learning LLMs


SNN4Agents: A Framework for Developing Energy-Efficient Embodied Spiking Neural Networks for Autonomous Agents


Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment


TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning


Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection


Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts


Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts


TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models


Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling


Survey on Embedding Models for Knowledge Graph and its Applications


GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning


Fusion-Mamba for Cross-modality Object Detection


ToNER: Type-oriented Named Entity Recognition with Generative Language Model


Provable Interactive Learning with Hindsight Instruction Feedback


Semantic In-Domain Product Identification for Search Queries


13th of April 2024

LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration


When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations


Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation


Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision


CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting


CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants


Exploring Explainability in Video Action Recognition


Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model


Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies


Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households


Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning


Understanding Multimodal Deep Neural Networks: A Concept Selection View


EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM


An evaluation framework for synthetic data generation models


On Speculative Decoding for Multimodal Large Language Models

12th of April 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length


Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension


Aligning LLMs for FL-free Program Repair


LLM In-Context Recall is Prompt Dependent


CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models


Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery


Augmenting Knowledge Graph Hierarchies Using Neural Transformers


Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation


LLM Agents can Autonomously Exploit One-day Vulnerabilities


Memory Traces: Are Transformers Tulving Machines?


Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information using Multilayered Multimodal Latent Dirichlet Allocation


Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents


SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions


Training a Vision Language Model as Smartphone Assistant


Apollonion: Profile-centric Dialog Agent


Strategic Interactions between Large Language Models-based Agents in Beauty Contests


Toward a Theory of Tokenization in LLMs


Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions


11th of April 2024

Rho-1: Not All Tokens Are What You Need


Large Language Model Can Continue Evolving From Mistakes


Auctions with LLM Summaries


OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments


ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs


DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation


Monte Carlo Tree Search with Boltzmann Exploration


WESE: Weak Exploration to Strong Exploitation for LLM Agents


Behavior Trees Enable Structured Programming of Language Model Agents


LLoCO: Learning Long Contexts Offline


ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past


10th of April 2024

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs


Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy


Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation


Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation


Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention


GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications


Vision-Language Model-based Physical Reasoning for Robot Liquid Perception


BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks


9th of April 2024

Measuring the Persuasiveness of Language Models


Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?


Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems


AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents


AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning


Autonomous Evaluation and Refinement of Digital Agents


Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry


Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective


Text-Based Reasoning About Vector Graphics


Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs


pfl-research: simulation framework for accelerating research in Private Federated Learning


MuPT: A Generative Symbolic Music Pretrained Transformer


VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs


ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos


Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models


Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning


THOUGHTSCULPT: Reasoning with Intermediate Revision and Search

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?


8th of April 2024

HAMMR: HierArchical MultiModal React agents for generic VQA


Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs


AutoCodeRover: Autonomous Program Improvement


Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws


360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System


Finding Visual Task Vectors


LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models


LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding


WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents


Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed Tasks


Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models


Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models


Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics


7th of April 2024

AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications


LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead


StockGPT: A GenAI Model for Stock Prediction and Trading

Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs


6th of April 2024

Self-organizing Multiagent Target Enclosing under Limited Information and Safety Guarantees


Challenges Faced by Large Language Models in Solving Multi-Agent Flocking


Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning


Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology


Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model


The Case for Developing a Foundation Model for Planning-like Tasks from Scratch


MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems


Goal-guided Generative Prompt Injection Attack on Large Language Models


5th of April 2024

Exploring Autonomous Agents through the Lens of Large Language Models: A Review


Increased LLM Vulnerabilities from Fine-tuning and Quantization


Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents


ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling


Hypothesis Generation with Large Language Models


KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion


4th of April 2024

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent


Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Language Model Evolution: An Iterated Learning Perspective


Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration


CONFLARE: CONFormal LArge language model REtrieval


SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses


Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding


Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences


Comprehensible Artificial Intelligence on Knowledge Graphs: A survey


Benchmarking ChatGPT on Algorithmic Reasoning


Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra


ReFT: Representation Finetuning for Language Models


CodeEditorBench: Evaluating Code Editing Capability of Large Language Models


A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation


Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought


Embodied Neuromorphic Artificial Intelligence for Robotics: Perspectives, Challenges, and Research Development Stack


RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis


3rd of April 2024

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise


I-Design: Personalized LLM Interior Designer

On the Importance of Uncertainty in Decision-Making with Large Language Models

Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game

Designing for Human-Agent Alignment: Understanding what humans want from their agents


PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models


Testing the Effect of Code Documentation on Large Language Model Code Understanding


The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers


Measuring Social Norms of Large Language Models


Exploring Backdoor Vulnerabilities of Chat Models


2nd of April 2024

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models


A Survey on Large Language Model-Based Game Agents


Advancing LLM Reasoning Generalists with Preference Trees


Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization


Large Language Models for Orchestrating Bimanual Robots


CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models


InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis


Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game


Collapse of Self-trained Language Models


RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction


Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning


1st of April 2024

Stream of Search (SoS): Learning to Search in Language


LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models


Large Language Model Evaluation Via Multi AI Agents: Preliminary results




31st of March 2024


CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs


DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Algorithmic Collusion by Large Language Models


"My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents




30th of March 2024

Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning


Language Models are Spacecraft Operators


A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration


Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods


Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World


29th of March 2024

Gecko: Versatile Text Embeddings Distilled from Large Language Models


ITCMA: A Generative Agent Based on a Computational Consciousness Structure


Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning


28th of March 2024

STaR-GATE: Teaching Language Models to Ask Clarifying Questions


MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation


Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning


Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis


LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae





27th of March 2024

Long-form factuality in large language models


What are human values, and how do we align AI to them?


Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation


A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks


A Study of Three Influencer Archetypes for the Control of Opinion Spread in Time-Varying Social Networks



26th of March 2024

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution


Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games


OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation


Compressed Federated Reinforcement Learning with a Generative Model



25th of March 2024

AIOS: LLM Agent Operating System


RepairAgent: An Autonomous, LLM-Based Agent for Program Repair


CYGENT: A cybersecurity conversational agent with log summarization powered by GPT-3


TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models


Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks

Do LLM Agents Have Regret? A Case Study in Online Learning and Games


An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems


Harnessing the power of LLMs for normative reasoning in MASs


Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study


Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm


Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation


RL for Consistency Models: Faster Reward Guided Text-to-Image Generation



24th of March 2024


AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications







23rd of March 2024

When LLM-based Code Generation Meets the Software Development Process


Towards a RAG-based Summarization Agent for the Electron-Ion Collider


EduAgent: Generative Student Agents in Learning




22nd of March 2024

Can large language models explore in-context?


CoLLEGe: Concept Embedding Generation for Large Language Models


LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns


Language Models in Dialogue: Conversational Maxims for Human-AI Interactions


CACA Agent: Capability Collaboration based AI Agent


Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)


21st of March 2024

ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training


ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification


PeerGPT: Probing the Roles of LLM-based Peer Agents as Team Moderators and Participants in Children's Collaborative Learning


RoleInteract: Evaluating the Social Interaction of Role-Playing Agents


Polaris: A Safety-focused LLM Constellation Architecture for Healthcare


20th of March 2024

Reverse Training to Nurse the Reversal Curse


Large Language Models meet Network Slicing Management and Orchestration


Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal


19th of March 2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models


HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning


Characteristic AI Agents via Large Language Models


Embodied LLM Agents Learn to Cooperate in Organized Teams


Contextual Moral Value Alignment Through Context-Based Aggregation


LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction


The Use of Generative Search Engines for Knowledge Work and Complex Tasks


18th of March 2024

Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models


EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents


From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models


Agent3D-Zero: An Agent for Zero-shot 3D Understanding


17th of March 2024

Logic Query of Thoughts: Guiding Large Language Models to Answer Complex Logic Queries with Knowledge Graphs


15th of March 2024

DiPaCo: Distributed Path Composition


PERL: Parameter Efficient Reinforcement Learning from Human Feedback


AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation


Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning


Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst?


ChatPattern: Layout Pattern Customization via Natural Language


ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference


14th of March 2024

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking


Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models


VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding


From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News


LLM-based agents for automating the enhancement of user story quality: An early report


USimAgent: Large Language Models for Simulating Search Users


MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training


13th of March 2024

Gemma: Open Models Based on Gemini Research and Technology


Scaling Instructable Agents Across Many Simulated Worlds


SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents


AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents


TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation


System for systematic literature review using multiple AI agents: Concept and an empirical evaluation


Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation


Cultural evolution in populations of Large Language Models


CleanAgent: Automating Data Standardization with LLM-based Agents


12th of March 2024

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning


WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?


Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations


DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation


AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production


11th of March 2024

RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems


DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation


Academically intelligent LLMs are not necessarily socially intelligent


10th of March 2024

TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision


ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes


Reframe Anything: LLM Agent for Open World Video Reframing


9th of March 2024

Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks


8th of March 2024

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation


FLAP: Flow Adhering Planning with Constrained Decoding in LLMs


Will GPT-4 Run DOOM?


7th of March 2024

Acceleron: A Tool to Accelerate Research Ideation


6th of March 2024

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion


SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models


Exploring LLM-based Agents for Root Cause Analysis


5th of March 2024

Cradle: Empowering Foundation Agents Towards General Computer Control


Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination


Language Guided Exploration for RL Agents in Text Environments


KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents


Learning to Use Tools via Cooperative and Interactive Agents


OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following


Android in the Zoo: Chain-of-Action-Thought for GUI Agents


InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents


Entropy-Regularized Token-Level Policy Optimization for Large Language Models


ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary


4th of March 2024

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents


2nd of March 2024

AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks


SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code


1st of March 2024

Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents


28th of February 2024

Human Simulacra: A Step toward the Personification of Large Language Models


Prospect Personalized Recommendation on Large Language Model-based Agent Platform


Data Interpreter: An LLM Agent For Data Science


24th of February 2024

ByteComposer: a Human-like Melody Composition Method based on Language Model Agent


23rd of February 2024

Large Multimodal Agents: A Survey


Genie: Generative Interactive Environments


21st of February 2024

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping


User-LLM: Efficient LLM Contextualization with User Embeddings


∞Bench: Extending Long Context Evaluation Beyond 100K Tokens


Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent


20th of February 2024

MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion


Large Language Model-based Human-Agent Collaboration for Complex Task Solving


19th of February 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling


Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents


WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment


Comprehensive Cognitive LLM Agent for Smartphone GUI Automation


LLM Agents for Psychology: A Study on Gamified Assessments


Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations


18th of February 2024

LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration


Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents


Modelling Political Coalition Negotiations Using LLM-based Agents


17th of February 2024

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation


16th of February 2024

Robust agents learn causal world models


15th of February 2024

Chain-of-Thought Reasoning Without Prompting


A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts


AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator


14th of February 2024

AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous Systems


Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications


DoRA: Weight-Decomposed Low-Rank Adaptation


13th of February 2024

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements


Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback


Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast


Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs


Large Language Models as Minecraft Agents


PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment


12th of February 2024

T-RAG: Lessons from the LLM Trenches


OS-Copilot: Towards Generalist Computer Agents with Self-Improvement


Predictive representations: building blocks of intelligence


Secret Collusion Among Generative AI Agents


THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation


11th of February 2024

Self-Correcting Self-Consuming Loops for Generative Model Training


9th of February 2024

<div id="vstar"> </div>

V-STaR: Training Verifiers for Self-Taught Reasoners


Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training


Feedback Loops With Language Models Drive In-Context Reward Hacking


Understanding the Weakness of Large Language Model Agents within a Complex Android Environment


<div id="llmsurveymikolov"> </div>

Large Language Models: A Survey


Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet


8th of February 2024

<div id="interactiveagent"> </div>

An Interactive Agent Foundation Model


UFO: A UI-Focused Agent for Windows OS Interaction


Real-World Robot Applications of Foundation Models: A Review


TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation


ScreenAgent: A Vision Language Model-driven Computer Control Agent


In-Context Principle Learning from Mistakes


Keyframer: Empowering Animation Design using Large Language Models


Discovering Temporally-Aware Reinforcement Learning Algorithms


WebLINX: Real-World Website Navigation with Multi-Turn Dialogue


How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis


Decision Theory-Guided Deep Reinforcement Learning for Fast Learning


7th of February 2024

The Future of Cognitive Strategy-enhanced Persuasive Dialogue Agents: New Perspectives and Trends


Can Large Language Model Agents Simulate Human Trust Behaviors?


ScreenAI: A Vision-Language Model for UI and Infographics Understanding


6th of February 2024

Self-Discover: Large Language Models Self-Compose Reasoning Structures


AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls


Can Generative Agents Predict Emotion?


S-Agents: self-organizing agents in open-ended environment


Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning


MobileVLM V2: Faster and Stronger Baseline for Vision Language Model


QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model


Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models


In-context learning agents are asymmetric belief updaters


Systematic Biases in LLM Simulations of Debates


Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science


5th of February 2024

Chain-of-Feedback: Mitigating the Effects of Inconsistency in Responses


Vision-Language Models Provide Promptable Representations for Reinforcement Learning


Guiding Language Model Math Reasoning with Planning Tokens


DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models


LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models


Graph-enhanced Large Language Models in Asynchronous Plan Reasoning


C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models


4th of February 2024

Understanding the planning of LLM agents: A survey


Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning


LLM-Enhanced Data Management


Collaborative Agents for Software Engineering


3rd of February 2024

More Agents Is All You Need


Affordable Generative Agents


2nd of February 2024

K-Level Reasoning with Large Language Models


1st of February 2024

Multimodal Embodied Interactive Agent for Cafe Scene


Efficient Exploration for LLMs


Hello OLMo: A truly open LLM


Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents


30th of January 2024

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis


Efficient Tool Use with Chain-of-Abstraction Reasoning


Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios


Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate


LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation


29th of January 2024

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception


Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis


Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation


28th of January 2024

YODA: Teacher-Student Progressive Learning for Language Models


26th of January 2024

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion


24th of January 2024

Hi-Core: Hierarchical Knowledge Transfer for Continual Reinforcement Learning


23rd of January 2024

Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding


AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents


HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments


22nd of January 2024

Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents


OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics


WARM: On the Benefits of Weight Averaged Reward Models


PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety


21st of January 2024

AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology


The Conversation is the Command: Interacting with Real-World Autonomous Robot Through Natural Language


19th of January 2024

Tool-LMM: A Large Multi-Modal Model for Tool Agent Learning


A match made in consistency heaven: when large language models meet evolutionary algorithms


CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents


18th of January 2024

Self-Rewarding Language Models


R-Judge: Benchmarking Safety Risk Awareness for LLM Agents


17th of January 2024

Large Language Models Are Neurosymbolic Reasoners


ReFT: Reasoning with Reinforced Fine-Tuning


Scalable Pre-training of Large Autoregressive Image Models


What makes for a 'good' social actor? Using respect as a lens to evaluate interactions with language agents


16th of January 2024

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering


MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World


DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models


Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination


Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening


Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation


15th of January 2024

Exploring the Potential of Large Language Models in Self-adaptive Systems


A Study on Training and Developing Large Language Models for Behavior Tree Generation


When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment


14th of January 2024

CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges


Small LLMs Are Weak Tool Learners: A Multi-LLM Agent


12th of January 2024

ModaVerse: Efficiently Transforming Modalities with LLMs


AntEval: Quantitatively Evaluating Informativeness and Expressiveness of Agent Social Interactions


Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study


11th of January 2024

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction


Designing Heterogeneous LLM Agents for Financial Sentiment Analysis


Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning


10th of January 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training


Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security


The Impact of Reasoning Step Length on Large Language Models


InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks


9th of January 2024

Agent Alignment in Evolving Social Norms


Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects


Metacognition is all you need? Using Introspection in Generative Agents to Improve Goal-directed Behavior


<div id="agentbasedai"> </div>

7th of January 2024

Agent AI: Surveying the Horizons of Multimodal Interaction


4th of January 2024

LLaVA-ϕ: Efficient Multi-Modal Assistant with Small Language Model


Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives


INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning


3rd of January 2024

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes


Economics Arena for Large Language Models


2nd of January 2024

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning


<div id="spin"> </div>

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

22nd of December 2023

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning


21st of December 2023

AppAgent: Multimodal Agents as Smartphone Users


Capture the Flag: Uncovering Data Insights with Large Language Models


20th of December 2023

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation


DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines


ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation


Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions


dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models


19th of December 2023

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach


<div id="humancap"> </div>

Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives


18th of December 2023

Agent Assessment of Others Through the Lens of Self


Evaluating Language-Model Agents on Realistic Autonomous Tasks


LLM-ARK: Knowledge Graph Reasoning Using Large Language Models via Deep Reinforcement Learning


17th of December 2023

Learning to Act without Actions


16th of December 2023

ProTIP: Progressive Tool Retrieval Improves Planning


<div id="restreact"> </div>

15th of December 2023

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent


<div id="agenticaisystem"> </div>

14th of December 2023

Practices for Governing Agentic AI Systems


TinyGSM: achieving >80% on GSM8k with small language models


Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent


Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory


LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers


LLMind: Orchestrating AI and IoT with LLMs for Complex Task Execution


Holodeck: Language Guided Generation of 3D Embodied AI Environments


Personalized Path Recourse


Adaptive parameter sharing for multi-agent reinforcement learning


Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft


Vision-Language Models as a Source of Rewards


Learning Coalition Structures with Games


12th of December 2023

Medprompt+


diff History for Long-Context Language Agents


Sequential Planning in Large Partially Observable Environments guided by LLMs


11th of December 2023

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models


8th of December 2023

KwaiAgents: Generalized Information-seeking Agent System with Large Language Models


7th of December 2023

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator


AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making


Generating Illustrated Instructions


Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use


6th of December 2023

Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia


LLM as OS (llmao), Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem


5th of December 2023

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models


Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction


Large Knowledge Model: Perspectives and Challenges


4th of December 2023

Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication


LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics


Towards Learning a Generalist Model for Embodied Navigation


OpenVoice: Versatile Instant Voice Cloning


29th of November 2023

Universal Self-Consistency for Large Language Model Generation


28th of November 2023

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine


27th of November 2023

<div id="extreme"></div>

Some intuitions about large language models


22nd of November 2023

Building the Future of Responsible AI: A Pattern-Oriented Reference Architecture for Designing Large Language Model based Agents


21st of November 2023

System 2 Attention (is something you might need too)


20th of November 2023

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents


17th of November 2023

A Language Agent for Autonomous Driving


16th of November 2023

Digital Socrates: Evaluating LLMs through explanation critiques


15th of November 2023

Divergences between Language Models and Human Brains


AutoMix: Automatically Mixing Language Models


14th of November 2023

DeepThought: An Architecture for Autonomous Self-motivated Systems


9th of November 2023

LLM Augmented Hierarchical Agents


Prompt Engineering a Prompt Engineer


8th of November 2023

ADaPT: As-Needed Decomposition and Planning with Language Models


2nd of November 2023

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation


<div id="stopvideo"></div>

YouTube. Adam Kalai presents "Recursive Self-Improving Code Generation" - talk 2.11.2023


1st of November 2023

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents


SAGE: Smart home Agent with Grounded Execution


Efficient Human-AI Coordination via Preparatory Language-based Convention


31st of October 2023

Generating Sequences by Learning to Self-Correct


Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback


Towards A Natural Language Interface for Flexible Multi-Agent Task Assignment


Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models


Multi-Agent Consensus Seeking via Large Language Models


26th of October 2023

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents


25th of October 2023

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization


24th of October 2023

RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models


Diverse Conventions for Human-AI Collaboration


Woodpecker: Hallucination Correction for Multimodal Large Language Models


In-Context Learning Creates Task Vectors


Instruct and Extract: Instruction Tuning for On-Demand Information Extraction


23rd of October 2023

Function Vectors in Large Language Models


LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay


20th of October 2023

<div id="toolchain"></div>

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search


Democratizing Reasoning Ability: Tailored Learning from Large Language Model


19th of October 2023

AgentTuning: Enabling Generalized Agent Abilities for LLMs


18th of October 2023

Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale


17th of October 2023

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V


VeRA: Vector-based Random Matrix Adaptation


The next grand challenge for AI


16th of October 2023

Character-LLM: A Trainable Agent for Role-Playing


OpenAgents: An Open Platform for Language Agents in the Wild


Improving Large Language Model Fine-tuning for Solving Math Problems


CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization


Theory of Mind for Multi-Agent Collaboration via Large Language Models


13th of October 2023

A Zero-Shot Language Agent for Computer Control with Structured Reflection


12th of October 2023

AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems


Octopus: Embodied Vision-Language Programmer from Environmental Feedback


MemGPT: Towards LLMs as Operating Systems


Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques


Towards Robust Multi-Modal Reasoning via Model Selection


11th of October 2023

The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models


Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting


LangNav: Language as a Perceptual Representation for Navigation


10th of October 2023

Towards Mitigating Hallucination in Large Language Models via Self-Reflection


9th of October 2023

FireAct: Toward Language Agent Fine-tuning


8th of October 2023

Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading


7th of October 2023

Crystal: Introspective Reasoners Reinforced with Self-Feedback


Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games


6th of October 2023

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models


BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity


5th of October 2023

Agent Instructs Large Language Models to be General Zero-Shot Reasoners


Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures


DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines


3rd of October 2023

<div id="stop"></div>

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation


Lyfe Agents: Generative agents for low-cost real-time social interactions


EcoAssistant: Using LLM Assistant More Affordably and Accurately


Large Language Models as Analogical Reasoners


Conceptual Framework for Autonomous Cognitive Entities


OceanGPT: A Large Language Model for Ocean Science Tasks


2nd of October 2023

Enabling Language Models to Implicitly Learn Self-Improvement


SmartPlay: A Benchmark for LLMs as Intelligent Agents


GRID: A Platform for General Robot Intelligence Development


1st of October 2023

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models


29th of September 2023

AutoAgents: A Framework for Automatic Agent Generation


Motif: Intrinsic Motivation from Artificial Intelligence Feedback


28th of September 2023

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution


24th of September 2023

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning


23rd of September 2023

Natural Language based Context Modeling and Reasoning with LLMs: A Tutorial


20th of September 2023

You only look at the screens: Multimodal Chain-of-Action Agents


18th of September 2023

MindAgent: Emergent Gaming Interaction


14th of September 2023

<div id="llmagentsurvey"> </div>

The Rise and Potential of Large Language Model Based Agents: A Survey


Agents: An Open-source Framework for Autonomous Language Agents


<div id="physicalgrounding"> </div>

13th of September 2023

Physically Grounded Vision-Language Models for Robotic Manipulation


12th of September 2023

Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents


Textbooks Are All You Need


8th of September 2023

<div id="autonomousagentssurvey"> </div>

Unleashing the Power of Graph Learning through LLM-based Autonomous Agents


28th of August 2023

RecMind: Large Language Model Powered Agent For Recommendation


22nd of August 2023

A Survey on Large Language Model based Autonomous Agents


21st of August 2023

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

18th of August 2023

<div id="got"></div>

Graph of Thoughts: Solving Elaborate Problems with Large Language Models


AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation


WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct


17th of August 2023

<div id="rest"></div>

Reinforced Self-Training (ReST) for Language Modeling


Never-ending Learning of User Interfaces


3rd of August 2023

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models


25th of July 2023

WebArena: A Realistic Web Environment for Building Autonomous Agents


20th of July 2023

Textbooks Are All You Need


BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs


16th of July 2023

Communicative Agents for Software Development


xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein


14th of July 2023

Large Language Models Understand and Can be Enhanced by Emotional Stimuli

23rd of June 2023

<div id="lili"> </div>

LLM Powered Autonomous Agents


8th of June 2023

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases


6th of June 2023

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach


5th of June 2023

SELFEVOLVE: A Code Evolution Framework via Large Language Models


3rd of June 2023

Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services


Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions


2nd of June 2023


26th of May 2023

Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models


Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing


25th of May 2023

Voyager: An Open-Ended Embodied Agent with Large Language Models


24th of May 2023

Reasoning with Language Model is Planning with World Model


Gorilla: Large Language Model Connected with Massive APIs


18th of May 2023

Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation

17th of May 2023

<div id="tot"></div>

Tree of Thoughts: Deliberate Problem Solving with Large Language Models


Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction


13th of May 2023

BabyCatAGI: Fast and Feline

12th of May 2023

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?


9th of May 2023

ImageBind: One Embedding Space To Bind Them All


3rd of May 2023

Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings


30th of April 2023

BabyBeeAGI: Task Management and Functionality Expansion on top of BabyAGI


<div id="consciousnesstest"> </div>

26th of April 2023

"Inside OpenAI Entire Talk" by Stanford eCorner


21st of April 2023

Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback


13th of April 2023

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment


11th of April 2023

ChemCrow: Augmenting large-language models with chemistry tools


Teaching Large Language Models to Self-Debug


7th of April 2023

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions


6th of April 2023

Generative Agents: Interactive Simulacra of Human Behavior


31st of March 2023

CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society


30th of March 2023

Self-Refine: Iterative Refinement with Self-Feedback


HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace


DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents


29th of March 2023

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs


28th of March 2023

Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications


Sparks of Artificial General Intelligence: Early experiments with GPT-4


20th of March 2023

Reflexion: Language Agents with Verbal Reinforcement Learning


Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference


Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback


8th of December 2022

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models


20th of October 2022

Large Language Models Can Self-Improve


31st of August 2022

<div id="emerging"></div>

Emergent Abilities of Large Language Models


<div id="generalistagent"></div>

12th of May 2022

A Generalist Agent


Large-Scale Retrieval for Reinforcement Learning

Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning

Retrieval-Augmented Reinforcement Learning

Evaluating Multimodal Interactive Agents

Intra-agent speech permits zero-shot task acquisition

How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

Rapid Task-Solving in Novel Environments

A Unified, Scalable Framework for Neural Population Decoding

Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution


19th of April 2022

<div id="worldmodel2"></div>

Deep learning, reinforcement learning, and world models

<div id="star"></div>

28th of March 2022

STaR: Bootstrapping Reasoning With Reasoning


21st of March 2022

<div id="selfconsistency"></div>

Self-Consistency Improves Chain of Thought Reasoning in Language Models


Chain of Hindsight Aligns Language Models with Feedback


7th of March 2022

Shared computational principles for language processing in humans and deep language models


28th of January 2022

<div id="cot"></div>

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models


<div id="languageagentdefinition"></div>

26th of March 2021

Alignment of Language Agents


<div id="qstar"></div>

8th of February 2021

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks


28th of May 2020

<div id="multitask"></div>

Language Models are Few-Shot Learners


22nd of May 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks


12th of November 2020

<div id="rewardisenough"> </div>

Reward is enough


24th of November 2019

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

<div id="resourcecloud"> </div>

28th of July 2005

The Emotion Machine. Draft.


<div id="autonomousagentdefinition"> </div>

12th of August 1996

Is it an Agent, or Just a Program?: A Taxonomy for Autonomous Agents.


Prediction and Adaptation in an Evolving Chaotic Environment


A Learning Algorithm that Mimics Human Learning


<div id="astarssearch"> </div>

24th of November 1967

A Formal Basis for the Heuristic Determination of Minimum Cost Paths


Citation

How to cite my work?

@misc{MaattaAutonomousAgents2023,
  author = {Teemu Maatta},
  title = {Autonomous Agents},
  year = {2023},
  howpublished = {\url{https://github.com/tmgthb/Autonomous-Agents}},
  note = {Accessed: YYYY-MM-DD}
}


Back to top