<div align="center"> <h1> <img src="figures/agent.png" height=50 align="texttop"> Tell Me More!</h1> </div> <p align="center"> <a target="_blank"> <img src="https://img.shields.io/badge/License-Apache_2.0-green.svg"> </a> <a target="_blank"> <img alt="GitHub" src="https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat"> </a> </p> <p align="center"> <a href="#features">Features</a> • <a href="#training">Training</a> • <a href="#Evaluation">Evaluation</a> • <a href="#Citation">Citation</a> </p>

This repo contains the implementation and evaluation of Mistral-Interact, a model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before downstream agent task execution begins.

Source code and datasets for Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents. We release the Intention-in-Interaction (IN3) benchmark and develop Mistral-Interact, a model capable of discerning vague instructions and recovering missing details.

✨ Features

Mistral-Interact judges whether a user instruction is vague, asks targeted questions about missing details, and summarizes the user's intentions into an explicit, actionable goal.

<div><a id="Introduction"></a></div>

📖 Introduction

Intention-in-Interaction (IN3) benchmark

Current agent benchmarks usually assume that the given tasks are clear and exclude user intention understanding as an aspect of evaluation. To address this gap in assessment, we formulate Intention-in-Interaction (IN3), a benchmark that tests an agent's interaction ability through explicit task vagueness judgment and user intention understanding.

It is located in the data/IN3 directory. You can also download it from here.

<div align="center"> <img src="figures/dataset_a.png" alt="Construction of IN3" width="600"/> <br/> <figcaption><b>Figure 1: Construction of IN3.</b></figcaption> </div>

As illustrated in the figure above, starting from human-written seed tasks (Step 1), the model iteratively generates new tasks to augment the dataset, sampling demonstrations from the dataset as examples for its next round of generation (Step 2). With the help of GPT-4, we then perform human annotation of each task's vagueness, its missing details, and each detail's importance level and potential options (Step 3). GPT-4 first suggests the task's vagueness and potential missing details with options and importance levels; human annotators take these as references and adapt them according to their own perspectives and intentions.
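The iterative augmentation loop (Step 2) can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: `generate_new_task` is a hypothetical stand-in for the real model call, which would prompt the model with the sampled demonstrations and parse a new task from its completion.

```python
import random

def generate_new_task(demonstrations):
    # Placeholder: a real implementation would prompt the generator model
    # with the sampled demonstrations and parse a new task from its output.
    return f"new task inspired by {len(demonstrations)} demonstrations"

def augment_dataset(seed_tasks, rounds=3, num_demos=2, seed=0):
    """Iteratively grow the task pool, sampling demonstrations from the
    current pool as in-context examples for each new round of generation."""
    rng = random.Random(seed)
    pool = list(seed_tasks)
    for _ in range(rounds):
        demos = rng.sample(pool, min(num_demos, len(pool)))
        pool.append(generate_new_task(demos))
    return pool

tasks = augment_dataset(["Plan a trip", "Buy a laptop"])
# 2 seed tasks + 3 generated tasks
```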

Mistral-Interact

<div><a id="Training"></a></div>

🛠️ Training

Construction of Training Data

It is located in the data/interactions directory. You can also download it from here.

<div align="center"> <img src="figures/dataset_bc.png" alt="Construction of Training Data" width="800"/> <br/> <figcaption><b>Figure 2: Construction of training data.</b></figcaption> </div>

With IN3's annotations regarding task vagueness, missing details, and potential options, we apply several strategies during the construction of conversation records to better inspire the target model's robust inquiry and reasoning ability.
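One such strategy can be sketched as turning each annotated task into a multi-turn inquiry record. The annotation schema below is hypothetical (the real fields live in data/IN3 and data/interactions and may differ); it only illustrates how vagueness, missing details, and options could map onto conversation turns.

```python
# Hypothetical annotation schema -- see data/IN3 for the actual format.
annotation = {
    "task": "Buy a laptop",
    "vague": True,
    "missing_details": [
        {"description": "budget", "importance": "high",
         "options": ["under $800", "$800 to $1500", "no limit"]},
        {"description": "primary use", "importance": "medium",
         "options": ["gaming", "office work"]},
    ],
}

def build_conversation(ann):
    """Turn one annotated task into a conversation record: the model
    inquires about each missing detail, offering the annotated options."""
    turns = [{"role": "user", "content": ann["task"]}]
    if not ann["vague"]:
        return turns  # clear tasks need no inquiry
    for detail in ann["missing_details"]:
        options = ", ".join(detail["options"])
        turns.append({"role": "assistant",
                      "content": f"Could you specify the {detail['description']}? "
                                 f"For example: {options}."})
        # For training data, a simulated user reply fills in the detail.
        turns.append({"role": "user", "content": detail["options"][0]})
    return turns

record = build_conversation(annotation)
# 1 initial turn + 2 turns per missing detail
```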

Usage

We utilize the model-center framework to conduct full-parameter fine-tuning of Mistral-7B on two 80GB A800 GPUs. The hyper-parameters can be tuned in scripts/sft.sh; check them before launching a run.
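The exact variable names depend on the version of scripts/sft.sh you have; the fragment below is only an illustration of the kind of settings worth verifying (paths, batch size, learning rate), not the script's actual contents.

```shell
# Illustrative only -- check scripts/sft.sh for the real variable names.
MODEL_PATH=./models/mistral-7b     # base model to fine-tune
DATA_PATH=./data/interactions      # constructed conversation records
EPOCHS=2                           # full-parameter SFT epochs
BATCH_SIZE=1                       # per-GPU batch size on 80GB A800s
LR=1e-5                            # learning rate
```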

Run the script from the root of the repo to start training:

```bash
bash scripts/sft.sh
```
<div><a id="Evaluation"></a></div>

🎮 Inference

Download Mistral-Interact here and put it under ./models. The downloaded weights are in the Hugging Face format; for inference, convert them to the model-center format using src/hf_2_mc.py.

Then run the following script from the root of the repo to start inference:

```bash
bash scripts/test_one_new.sh
```

📊 Evaluation

An agent's intention understanding capability can be assessed directly through user interaction and indirectly through downstream task execution.

Instruction Understanding

Instruction understanding does not involve any real-time agent execution, so we directly evaluate the language models themselves during interaction to judge their capability to serve as a robust upstream module in agent design.

Metrics

Usage

We use the test split of IN3 agent tasks for evaluation. Conversation records based on Mistral-Interact, LLaMA-2-7B-Chat, Mistral-7B-Instruct-v0.2, and GPT-4 are located here.

Results

<center> <figure> <img src="figures/exp_results.png" width="800" height="300"> </figure> </center>

Instruction Execution

To evaluate the effectiveness of the implicit intention understanding for instruction execution, we integrate Mistral-Interact as an upstream interaction module into the XAgent framework, an autonomous agent system for complex task solving.
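The wiring can be sketched as a two-stage pipeline: the interaction module refines the instruction before the agent executes it. Everything below is a hypothetical stand-in; `query_interact_model` and `run_agent` are placeholders for Mistral-Interact inference and the XAgent entry point, which have their own APIs.

```python
def query_interact_model(user_task, ask_user):
    """Placeholder for Mistral-Interact: inquire about missing details,
    then return a summarized, actionable goal."""
    details = ask_user("What is your budget and intended use?")
    return f"{user_task} ({details})"

def run_agent(goal):
    # Placeholder for the downstream XAgent execution loop.
    return f"executing: {goal}"

def solve(user_task, ask_user):
    # Upstream interaction first, then autonomous execution.
    refined_goal = query_interact_model(user_task, ask_user)
    return run_agent(refined_goal)

result = solve("Buy a laptop", lambda q: "under $1500, for office work")
```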

Metrics

Results

<center> <figure> <img src="figures/exp_results_2.png" width="1000" height="170"> </figure> </center>

Case Study

<div align="center"> <img src="figures/xagent_case_study.png" alt="Case study on the agent execution process before and after interaction with Mistral-Interact in agent design." width="800"/> <br/> <figcaption><b>Figure 3: Case study on the agent execution process.</b></figcaption> </div> <div><a id="Contributions"></a></div>

🌟 Contributions

<div><a id="Citation"></a></div>

Citation

Feel free to cite our paper if you find it useful.

@article{qian2024tell,
  title={Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents},
  author={Cheng Qian and Bingxiang He and Zhong Zhuang and Jia Deng and Yujia Qin and Xin Cong and Zhong Zhang and Jie Zhou and Yankai Lin and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2402.09205},
  year={2024}
}