Sotopia-π: Interactive Learning of Socially Intelligent Language Agents

Project Page | Paper PDF | Hugging Face | Python 3.11 | pre-commit | Code style: black | Twitter | Code License

📢 Release

Introduction

We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generating new social tasks, (2) collecting training data from both the expert policy and the agent policy, and (3) updating the agent policy on the positively rated data, as judged by GPT-4. The training and evaluation environment is based on the Sotopia framework.

Step 0 - Preparations

Step 1 - Social Task Generation

The first step is to generate synthesized social tasks by sampling keywords from datasets and prompting GPT-4 Turbo to generate corresponding social tasks. For detailed implementation, please refer to this section.
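The keyword-sampling idea can be sketched as follows. This is only an illustration: the keyword pools, the prompt template, and the `sample_task_prompt` helper are hypothetical and do not reproduce the project's actual datasets or prompt, and the GPT-4 Turbo call itself is omitted.

```python
import random
from typing import Dict, List

# Hypothetical keyword pools standing in for keywords drawn from datasets.
KEYWORD_POOLS: Dict[str, List[str]] = {
    "source_a": ["borrowing money", "moving out"],
    "source_b": ["job interview", "splitting rent"],
}

# Hypothetical prompt template; not the project's actual prompt.
PROMPT_TEMPLATE = (
    "Create a social scenario between two people involving: {keywords}. "
    "Describe the scenario and give each person a private social goal."
)


def sample_task_prompt(pools: Dict[str, List[str]], k: int, rng: random.Random) -> str:
    """Sample k distinct keywords across all pools and fill the prompt template."""
    all_keywords = [kw for pool in pools.values() for kw in pool]
    chosen = rng.sample(all_keywords, k)
    return PROMPT_TEMPLATE.format(keywords=", ".join(chosen))


rng = random.Random(0)  # fixed seed so the sampled task is reproducible
prompt = sample_task_prompt(KEYWORD_POOLS, k=2, rng=rng)
# In the real pipeline, this prompt would be sent to GPT-4 Turbo.
print(prompt)
```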

Step 2 - Training Data Collection

The second step collects two kinds of training data: behavior cloning trajectories from the expert policy (GPT-4 vs. GPT-4) and self-reinforcement trajectories from the agent's own policy (our model vs. our model). To collect behavior cloning data, run

```bash
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model gpt-4 --agent2-model gpt-4 --push-to-db True
```

To collect self-reinforcement data, run

```bash
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model custom_model --agent2-model custom_model --push-to-db True
```

For detailed implementation, please refer to this section.
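The two collection commands above differ only in the model passed to `--agent1-model` and `--agent2-model`. The hypothetical helper below builds both argument lists from the flags shown above and adds no others. It is intended to be run from the `data_generate` directory, and the experiment name and tag are placeholders.

```python
import shlex
from typing import List


def build_collection_cmd(agent_model: str, experiment_name: str, tag: str) -> List[str]:
    """Return argv for generate_conversations.py with both agents using agent_model.

    Hypothetical helper; the flags are copied from the README commands.
    """
    return [
        "python3", "generate_conversations.py",
        "--eval-script", "scripts/eval_sft.sh",
        "--env-file", "env_files/used_env.json",
        "--experiment-name", experiment_name,
        "--tag", tag,
        "--agent1-model", agent_model,
        "--agent2-model", agent_model,
        "--push-to-db", "True",
    ]


# Behavior cloning: expert vs. expert.
bc_cmd = build_collection_cmd("gpt-4", "your_exp_name", "your_tag")
# Self-reinforcement: our model vs. our model.
sr_cmd = build_collection_cmd("custom_model", "your_exp_name", "your_tag")
print(shlex.join(bc_cmd))
# To launch it: subprocess.run(bc_cmd, cwd="data_generate", check=True)
```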

Step 3 - Agent Policy Update

This step (1) filters the collected conversation data based on GPT-4 ratings and (2) updates the LLM's policy through fine-tuning.
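The two sub-steps can be sketched as a rating filter followed by conversion into prompt/completion pairs for supervised fine-tuning. The episode schema, the 7.0 threshold, and the prompt/completion format are illustrative assumptions, not the repository's actual data format. The fine-tuning run itself is not shown.

```python
from typing import Any, Dict, List

# Hypothetical episode schema: "agent" names the agent being trained, "rating"
# stands in for its GPT-4 rating, and "turns" is a list of (speaker, text) pairs.
Episode = Dict[str, Any]


def filter_positive(episodes: List[Episode], threshold: float) -> List[Episode]:
    """(1) Keep only episodes whose GPT-4 rating is at or above the threshold."""
    return [ep for ep in episodes if ep["rating"] >= threshold]


def to_sft_examples(episodes: List[Episode]) -> List[Dict[str, str]]:
    """(2) Turn each of the trained agent's turns into a prompt/completion pair."""
    examples: List[Dict[str, str]] = []
    for ep in episodes:
        history: List[str] = []
        for speaker, text in ep["turns"]:
            if speaker == ep["agent"]:
                # Prompt = dialogue so far; completion = the agent's reply.
                examples.append({"prompt": "\n".join(history), "completion": text})
            history.append(f"{speaker}: {text}")
    return examples


episodes = [
    {"agent": "Alice", "rating": 8.0, "turns": [
        ("Alice", "Hi, can I borrow your car?"),
        ("Bob", "Sure, when?"),
        ("Alice", "Tomorrow morning."),
    ]},
    {"agent": "Alice", "rating": 2.0, "turns": [
        ("Alice", "Give me your car."),
        ("Bob", "No."),
    ]},
]
kept = filter_positive(episodes, threshold=7.0)  # hypothetical threshold
examples = to_sft_examples(kept)
```

Only the trained agent's own turns become completions, so the model learns to produce its side of the conversation from high-rated episodes while treating the partner's turns as context.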

Step 4a - Automatic Evaluation

Step 4b - Human Evaluation

Citation

```bibtex
@misc{wang2024sotopiapi,
  title={SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents},
  author={Ruiyi Wang and Haofei Yu and Wenxin Zhang and Zhengyang Qi and Maarten Sap and Graham Neubig and Yonatan Bisk and Hao Zhu},
  year={2024},
  eprint={2403.08715},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```