Awesome

sotopia-pi

Sotopia-π: Interactive Learning of Socially Intelligent Language Agents

📢 Release

[05/01] 🎆Our custome model, Sotopia-Pi, is available for demo, thanks to Hugging Face ZeroGPU.
[03/14] 🎆We released our paper on arXiv on 3/14 PI day and the paper was reported by AK on twitter (here).
[03/07] 🔥We released our model checkpoints (BC, SR, BC+SR) on huggingface (BC model, SR model, BC+SR model).
[03/04] 📊We released our social converation data on huggingface (here).

Introduction

title

We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generates new social tasks, (2) collects data from both expert policy and agent policy for training, and (3) updates agent policy based on positive data rated by GPT-4. The training and evaluation environment is based on the Sotopia framework.

Step 0 - Preparations

Install dependencies:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Set up OpenAI API key in conda environment

conda env config vars set OPENAI_API_KEY=api_key

A Redis database needs to be set up prior to running this repo. For detailed instructions of setting up Redis database, please refer to this tutorial. Make sure to set up Redis OM url in conda environment
```
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
```

Step 1 - Social Task Generation

The first step is to generate synthesized social tasks by sampling keywords from datasets and prompting GPT-4 Turbo to generate corresponding social tasks. For detailed implementation, please refer to this section.

Step 2 - Training Data Collection

The second step is to collect data from expert (GPT-4 vs. GPT-4) as behavior cloning trajectories and from self (our model vs. our model) as self-reinforcement trajectories. To collect behavior cloning data, run

cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model gpt-4 --agent2-model gpt-4 --push-to-db True

To collect self-reinforcement data, run

cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model custom_model --agent2-model custom_model --push-to-db True

For detailed implementation, please refer to this section

Step 3 - Agent Policy Update

This step requires (1) filter the collected conversation data based on GPT-4 ratings and (2) update the LLM's policy through fine-tuning.

We filter data following this pipeline and reformat data into training format.
We fine-tune the model based on Llama Factory. Please follow this section to implement QLoRA fine-tuning.

Step 4a - Automatic Evaluation

We first deploy the trained model on a server and inference the model via OpenAI API. See this section for detailed instructions of deploying a model via FastChat and vllm.
Then we evaluate our model based on the Sotopia framework. Please refer to this section and the Sotopia repo for more details.

Step 4b - Human Evaluation

We develop a personalized project based on oTree and release the human evaluation project via Prolific.
Detailed instruction on reproducing human evaluation is mentioned here.

Citation

@misc{wang2024sotopiapi,
title={SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents},
author={Ruiyi Wang and Haofei Yu and Wenxin Zhang and Zhengyang Qi and Maarten Sap and Graham Neubig and Yonatan Bisk and Hao Zhu},
year={2024},
eprint={2403.08715},
archivePrefix={arXiv},
primaryClass={cs.CL}
}