Sotopia-π: Interactive Learning of Socially Intelligent Language Agents
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
📢 Release
- [05/01] 🎆 Our custom model, Sotopia-π, is available for demo, thanks to Hugging Face ZeroGPU.
- [03/14] 🎆 We released our paper on arXiv on 3/14, Pi Day, and the paper was reported by AK on Twitter (here).
- [03/07] 🔥 We released our model checkpoints (BC, SR, BC+SR) on Hugging Face (BC model, SR model, BC+SR model).
- [03/04] 📊 We released our social conversation data on Hugging Face (here).
📌 Table of Contents
- Introduction
- Step 0 - Preparations
- Step 1 - Social Task Generation
- Step 2 - Training Data Collection
- Step 3 - Agent Policy Update
- Step 4a - Automatic Evaluation
- Step 4b - Human Evaluation
- Citation
Introduction
We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generating new social tasks, (2) collecting data from both the expert policy and the agent policy for training, and (3) updating the agent policy based on positive data rated by GPT-4. The training and evaluation environment is based on the Sotopia framework.
Step 0 - Preparations
- Install dependencies:
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
- Set up the OpenAI API key in the conda environment:
conda env config vars set OPENAI_API_KEY=api_key
- A Redis database needs to be set up before running this repo. For detailed instructions on setting up a Redis database, please refer to this tutorial. Make sure to set the Redis OM URL in the conda environment (a quick connectivity check is sketched after this list):
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
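If you want to confirm that the environment is configured correctly before running any scripts, the following minimal sketch (not part of this repo; it only assumes the redis-py client that redis-om depends on) checks both variables and pings the database:

```python
# Optional sanity check: verify OPENAI_API_KEY and REDIS_OM_URL are set and
# that the Redis instance is reachable.
import os

import redis  # redis-py, pulled in as a dependency of redis-om

for var in ("OPENAI_API_KEY", "REDIS_OM_URL"):
    assert os.environ.get(var), f"{var} is not set in the current environment"

# Redis.from_url parses the redis://user:password@host:port URL directly.
client = redis.Redis.from_url(os.environ["REDIS_OM_URL"])
client.ping()  # raises redis.exceptions.ConnectionError if unreachable
print("Redis connection OK")
```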
Step 1 - Social Task Generation
The first step is to generate synthetic social tasks by sampling keywords from existing datasets and prompting GPT-4 Turbo to generate the corresponding social tasks. For detailed implementation, please refer to this section.
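For intuition, here is a minimal sketch of the keyword-to-task prompting idea. It is illustrative only, not the repo's implementation: the keyword pool, prompt wording, and output handling are placeholder assumptions, and it uses the openai Python client directly.

```python
# Sample a few keywords and ask GPT-4 Turbo to turn them into a social task
# (a scenario plus one social goal per agent).
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

keyword_pool = ["apology", "borrowed car", "roommates", "job referral", "secret"]
keywords = random.sample(keyword_pool, k=2)

prompt = (
    "Generate one social task for a two-agent role-play. "
    f"It must involve the keywords: {', '.join(keywords)}. "
    "Return a short scenario description and one social goal for each agent."
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```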
Step 2 - Training Data Collection
The second step is to collect data from the expert policy (GPT-4 vs. GPT-4) as behavior cloning trajectories and from our own model interacting with itself as self-reinforcement trajectories. To collect behavior cloning data, run:
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model gpt-4 --agent2-model gpt-4 --push-to-db True
To collect self-reinforcement data, run:
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model custom_model --agent2-model custom_model --push-to-db True
For detailed implementation, please refer to this section.
Step 3 - Agent Policy Update
This step involves (1) filtering the collected conversation data based on GPT-4 ratings and (2) updating the LLM's policy through fine-tuning.
- We filter the data following this pipeline and reformat it into the training format (see the filtering sketch after this list).
- We fine-tune the model using Llama Factory. Please follow this section to set up QLoRA fine-tuning.
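As a rough illustration of the filtering idea (not the repo's actual pipeline; the field names, file names, and score threshold are assumptions), the sketch below keeps only episodes whose GPT-4 goal-completion rating clears a cutoff and reformats them into instruction/output pairs:

```python
# Keep only high-rated conversations and convert them into a simple
# instruction-tuning format.
import json

GOAL_SCORE_THRESHOLD = 7  # assumed cutoff on the 0-10 goal-completion scale


def filter_and_reformat(episodes, threshold=GOAL_SCORE_THRESHOLD):
    """Keep high-rated episodes and convert turns into prompt/completion pairs."""
    training_examples = []
    for ep in episodes:
        if ep["gpt4_goal_score"] < threshold:  # assumed field name
            continue
        for turn in ep["turns"]:
            training_examples.append(
                {"instruction": turn["context"], "output": turn["agent_reply"]}
            )
    return training_examples


if __name__ == "__main__":
    with open("episodes.json") as f:  # placeholder export of the collected data
        episodes = json.load(f)
    with open("train_data.json", "w") as f:
        json.dump(filter_and_reformat(episodes), f, indent=2)
```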
Step 4a - Automatic Evaluation
- We first deploy the trained model on a server and query it through an OpenAI-compatible API (a minimal query sketch follows this list). See this section for detailed instructions on deploying a model via FastChat and vLLM.
- Then we evaluate our model based on the Sotopia framework. Please refer to this section and the Sotopia repo for more details.
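The sketch below shows one way to query the deployed model; it is not the evaluation harness itself, and the base URL and model name are placeholders for whatever FastChat/vLLM server you started in the previous step.

```python
# Query the locally deployed model through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of the local API server
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="sotopia-pi-bc-sr",  # placeholder: the name the model was registered under
    messages=[
        {"role": "user", "content": "You are at a networking event. Introduce yourself."}
    ],
)
print(response.choices[0].message.content)
```

Because the server speaks the OpenAI chat-completions protocol, evaluation code that normally calls GPT-4 can point at the local endpoint simply by swapping the base URL and model name.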
Step 4b - Human Evaluation
- We developed a customized project based on oTree and released the human evaluation study via Prolific.
- Detailed instructions for reproducing the human evaluation are provided here.
Citation
@misc{wang2024sotopiapi,
title={SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents},
author={Ruiyi Wang and Haofei Yu and Wenxin Zhang and Zhengyang Qi and Maarten Sap and Graham Neubig and Yonatan Bisk and Hao Zhu},
year={2024},
eprint={2403.08715},
archivePrefix={arXiv},
primaryClass={cs.CL}
}