GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
This repository contains both the code for the benchmark and the data we collected so far.
The code is available under the MIT license, and the data are available under the CC-BY license.
The match data is located in matches.json.
Setup
In the repository root:
conda create -n gameenv python=3.10
conda activate gameenv
pip install -e .
You must provide your own OpenAI API key in a file named credentials.json in the top-level directory. It should have the format:
{
"openai_api_key": "your_openai_api_key_here"
}
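As a quick sanity check, you can confirm the file parses and contains the expected key. This snippet is a hypothetical helper, not part of the benchmark code:

# Hypothetical sanity check for credentials.json; not part of the benchmark itself.
import json

with open("credentials.json") as f:
    creds = json.load(f)

# The benchmark expects the key name shown above.
assert "openai_api_key" in creds, "credentials.json must define openai_api_key"
print("credentials.json looks well-formed")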
Replicating figures
The Python script generate_all_results.py generates all the figures from the paper into figures/. Use the command:
python3 generate_all_results.py
Collecting data
The scripts provided in scripts/ run individual games with preconfigured settings. You can run or modify these scripts, or create your own. To run a script, execute:
sh ./scripts/<script_name>.sh
Alternatively, you can run api.play_game.play_game directly from a Python script created in the top-level directory.
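For example, a minimal driver script might look like the sketch below. The call to play_game is only a placeholder: the arguments it actually accepts depend on the game and agents you want to run, so consult scripts/ and api/play_game.py for working configurations.

# Hypothetical driver script (run from the repository root).
# The call below is a placeholder; consult scripts/ and api/play_game.py
# for the arguments play_game actually accepts.
from api.play_game import play_game

if __name__ == "__main__":
    play_game()  # replace with your game/agent configuration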
llm-reasoners dependency
agents/rap/reasoners comes from llm-reasoners. See their license.
@article{hao2023reasoning,
  title={Reasoning with language model is planning with world model},
  author={Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua Jiahua and Wang, Zhen and Wang, Daisy Zhe and Hu, Zhiting},
  journal={arXiv preprint arXiv:2305.14992},
  year={2023}
}