<a name="readme-top"></a>

<div align="center">
  <a href="https://zhiyuanhubj.github.io/MAgIC/">
    <img src="imgs/logo.png" alt="Logo" width="140" height="140">
  </a>
  <h2 align="center">

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration</h2>

<p align="center">
  A competition-based benchmark with quantitative metrics for Large Language Model powered multi-agent systems.
  <br />
  <a href="https://github.com/cathyxl/MAgIC/issues">🐛 Report Bug</a>
  ·
  <a href="https://zhiyuanhubj.github.io/MAgIC/">📃 Main Page</a>
  ·
  <a href="https://arxiv.org/abs/2311.08562">📖 Paper</a>
  ·
  <a href="https://arxiv.org/abs/2311.08562">📊 Leaderboard</a>
</p>
<a href="https://github.com/cathyxl/MAgIC/graphs/contributors">
  <img src="https://img.shields.io/github/contributors/cathyxl/MAgIC.svg?style=for-the-badge" height="20"/>
</a>
<a href="https://github.com/cathyxl/MAgIC/network/members">
  <img src="https://img.shields.io/github/forks/cathyxl/MAgIC.svg?style=for-the-badge" height="20"/>
</a>
<a href="https://github.com/cathyxl/MAgIC/issues">
  <img src="https://img.shields.io/github/issues/cathyxl/MAgIC.svg?style=for-the-badge" height="20"/>
</a>
<a href="https://github.com/cathyxl/MAgIC/stargazers">
  <img src="https://img.shields.io/github/stars/cathyxl/MAgIC.svg?style=for-the-badge" height="20"/>
</a>
</div>
<br>
<div align="center">
  <a href="https://www.youtube.com/embed/iNqq75Uf57M" title="MAgIC Demo">
    <img src="imgs/demo_front.png" alt="Demo" width="1000" height="450" />
  </a>
</div>

📌 MAgIC Benchmark News 🎉🔥


📖 About The Project

Scenarios

MAgIC provides a benchmark that quantitatively measures the Cognition, Adaptability, Rationality and Collaboration abilities of Large Language Models within multi-agent systems. Our benchmark is built on competitions across 5 scenarios: Chameleon, Undercover, Cost Sharing, Prisoner's Dilemma, and Public Good.

PGM-Aware Agent Structure

We enhance vanilla LLM agents with a Probabilistic Graphical Model (PGM) module, which strengthens their cognition and decision-making in multi-agent settings.

Evaluation Metrics and Game Win Rate


<p align="right">(<a href="#readme-top">back to top</a>)</p>

Leaderboard

We have evaluated 10 models on our benchmark, and the PGM-enhanced method we propose achieves a remarkable improvement.

PGM Performance

PGM improvements on different LLMs.

Getting Started

Installation

1. Prepare a virtual environment

   ```sh
   # option 1: conda virtual environment
   conda create -n magic_llm python=3.9
   conda activate magic_llm

   # option 2: python3 virtual environment
   mkdir magic_llm
   python3 -m venv magic_llm
   source magic_llm/bin/activate
   ```

2. Install the required packages

   ```sh
   pip3 install -r requirements.txt
   ```
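After installing, a quick sanity check helps confirm the environment is ready before running any games. The snippet below is a minimal sketch; it assumes `requirements.txt` installs the `openai` client that the runner uses.

```python
# Minimal environment sanity check (assumes requirements.txt pins the
# `openai` client used by the runner; adjust if your pins differ).
import sys

import openai

print(sys.version)         # expect Python 3.9.x from the setup above
print(openai.__version__)  # confirms the OpenAI client is importable
```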

Run competition and evaluation

1. Get your own OpenAI API key and expose it as the `OPENAI_API_KEY` environment variable

   ```sh
   export OPENAI_API_KEY=<your_openai_api_key>
   ```

2. Run the experiments and calculate the metrics. The current version of the code only supports OpenAI models; if you want to test your own LLMs, please refer to our leaderboard website to evaluate your LLM and upload your results (see the adapter sketch below).

   ```sh
   python3 arena_runner.py
   ```
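If you prefer to prototype a non-OpenAI model locally before submitting to the leaderboard, one workable pattern is to hide it behind the same chat-completion interface an OpenAI backend exposes. The sketch below is purely illustrative: `LocalLLMBackend`, its `query` signature, and the registration step are assumptions for illustration, not this repository's actual API.

```python
# Hypothetical adapter for plugging a local model into an OpenAI-style
# chat interface. NOTE: this class and its query() signature are
# illustrative assumptions, not the repository's actual backend API.
from typing import Dict, List


class LocalLLMBackend:
    """Wraps a local model behind an OpenAI-style chat-completion call."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        # Load your model here, e.g. with Hugging Face transformers:
        # self.model = AutoModelForCausalLM.from_pretrained(model_path)

    def query(self, messages: List[Dict[str, str]], temperature: float = 0.2) -> str:
        """`messages` follows the OpenAI chat format:
        [{"role": "system" | "user" | "assistant", "content": "..."}].
        Returns the assistant reply as plain text."""
        # Flatten the chat history into a single prompt for the local model.
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        # reply = self.model.generate(prompt, temperature=temperature)
        reply = "..."  # placeholder: call your model here
        return reply
```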
<p align="right">(<a href="#readme-top">back to top</a>)</p> <!-- ROADMAP -->

Roadmap

<p align="right">(<a href="#readme-top">back to top</a>)</p> <!-- LICENSE -->

License

Distributed under the MIT License. See LICENSE.txt for more information.

<p align="right">(<a href="#readme-top">back to top</a>)</p> <!-- CONTACT -->

Contact

Lin Xu - @Lin_Xu_ - cathyxl2016@gmail.com

<p align="right">(<a href="#readme-top">back to top</a>)</p>

Citation

```bibtex
@article{xu2023magic,
  title={MAgIC: Benchmarking Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration},
  author={Lin Xu and Zhiyuan Hu and Daquan Zhou and Hongyu Ren and Zhen Dong and Kurt Keutzer and See Kiong Ng and Jiashi Feng},
  year={2023},
  journal={arXiv preprint arXiv:2311.08562}
}
```