
<p align="center"><img align="center" src="./assets/representative_applications.png" alt="Four representative applications of recent successes of MARL: unmanned aerial vehicles, game of Go, Poker games, and team-battle video games."/></p>

Multiagent RLlib: a unified official code release of MARL research by TJU-RL-Lab

This repository contains the released code of representative research works of TJU-RL-Lab on Multiagent Reinforcement Learning (MARL). The research topics are organized by the key challenges of MARL, e.g., the curse of dimensionality (scalability), non-stationarity, multiagent credit assignment, the exploration–exploitation tradeoff, and hybrid action spaces.

This repository will be constantly updated to include new research works.

1. Key Features

The main contribution of this repository is a unified codebase that collects TJU-RL-Lab's representative MARL research works, organized by the key challenges they address.

2. Challenges of MARL

<p align="center"><img align="center" src="./assets/markov_game.png" alt="markov_game" style="zoom:50%;" /></p>

Multiagent systems consist of multiple agents acting and learning in a shared environment. Many real-world decision-making problems can be modeled as multiagent systems, such as playing the game of Go, real-time strategy games, robotic control, card games, autonomous driving, and resource allocation. Despite the recent success of deep RL in single-agent environments, multiagent RL poses additional challenges, including the curse of dimensionality (scalability), non-stationarity, multiagent credit assignment, and the exploration–exploitation tradeoff.
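As a toy illustration of the Markov-game setting sketched above (this is not part of this repository's code; all class and function names are hypothetical), the snippet below runs independent Q-learning in a two-agent coordination game. Because each agent treats the other learning agent as part of its environment, the environment appears non-stationary from each agent's perspective, which is exactly the non-stationarity challenge listed above.

```python
import random


class TwoAgentMatrixGame:
    """A repeated 2x2 coordination game: both agents are rewarded
    only when their actions match (a toy cooperative problem)."""

    PAYOFF = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}

    def step(self, joint_action):
        r = self.PAYOFF[joint_action]
        return (r, r)  # both agents share the same reward


def independent_q_learning(episodes=2000, eps=0.1, lr=0.1, seed=0):
    """Each agent learns its own Q-values and ignores the other agent,
    so each agent's learning target keeps shifting as the other adapts."""
    rng = random.Random(seed)
    env = TwoAgentMatrixGame()
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        # epsilon-greedy joint action: each agent chooses independently
        actions = tuple(
            rng.randrange(2) if rng.random() < eps
            else max(range(2), key=lambda a: q[i][a])
            for i in range(2)
        )
        rewards = env.step(actions)
        for i in range(2):
            a = actions[i]
            q[i][a] += lr * (rewards[i] - q[i][a])  # one-step bandit update
    return q


q_values = independent_q_learning()
```

In this simple game the agents typically settle on the same action, but in richer Markov games such independent learners can mis-coordinate, which motivates the dedicated methods collected in this repository.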

3. Our Solutions

To address the above challenges, we propose a series of algorithms from different points of view. An overview is shown below.

<p align="center"><img align="center" src="./assets/our-work.png" alt="our solutions" /></p>

We briefly introduce the most recent works here; details about these methods can be found in the corresponding sub-directories.

3.1 The curse of dimensionality (scalability) issue

3.2 Non-stationarity

3.3 Multiagent credit assignment problem

3.4 Exploration–exploitation tradeoff

4. Directory Structure of this Repo

An overall view of the research works in this repo:

| Category | Sub-Categories | Research Work (Conference) | Progress |
| --- | --- | --- | --- |
| scalability | scalable multiagent network:<br />(1) permutation invariant (equivariant)<br />(2) action semantics<br />(3) game abstraction<br />(4) dynamic agent-number network<br /><br />hierarchical MARL | (1) API (under review) [1]<br />(2) ASN (ICLR-2020) [2]<br />(3) G2ANet (AAAI-2020) [3]<br />(4) DyAN (AAAI-2020) [4]<br />(5) HIL/HCOMM/HQMIX (arXiv) [5] | :white_check_mark: |
| credit_assignment | | (1) QPD (ICML-2020) [6]<br />(2) Qatten (arXiv) [7] | :white_check_mark: |
| non-stationarity | (1) self-imitation learning<br />(2) opponent modeling | (1) GASIL (AAMAS-2019) [8]<br />(2) BPR+ (NIPS-2018) [9]<br />(3) DPN-BPR+ (AAMAS-2020) [10] | :white_check_mark: |
| multiagent_exploration | | PMIC (NIPS-2021 workshop) [11] | :white_check_mark: |
| hybrid_action | | MAPQN/MAHHQN (IJCAI-2019) [12] | :white_check_mark: |
| negotiation | | alpha-Nego | :white_check_mark: |

5. Publication List

Scalability issue

[1] Hao X, Wang W, Mao H, et al. API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks[J]. arXiv preprint arXiv:2203.05285, 2022.

[2] Wang W, Yang T, Liu Y, et al. Action Semantics Network: Considering the Effects of Actions in Multiagent Systems[C]//International Conference on Learning Representations. 2020.

[3] Liu Y, Wang W, Hu Y, et al. Multi-agent game abstraction via graph attention neural network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(05): 7211-7218.

[4] Wang W, Yang T, Liu Y, et al. From few to more: Large-scale dynamic multiagent curriculum learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(05): 7293-7300.

[5] Tang H, Hao J, Lv T, et al. Hierarchical deep multiagent reinforcement learning with temporal abstraction[J]. arXiv preprint arXiv:1809.09332, 2018.

Credit assignment

[6] Yang Y, Hao J, Chen G, et al. Q-value path decomposition for deep multiagent reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2020: 10706-10715.

[7] Yang Y, Hao J, Liao B, et al. Qatten: A general framework for cooperative multiagent reinforcement learning[J]. arXiv preprint arXiv:2002.03939, 2020.

Non-stationarity

[8] Hao X, Wang W, Hao J, et al. Independent generative adversarial self-imitation learning in cooperative multiagent systems[J]. arXiv preprint arXiv:1909.11468, 2019.

[9] Zheng Y, Meng Z, Hao J, et al. A deep bayesian policy reuse approach against non-stationary agents[J]. Advances in neural information processing systems, 2018, 31.

[10] Zheng Y, Hao J, Zhang Z, et al. Efficient policy detecting and reusing for non-stationarity in markov games[J]. Autonomous Agents and Multi-Agent Systems, 2021, 35(1): 1-29.

Multiagent exploration

[11] Li P, Tang H, Yang T, et al. PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration[J]. 2021.

Hybrid action

[12] Fu H, Tang H, Hao J, et al. Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2019: 2329-2335.