Awesome
Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
This repository is the implementation of Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification in ICML 2022. This codebase is based on the open-source maddpg-pytorch framework, and please refer to that repo for more documentation.
Citing
If you used this code in your research or found it helpful, please consider citing our paper:
@inproceedings{pan2021regularized,
title={Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification},
author={Pan, Ling and Huang, Longbo and Ma, Tengyu and Xu, Huazhe},
booktitle={International Conference on Machine
Learning},
year={2022}
}
Requirements
- Multi-agent Particle Environments: in envs/multiagent-particle-envs and install it by
pip install -e .
- python: 3.6
- torch
- baselines (https://github.com/openai/baselines)
- seaborn
- gym==0.9.4
- Multi-Agent MuJoCo: Please check the multiagent_mujoco repo for more details about the environment. Note that this depends on gym with version 0.10.5.
Datasets
Datasets for different tasks are available at the following links. Please download the datasets and decompress them to the datasets folder.
- HalfCheetah
- Cooperative Navigation
- Predator-Prey: password is m7vw
- World: password is 5k3t
Note: The datasets are too large, and the Baidu (Chinese) online disk requires a password for accessing it. Please just enter the password in the input box and click the blue button. The dataset can then be downloaded by cliking the "download" button (the second white button).
Usage
Please follow the instructions below to replicate the results in the paper.
pythonmain.py --env_id <ENVIRONMENT_NAME> --data_type <DATA_TYPE> --seed <SEED> --omar 1
- env_id: simple_spread/tag/world/HalfCheetah-v2
- data_type: random/medium-replay/medium/expert