## Maintained Fork

The maintained version of these environments, which includes numerous fixes, comprehensive documentation, installation via pip, and support for current versions of Python, is available in Gymnasium Robotics (https://github.com/Farama-Foundation/Gymnasium-Robotics, https://robotics.farama.org/).
# Multi-Agent Mujoco
Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.
<img src="https://github.com/schroederdewitt/multiagent_mujoco/blob/master/docs/images/mamujoco.jpg" width="900" height="384">

Described in the paper Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control by Christian Schroeder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer and Shimon Whiteson, Torr Vision Group and Whiteson Research Lab, University of Oxford, 2020.
## Installation
Note: this package requires OpenAI Gym version 0.10.8 and Mujoco 2.1.
Simply clone this repository and put ./src on your PYTHONPATH. To render, also set the following environment variables:

```
LD_LIBRARY_PATH=${HOME}/.mujoco/mujoco210/bin
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
```
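For instance, a complete setup might look like the following sketch (the clone location is an assumption; adjust paths to your machine):

```bash
# Hypothetical install location; any path works.
git clone https://github.com/schroederdewitt/multiagent_mujoco.git ~/multiagent_mujoco

# Make the package importable.
export PYTHONPATH="${HOME}/multiagent_mujoco/src:${PYTHONPATH}"

# Rendering: assumes Mujoco 2.1 is installed under ~/.mujoco/mujoco210.
export LD_LIBRARY_PATH="${HOME}/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}"
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
```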
## Example
```python
from multiagent_mujoco.mujoco_multi import MujocoMulti
import numpy as np
import time

def main():
    env_args = {"scenario": "HalfCheetah-v2",  # underlying single-agent Gym scenario
                "agent_conf": "2x3",           # 2 agents with 3 motors each
                "agent_obsk": 0,               # agents observe only their own joints/bodies
                "episode_limit": 1000}
    env = MujocoMulti(env_args=env_args)
    env_info = env.get_env_info()

    n_actions = env_info["n_actions"]
    n_agents = env_info["n_agents"]
    n_episodes = 10

    for e in range(n_episodes):
        env.reset()
        terminated = False
        episode_reward = 0

        while not terminated:
            obs = env.get_obs()      # per-agent observations
            state = env.get_state()  # global state

            # Sample a random continuous action for each agent.
            actions = []
            for agent_id in range(n_agents):
                avail_actions = env.get_avail_agent_actions(agent_id)
                avail_actions_ind = np.nonzero(avail_actions)[0]
                action = np.random.uniform(-1.0, 1.0, n_actions)
                actions.append(action)

            reward, terminated, _ = env.step(actions)
            episode_reward += reward
            time.sleep(0.1)
            env.render()

        print("Total reward in episode {} = {}".format(e, episode_reward))

    env.close()

if __name__ == "__main__":
    main()
```
## Documentation

### Environment config
- env_args.scenario: Determines the underlying single-agent OpenAI Gym Mujoco environment
- env_args.agent_conf: Determines the partitioning (see the Task configuration section below), in the format n_agents x motors_per_agent
- env_args.agent_obsk: Determines up to which connection distance k agents are able to form observations (0: agents can only observe the state of their own joints and bodies, 1: agents can also observe their immediate neighbours' joints and bodies, and so on).
- env_args.k_categories: A string describing which properties are observable at which connection distance, given as comma-separated lists separated by vertical bars. For example, "qpos,qvel,cfrc_ext,cvel,cinert,qfrc_actuator|qpos" means that at k=0 agents observe qpos, qvel, cfrc_ext, cvel, cinert and qfrc_actuator, while at k>=1 (i.e. immediate and more distant neighbours) only qpos is observed (see the configuration sketch after this list). Note: If a requested property is not available for a given agent, it is silently omitted.
- env_args.global_categories: Same as env_args.k_categories, but concerns global properties that are otherwise not observed by any of the agents. Switched off by default (i.e. agents have no non-local observations).
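Putting these options together, a hedged sketch of a fully specified config might look like this (the category string is an illustrative example, not a scenario default; it assumes k_categories is passed through env_args as described above):

```python
from multiagent_mujoco.mujoco_multi import MujocoMulti

env_args = {
    "scenario": "HalfCheetah-v2",
    "agent_conf": "2x3",   # 2 agents x 3 motors per agent
    "agent_obsk": 1,       # observe own joints plus immediate neighbours
    # k=0: full local observation; k>=1: neighbours expose only qpos.
    "k_categories": "qpos,qvel,cfrc_ext,cvel,cinert,qfrc_actuator|qpos",
    "episode_limit": 1000,
}
env = MujocoMulti(env_args=env_args)
```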
### Extending Tasks
Tasks can be trivially extended by adding entries in src/multiagent_mujoco/obsk.py.
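For orientation, a new entry roughly amounts to describing the robot's joints as a graph and partitioning them into agents. The sketch below is purely illustrative: the Node/HyperEdge names, fields, and import path are modelled on the pattern of the existing entries in obsk.py and are not a verified API.

```python
# Hypothetical sketch of a new task entry for src/multiagent_mujoco/obsk.py.
from multiagent_mujoco.obsk import Node, HyperEdge  # assumed import path

# 1) Describe the robot's joints as nodes of a kinematic graph
#    (each node records indices into qpos/qvel and its actuator index).
thigh = Node("thigh_joint", qpos_ids=3, qvel_ids=3, act_ids=0)
leg   = Node("leg_joint",   qpos_ids=4, qvel_ids=4, act_ids=1)
foot  = Node("foot_joint",  qpos_ids=5, qvel_ids=5, act_ids=2)

# 2) Connect physically adjacent joints; observation distance k is
#    measured along these (hyper)edges.
edges = [HyperEdge(thigh, leg), HyperEdge(leg, foot)]

# 3) Partition the nodes into agents for a given agent_conf label,
#    e.g. "3x1" -> three agents with one motor each.
parts = [(thigh,), (leg,), (foot,)]
```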
### Task configuration

Unless stated otherwise, all parameters given below are passed to multiagent_mujoco.mujoco_multi.MujocoMulti via env_args.
#### 2-Agent Ant

```python
env_args.scenario="Ant-v2"
env_args.agent_conf="2x4"
env_args.agent_obsk=1
```

#### 2-Agent Ant Diag

```python
env_args.scenario="Ant-v2"
env_args.agent_conf="2x4d"
env_args.agent_obsk=1
```

#### 4-Agent Ant

```python
env_args.scenario="Ant-v2"
env_args.agent_conf="4x2"
env_args.agent_obsk=1
```

#### 2-Agent HalfCheetah

```python
env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
```

#### 6-Agent HalfCheetah

```python
env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="6x1"
env_args.agent_obsk=1
```

#### 3-Agent Hopper

```python
env_args.scenario="Hopper-v2"
env_args.agent_conf="3x1"
env_args.agent_obsk=1
```

#### 2-Agent Humanoid

```python
env_args.scenario="Humanoid-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1
```

#### 2-Agent HumanoidStandup

```python
env_args.scenario="HumanoidStandup-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1
```

#### 2-Agent Reacher

```python
env_args.scenario="Reacher-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1
```

#### 2-Agent Swimmer

```python
env_args.scenario="Swimmer-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1
```

#### 2-Agent Walker

```python
env_args.scenario="Walker2d-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
```

#### Manyagent Swimmer

```python
env_args.scenario="manyagent_swimmer"
env_args.agent_conf="10x2"
env_args.agent_obsk=1
```

#### Manyagent Ant

```python
env_args.scenario="manyagent_ant"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
```

#### Coupled HalfCheetah (NEW!)

```python
env_args.scenario="coupled_half_cheetah"
env_args.agent_conf="1p1"
env_args.agent_obsk=1
```
CoupledHalfCheetah features two separate HalfCheetah agents coupled by an elastic tendon. You can add more tendons or novel coupled scenarios by:

- creating a new Gym environment to define the reward function of the coupled scenario (consult coupled_half_cheetah.py and the sketch below),
- creating a new Mujoco environment XML file to insert agents and tendons (see assets/coupled_half_cheetah.xml), and
- registering your env as a scenario in the MujocoMulti environment (only if you need special default observability params).
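As a rough illustration of the first step, a new coupled scenario's reward could be defined in a Gym environment along the lines of the sketch below. The class name, XML file name, and reward terms are all hypothetical (coupled_half_cheetah.py is the authoritative reference); it uses the old gym/mujoco-py-era MujocoEnv API that this repository targets.

```python
import os
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class CoupledWalkerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    """Hypothetical coupled scenario: reward shared forward progress."""

    def __init__(self):
        # MujocoEnv expects an absolute path (or a file in gym's assets dir).
        xml = os.path.join(os.path.dirname(__file__), "assets", "coupled_walker.xml")
        mujoco_env.MujocoEnv.__init__(self, xml, 5)
        utils.EzPickle.__init__(self)

    def step(self, action):
        x_before = self.sim.data.qpos[0]
        self.do_simulation(action, self.frame_skip)
        x_after = self.sim.data.qpos[0]
        # Hypothetical reward: joint forward progress minus control cost.
        reward_run = (x_after - x_before) / self.dt
        reward_ctrl = -0.1 * np.square(action).sum()
        return self._get_obs(), reward_run + reward_ctrl, False, {}

    def _get_obs(self):
        return np.concatenate([self.sim.data.qpos.flat[1:],
                               self.sim.data.qvel.flat])

    def reset_model(self):
        # Small random perturbation of the initial pose, as in the stock Gym tasks.
        qpos = self.init_qpos + self.np_random.uniform(-0.1, 0.1, size=self.model.nq)
        qvel = self.init_qvel + 0.1 * self.np_random.randn(self.model.nv)
        self.set_state(qpos, qvel)
        return self._get_obs()
```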