# COOM
COOM is a Continual Learning benchmark for embodied pixel-based RL, consisting of task sequences in visually distinct 3D environments with diverse objectives and egocentric perception. COOM is designed for task-incremental learning, in which task boundaries are clearly defined. A short demo of COOM is available on YouTube.
<p align="center"> <img src="assets/gifs/demo1.gif" alt="Demo1" style="vertical-align: top;"/> <img src="assets/gifs/demo2.gif" alt="Demo2" style="vertical-align: top;"/> </p>

## Installation
To install COOM from PyPI, simply run:

```bash
$ pip install COOM
```

Alternatively, to install COOM from source:

- Clone the repository

```bash
$ git clone https://github.com/hyintell/COOM
```

- Navigate into the repository

```bash
$ cd COOM
```

- Install COOM from source with pip

```bash
$ pip install .
```
## Environments
COOM contains 8 scenarios:
Scenario | Success Metric | Enemies | Weapon | Items | Max Steps | Execute Action | Stochasticity
---|---|---|---|---|---|---|---
Pitfall | Distance Covered | ✗ | ✗ | ✗ | 1000 | JUMP | Pitfall tile locations
Arms Dealer | Weapons Delivered | ✗ | ✓ | ✓ | 1000 | SPEED | Weapon spawn locations, delivery locations
Hide and Seek | Frames Alive | ✓ | ✗ | ✓ | 2500 | SPEED | Enemy behaviour, item spawn locations
Floor is Lava | Frames Alive | ✗ | ✗ | ✗ | 2500 | SPEED | Platform locations
Chainsaw | Kill Count | ✓ | ✓ | ✗ | 2500 | ATTACK | Enemy and agent spawn locations
Raise the Roof | Frames Alive | ✗ | ✗ | ✗ | 2500 | USE | Agent spawn location
Run and Gun | Kill Count | ✓ | ✓ | ✗ | 2500 | ATTACK | Enemy and agent spawn locations
Health Gathering | Frames Alive | ✗ | ✗ | ✓ | 2500 | SPEED | Health kit spawn locations
Every scenario except `Run and Gun` has 2 environments: `default` and `hard`. The full list of environments is the following:
- `pitfall-default-v0` - traverse a tunnel as far as possible without falling into a pit
- `pitfall-hard-v0` - the agent has reduced movement speed and there are more pits in the floor
- `arms_dealer-default-v0` - collect and deliver weapons to the marked locations as fast as possible
- `arms_dealer-hard-v0` - the map is larger and there are fewer weapons to collect
- `hide_and_seek-default-v0` - escape and hide from enemies for as long as possible
- `hide_and_seek-hard-v0` - there are more enemies and they are faster
- `floor_is_lava-default-v0` - keep off the lava by navigating to new platforms as they reappear at new locations
- `floor_is_lava-hard-v0` - there are fewer platforms to stand on and their locations change more rapidly
- `chainsaw-default-v0` - seek out and melee as many enemies as possible
- `chainsaw-hard-v0` - there are fewer enemies to find, and they are tougher to eliminate
- `raise_the_roof-default-v0` - locate and press switches on the walls to raise the ceiling and avoid getting crushed
- `raise_the_roof-hard-v0` - the ceiling lowers faster and the switches are harder to spot
- `run_and_gun-default-v0` - use a pistol to eliminate as many enemies as possible
- `run_and_gun-hard-v0` - there are fewer enemies to find, and they are tougher to eliminate
- `run_and_gun-obstacles-v0` - there are obstacles in the environment which block the agent's movement
- `run_and_gun-green-v0` - the wall, ceiling and floor textures are green
- `run_and_gun-resized-v0` - the agent's view height is randomized and the targets are randomly resized
- `run_and_gun-monsters-v0` - the targets are replaced with monsters that move around and fight back
- `run_and_gun-red-v0` - the wall, ceiling and floor textures are red
- `run_and_gun-blue-v0` - the wall, ceiling and floor textures are blue
- `run_and_gun-shadows-v0` - the targets are less visible due to lowered opacity
- `health_gathering-default-v0` - collect health kits as fast as possible to stay alive
- `health_gathering-hard-v0` - there are fewer health kits to find and the environment layout is more complex
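As a rough sketch of how these environments can be created programmatically, the snippet below iterates over the `Scenario` enum used by `make_env` (see Getting Started below). The assumption here is that the enum exposes one member per scenario, mirroring `Scenario.RAISE_THE_ROOF` from the usage example; check `COOM.utils.config` for the actual member names.

```python
from COOM.env.builder import make_env
from COOM.utils.config import Scenario

# Assumption: Scenario has one member per scenario listed above
# (e.g. Scenario.PITFALL, Scenario.RUN_AND_GUN), analogous to
# Scenario.RAISE_THE_ROOF used in the Getting Started section.
for scenario in Scenario:
    env = make_env(scenario)  # the default variant is assumed here
    print(scenario.name, env.action_space)
    env.close()
```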
## Task Sequences for Continual Learning
To formulate a continual learning problem, we compose sequences of tasks, where each task is an environment of a scenario. The agent is trained on each task sequentially, without access to the previous tasks. The agent is continually evaluated on all tasks throughout training. The task sequence is considered solved if the agent achieves maximum success on all tasks. There are three lengths of Continual Learning task sequences in our benchmark:
- 8-task sequences serve as the core of the benchmark
- 4-task sequences consist of the 2<sup>nd</sup> half of an 8-task sequence
- 16-task sequences combine tasks of two 8-task sequences
We further distinguish between the `Cross-Domain` and `Cross-Objective` sequences.
### Cross-Domain
In the cross-domain setting, the agent is sequentially trained on modified versions of the same scenario.
`Run and Gun` is selected as the basis for this CL sequence since, out of the 8 scenarios in the benchmark, it best resembles the actual Doom game, requiring the agent to navigate the map and eliminate enemies by firing a weapon. The objective and the layout of the map remain the same across tasks, whereas we modify the environment in the following ways:
- Changing the textures of the surrounding walls, ceiling and floor
- Varying the size, shape and type of enemies
- Randomizing the view height of the agent, and
- Adding objects to the environment which act as obstacles, blocking the agent’s movement.
Tasks in the Cross-Domain 8 (CD8) sequence
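The CD8 sequence can be instantiated like any other task sequence. Below is a minimal sketch, under the assumption that `Sequence.CD8` is exposed in `COOM.utils.config` alongside the `Sequence.CO8` member shown in Getting Started:

```python
from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence

# Assumption: Sequence.CD8 exists alongside Sequence.CO8.
cd8_env = ContinualLearningEnv(Sequence.CD8)
for task_idx, env in enumerate(cd8_env.tasks):
    # Each task is a Run and Gun modification with the same objective and map layout.
    print(task_idx, env)
```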
### Cross-Objective
Cross-objective task sequences employ a different scenario with a novel objective for each consecutive task, rather than only changing the visuals and dynamics of a single scenario. This presents a diverse challenge, as the goal might drastically change from locating and eliminating enemies (`Run and Gun` and `Chainsaw`) to running away and hiding from them (`Hide and Seek`). In a similar fashion, the `Floor is Lava` scenario often requires the agent to remain at a bounded location for optimal performance, whereas the `Pitfall`, `Arms Dealer`, `Raise the Roof`, and `Health Gathering` scenarios encourage constant movement.
Tasks in the Cross-Objective 8 (CO8) sequence
## Getting Started
Below we provide short code snippets for running a single environment and a task sequence with the COOM benchmark.
### Basic Usage
Find examples of using COOM environments in the `run_single` and `run_sequence` scripts.
#### Single Environment
```python
from COOM.env.builder import make_env
from COOM.utils.config import Scenario

env = make_env(Scenario.RAISE_THE_ROOF)
env.reset()
for steps in range(1000):
    action = env.action_space.sample()
    state, reward, done, truncated, info = env.step(action)
    env.render()
    if done:
        break
env.close()
```
#### Task Sequence
```python
from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence

cl_env = ContinualLearningEnv(Sequence.CO8)
for env in cl_env.tasks:
    env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        state, reward, done, truncated, info = env.step(action)
        env.render()
        if done:
            break
    env.close()
```
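For training an actual agent, the random rollout above can be wrapped into a small evaluation helper. The sketch below is illustrative only: `evaluate`, `policy`, and `episodes_per_task` are hypothetical names (a `policy` is assumed to be a callable mapping an observation to an action), and `reset` is assumed to follow the Gymnasium convention of returning an `(observation, info)` pair.

```python
from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence


def evaluate(policy, sequence=Sequence.CO8, episodes_per_task=3):
    """Return the mean episode return of `policy` on every task of the sequence."""
    cl_env = ContinualLearningEnv(sequence)
    mean_returns = []
    for env in cl_env.tasks:
        total = 0.0
        for _ in range(episodes_per_task):
            obs, _ = env.reset()  # assumed Gymnasium-style (observation, info) return
            done = truncated = False
            while not (done or truncated):
                obs, reward, done, truncated, _ = env.step(policy(obs))
                total += reward
        mean_returns.append(total / episodes_per_task)
        env.close()
    return mean_returns
```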
## Baseline Results
We have employed various popular continual learning algorithms to evaluate their performance on the COOM benchmark. The algorithms are implemented on top of the Soft Actor-Critic (SAC) reinforcement learning algorithm. Please follow the instructions in the Continual Learning module to use the algorithms. The following table ranks the baselines from best to worst performing:
Method | Type | Score |
---|---|---|
PackNet | Structure | 0.74 |
ClonEx-SAC | Memory | 0.73 |
L2 | Regularization | 0.64 |
MAS | Regularization | 0.56 |
EWC | Regularization | 0.54 |
Fine-Tuning | Naïve | 0.40 |
VCL | Regularization | 0.33 |
AGEM | Memory | 0.28 |
Perfect Memory* | Memory | 0.89* |
*The memory consumption of this method is too high to feasibly run it on the longer sequences of the benchmark, so it does not follow the ranking in the table.
## Evaluation Metrics
We evaluate the continual learning methods on the COOM benchmark based on Average Performance, Forgetting, and Forward Transfer.
### Average Performance
The performance (success rate) averaged over tasks is a typical metric for the continual learning setting. The agent is continually evaluated on all tasks in the sequence, even before encountering them. By the end of the sequence, the agent should have mastered all tasks.
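As an illustration of how this metric can be computed (a sketch, not the benchmark's reference implementation), assume a `success` matrix where rows are evaluation points during training and columns are tasks:

```python
import numpy as np

def average_performance(success: np.ndarray) -> float:
    """success[t, j] is the success rate on task j at evaluation point t;
    the last row corresponds to the end of the whole training sequence.
    Average performance is the mean over tasks at that final evaluation."""
    return float(success[-1].mean())
```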
### Forgetting
Forgetting occurs when the agent's performance on a task decreases after training on a subsequent task. This is a common problem in continual learning, as the agent has to learn new tasks while retaining the knowledge of the previous ones. We measure forgetting by comparing the agent's performance on a task immediately after training on it with its performance at the end of the entire sequence. The image below depicts heavy forgetting in the example of AGEM.
In contrast to AGEM, ClonEx-SAC is able to retain the knowledge of the previous tasks.
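A sketch of this measurement, under the same assumed `success` matrix as above, where `end_of_task[j]` (a hypothetical index array) marks the evaluation point right after training on task j finished:

```python
import numpy as np

def forgetting(success: np.ndarray, end_of_task: list[int]) -> float:
    """Average drop between the success on task j measured right after
    training on task j (row end_of_task[j]) and the success measured
    at the very end of the sequence (last row)."""
    drops = [success[end_of_task[j], j] - success[-1, j]
             for j in range(success.shape[1])]
    return float(np.mean(drops))
```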
### Forward Transfer
Transferring learned knowledge from one task to another is a key aspect of continual learning. We measure the forward transfer of the continual learning methods by how efficiently they train on each given task compared to a Soft Actor-Critic (SAC) baseline trained on the same task from scratch. The red areas between the curves represent negative forward transfer, and the other colors represent positive forward transfer, as depicted in the image below.
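One common formulation, following the Continual World methodology that our baselines build on, normalizes the difference in area under the training curve (AUC) between the continual learner and the SAC baseline. Treat the snippet below as a sketch of that idea (assuming success rates in [0, 1]) rather than the exact evaluation code:

```python
def forward_transfer(auc_cl: float, auc_baseline: float) -> float:
    """Normalized AUC difference between a continual learner and an SAC
    baseline trained from scratch on the same task. Positive values mean
    earlier tasks helped; negative values mean they hurt."""
    return (auc_cl - auc_baseline) / (1.0 - auc_baseline)
```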
## Reproducing Results
To reproduce the results in our paper, please follow the instructions in the results module.
## Acknowledgements
- COOM is based on the ViZDoom platform.
- The `Cross-Domain` task sequences and the `run_and_gun` scenario environment modifications were inspired by the LevDoom generalization benchmark.
- The base implementations of SAC and the continual learning methods originate from Continual World.
- Our experiments were managed using WandB.
## Citation
If you use our work in your research, please cite it as follows:
```bibtex
@inproceedings{tomilin2023coom,
  title={COOM: A Game Benchmark for Continual Reinforcement Learning},
  author={Tomilin, Tristan and Fang, Meng and Zhang, Yudi and Pechenizkiy, Mykola},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}
```