COOM

COOM is a Continual Learning benchmark for embodied pixel-based RL, consisting of task sequences in visually distinct 3D environments with diverse objectives and egocentric perception. COOM is designed for task-incremental learning, in which task boundaries are clearly defined. A short demo of COOM is available on YouTube.

<p align="center"> <img src="assets/gifs/demo1.gif" alt="Demo1" style="vertical-align: top;"/> <img src="assets/gifs/demo2.gif" alt="Demo2" style="vertical-align: top;"/> </p>

Installation

To install COOM from PyPI, run:

$ pip install COOM

Alternatively, to install COOM from source:

  1. Clone the repository
$ git clone https://github.com/hyintell/COOM
  2. Navigate into the repository
$ cd COOM
  3. Install COOM from source with pip
$ pip install .

Environments

COOM contains 8 scenarios:

| Scenario | Success Metric | Max Steps | Execute Action | Stochasticity |
|----------|----------------|-----------|----------------|---------------|
| Pitfall | Distance Covered | 1000 | JUMP | Pitfall tile locations |
| Arms Dealer | Weapons Delivered | 1000 | SPEED | Weapon spawn locations, delivery locations |
| Hide and Seek | Frames Alive | 2500 | SPEED | Enemy behaviour, item spawn locations |
| Floor is Lava | Frames Alive | 2500 | SPEED | Platform locations |
| Chainsaw | Kill Count | 2500 | ATTACK | Enemy and agent spawn locations |
| Raise the Roof | Frames Alive | 2500 | USE | Agent spawn location |
| Run and Gun | Kill Count | 2500 | ATTACK | Enemy and agent spawn locations |
| Health Gathering | Frames Alive | 2500 | SPEED | Health kit spawn locations |

Every scenario except Run and Gun has 2 environments: default and hard.

Task Sequences for Continual Learning

To formulate a continual learning problem, we compose sequences of tasks, where each task is an environment of a scenario. The agent is trained on each task sequentially, without access to the previous tasks. The agent is continually evaluated on all tasks throughout training. The task sequence is considered solved if the agent achieves maximum success on all tasks. There are three lengths of Continual Learning task sequences in our benchmark:

  1. 8-task sequences serve as the core of the benchmark
  2. 4-task sequences consist of the 2<sup>nd</sup> half of an 8-task sequence
  3. 16-task sequences combine tasks of two 8-task sequences
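As a rough sketch, the composition rules above can be expressed in plain Python. The task names below are purely illustrative; the actual orderings are defined by `COOM.utils.config.Sequence` (e.g. `Sequence.CO8`, `Sequence.CD8`):

```python
# Illustrative task names only -- the real sequences come from
# COOM.utils.config.Sequence, not from these lists
co8 = ["chainsaw", "raise_the_roof", "run_and_gun", "health_gathering",
       "arms_dealer", "hide_and_seek", "floor_is_lava", "pitfall"]
cd8 = [f"run_and_gun_{i}" for i in range(8)]  # hypothetical CD8 task names

co4 = co8[len(co8) // 2:]  # 4-task sequence: 2nd half of an 8-task sequence
co16 = co8 + cd8           # 16-task sequence: two 8-task sequences combined
```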

We further distinguish between the Cross-Domain and Cross-Objective sequences.

Cross-Domain

In the cross-domain setting, the agent is sequentially trained on modified versions of the same scenario. Run and Gun is selected as the basis for this CL sequence since, out of the 8 scenarios in the benchmark, it best resembles the actual Doom game, requiring the agent to navigate the map and eliminate enemies by firing a weapon. The objective and the layout of the map remain the same across tasks, whereas we modify the environment in the following ways:

  1. Changing the textures of the surrounding walls, ceiling and floor
  2. Varying the size, shape and type of enemies
  3. Randomizing the view height of the agent, and
  4. Adding objects to the environment which act as obstacles, blocking the agent’s movement.

Tasks in the Cross-Domain 8 (CD8) sequence


Cross-Objective

Rather than merely altering the visuals and dynamics of a single scenario, cross-objective task sequences employ a different scenario with a novel objective for each consecutive task. This presents a diverse challenge, as the goal might change drastically from locating and eliminating enemies (Run and Gun and Chainsaw) to running away and hiding from them (Hide and Seek). Similarly, the Floor is Lava scenario often requires the agent to remain in a bounded area for optimal performance, whereas Pitfall, Arms Dealer, Raise the Roof, and Health Gathering encourage constant movement.

Tasks in the Cross-Objective 8 (CO8) sequence


Getting Started

Below we provide a short code snippet to run a sequence with the COOM benchmark.

Basic Usage

Find examples of using COOM environments in the run_single and run_sequence scripts.

Single Environment

from COOM.env.builder import make_env
from COOM.utils.config import Scenario

# Create a single COOM environment and run a random policy for one episode
env = make_env(Scenario.RAISE_THE_ROOF)
env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # random action
    state, reward, done, truncated, info = env.step(action)
    env.render()
    if done or truncated:  # episode ended or time limit reached
        break
env.close()

Task Sequence

from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence

# Iterate over the tasks of a continual learning sequence
cl_env = ContinualLearningEnv(Sequence.CO8)
for env in cl_env.tasks:
    env.reset()
    done = False
    truncated = False
    while not (done or truncated):
        action = env.action_space.sample()  # random action
        state, reward, done, truncated, info = env.step(action)
        env.render()
    env.close()

Baseline Results

We have employed various popular continual learning algorithms to evaluate their performance on the COOM benchmark. The algorithms are implemented on top of the Soft Actor-Critic (SAC) reinforcement learning algorithm. Please follow the instructions in the Continual Learning module to use the algorithms. The following table ranks the baselines from best to worst performing.

| Method | Type | Score |
|--------|------|-------|
| PackNet | Structure | 0.74 |
| ClonEx-SAC | Memory | 0.73 |
| L2 | Regularization | 0.64 |
| MAS | Regularization | 0.56 |
| EWC | Regularization | 0.54 |
| Fine-Tuning | Naïve | 0.40 |
| VCL | Regularization | 0.33 |
| AGEM | Memory | 0.28 |
| Perfect Memory* | Memory | 0.89* |

*The memory consumption of the method is too high to feasibly run it on the longer sequences of the benchmark, so it is listed outside the ranking in the table.

Evaluation Metrics

We evaluate the continual learning methods on the COOM benchmark based on Average Performance, Forgetting, and Forward Transfer.

Average Performance

The performance (success rate) averaged over tasks is a typical metric for the continual learning setting. The agent is continually evaluated on all tasks in the sequence, even before encountering them. By the end of the sequence, the agent should have mastered all tasks.

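As a minimal sketch (with toy numbers, not benchmark data), average performance is simply the mean success rate over all tasks at the final evaluation checkpoint:

```python
# success[t][i]: evaluation success rate on task i at evaluation checkpoint t
# (toy numbers for illustration; a real run logs these throughout training)
success = [
    [0.2, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.6, 0.8, 0.2],
    [0.5, 0.7, 0.9],
]

# Average performance: mean success over all tasks at the end of the sequence
final = success[-1]
avg_performance = sum(final) / len(final)
```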

Forgetting

Forgetting occurs when the agent's performance on a task decreases after training on a subsequent task. This is a common problem in continual learning, as the agent has to learn new tasks while retaining the knowledge of the previous ones. We measure forgetting by comparing the agent's performance on a task right after training on it with its performance at the end of the entire sequence. The image below depicts heavy forgetting in the example of AGEM.

Contrary to AGEM, ClonEx-SAC is able to retain the knowledge of the previous tasks.
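The forgetting measure described above can be sketched as follows (toy numbers, not benchmark data):

```python
# Toy numbers: success on each task measured right after training on it,
# and again at the end of the whole sequence
after_training = [0.9, 0.8, 0.9]
end_of_sequence = [0.5, 0.7, 0.9]

# Per-task forgetting: drop in success between the two measurements
# (positive = knowledge lost; zero = fully retained)
forgetting = [a - e for a, e in zip(after_training, end_of_sequence)]
avg_forgetting = sum(forgetting) / len(forgetting)
```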

Forward Transfer

Transferring learned knowledge from one task to another is a key aspect of continual learning. We measure the forward transfer of the continual learning methods by how efficiently they train on each given task compared to a Soft Actor-Critic (SAC) baseline trained on the same task from scratch. The red areas between the curves represent negative forward transfer and the other colors represent positive forward transfer, as depicted in the image below.

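One common way to quantify this (used, e.g., in Continual World, from which our baseline implementations originate) normalizes the area under the learning curve; the sketch below uses toy numbers, not benchmark data:

```python
# Learning curves (success rate over training) on one task, toy numbers:
# cl_curve  - the continual learner training on this task within a sequence
# sac_curve - the SAC baseline trained on the same task from scratch
cl_curve = [0.3, 0.6, 0.8, 0.9]
sac_curve = [0.1, 0.3, 0.6, 0.8]

# Normalized area under each learning curve
auc_cl = sum(cl_curve) / len(cl_curve)
auc_sac = sum(sac_curve) / len(sac_curve)

# Forward transfer: positive when the continual learner trains more
# efficiently than the from-scratch baseline, negative otherwise
forward_transfer = (auc_cl - auc_sac) / (1.0 - auc_sac)
```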

Reproducing results

To reproduce the results in our paper, please follow the instructions in the results module.

Acknowledgements

COOM is based on the ViZDoom platform.
The Cross-Domain task sequences and the run_and_gun scenario environment modification were inspired by the LevDoom generalization benchmark.
The base implementations of SAC and continual learning methods originate from Continual World.
Our experiments were managed using WandB.

Citation

If you use our work in your research, please cite it as follows:

@inproceedings{tomilin2023coom,
    title={COOM: A Game Benchmark for Continual Reinforcement Learning},
    author={Tomilin, Tristan and Fang, Meng and Zhang, Yudi and Pechenizkiy, Mykola},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2023}
}