POPGym: Partially Observable Process Gym

POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines. The full paper is available on OpenReview.

Please see the documentation for advanced installation instructions and examples. The environment quickstart will get you up and running in a few minutes.

Quickstart Install

# Install base environments, only requires numpy and gymnasium
pip install popgym 
# Also include navigation environments, which require mazelib
# NOTE: navigation envs require python <3.12 due to mazelib not supporting 3.12
pip install "popgym[navigation]" 
# Install memory baselines w/ RLlib 
pip install "popgym[baselines]" 

Quickstart Usage

import popgym
from popgym.wrappers import PreviousAction, Antialias, Flatten, DiscreteAction
env = popgym.envs.position_only_cartpole.PositionOnlyCartPoleEasy()
print(env.reset(seed=0))
# Append the previous action to the observation, flatten the observation and
# action spaces, then map the multidiscrete action space to a single discrete
# action (useful for Q-learning)
wrapped = DiscreteAction(Flatten(PreviousAction(env)))
print(wrapped.reset(seed=0))
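
Once an environment is constructed, it can be stepped like any other Gymnasium environment. The sketch below is illustrative rather than part of POPGym: `random_rollout` is a hypothetical helper name, and it assumes only the standard Gymnasium `reset`/`step` signatures and an `action_space` that supports `seed` and `sample`.

```python
def random_rollout(env, seed=0, max_steps=1000):
    """Run one episode with uniformly random actions.

    `env` is any Gymnasium-style environment (for example, the `wrapped`
    environment from the quickstart). Returns (total_reward, num_steps).
    This helper is hypothetical and not part of the popgym package.
    """
    obs, info = env.reset(seed=seed)
    # Seed the action space as well so the sampled actions are reproducible
    env.action_space.seed(seed)
    total_reward, num_steps = 0.0, 0
    for _ in range(max_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        num_steps += 1
        # Gymnasium signals the end of an episode with either flag
        if terminated or truncated:
            break
    return total_reward, num_steps
```

With the quickstart objects in scope, `random_rollout(wrapped)` would play a single random episode. A random policy is only a smoke test; it says nothing about how well a memory model would do.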

POPGym Environments

POPGym contains Partially Observable Markov Decision Process (POMDP) environments that follow the Gymnasium interface. POPGym environments have minimal dependencies and are fast enough to solve on a laptop CPU in less than a day. We provide the following environments:

| Environment | Tags | Temporal Ordering | Colab FPS | MacBook Air (2020) FPS |
|---|---|---|---|---|
| Battleship | Game | None | 117,158 | 235,402 |
| Concentration | Game | Weak | 47,515 | 157,217 |
| Higher Lower | Game, Noisy | None | 24,312 | 76,903 |
| Labyrinth Escape | Navigation | Strong | 1,399 | 41,122 |
| Labyrinth Explore | Navigation | Strong | 1,374 | 30,611 |
| Minesweeper | Game | None | 8,434 | 32,003 |
| Multiarmed Bandit | Noisy | None | 48,751 | 469,325 |
| Autoencode | Diagnostic | Strong | 121,756 | 251,997 |
| Count Recall | Diagnostic, Noisy | None | 16,799 | 50,311 |
| Repeat First | Diagnostic | None | 23,895 | 155,201 |
| Repeat Previous | Diagnostic | Strong | 50,349 | 136,392 |
| Position Only Cartpole | Control | Strong | 73,622 | 218,446 |
| Velocity Only Cartpole | Control | Strong | 69,476 | 214,352 |
| Noisy Position Only Cartpole | Control, Noisy | Strong | 6,269 | 66,891 |
| Position Only Pendulum | Control | Strong | 8,168 | 26,358 |
| Noisy Position Only Pendulum | Control, Noisy | Strong | 6,808 | 20,090 |

Feel free to rerun this benchmark using this colab notebook.
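
For a rough local estimate of the frames-per-second (FPS) figures above, a simple timing loop is usually enough. This is a minimal sketch and not the script used in the notebook: `measure_fps` is a hypothetical helper, it assumes a Gymnasium-style environment, and it uses random actions, so the numbers it produces may differ from those in the table.

```python
import time


def measure_fps(env, num_steps=10_000, seed=0):
    """Estimate environment steps per second under random actions.

    Hypothetical helper, not part of popgym. Time spent in `reset`
    between episodes is included in the measurement.
    """
    env.reset(seed=seed)
    env.action_space.seed(seed)
    start = time.perf_counter()
    for _ in range(num_steps):
        action = env.action_space.sample()
        _, _, terminated, truncated, _ = env.step(action)
        # Start a new episode whenever the current one ends
        if terminated or truncated:
            env.reset()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers
    return num_steps / max(elapsed, 1e-9)
```

Any constructed environment can be passed in, for example `measure_fps(env)` with the `env` from the quickstart. Results depend heavily on hardware and on which wrappers are applied.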

POPGym Baselines

POPGym baselines implement recurrent and memory models efficiently. They are built on top of RLlib using its custom model API. We provide the following baselines:

  1. MLP
  2. Positional MLP
  3. Framestacking (Paper)
  4. Temporal Convolution Networks (Paper)
  5. Elman Networks (Paper)
  6. Long Short-Term Memory (Paper)
  7. Gated Recurrent Units (Paper)
  8. Independently Recurrent Neural Networks (Paper)
  9. Fast Autoregressive Transformers (Paper)
  10. Fast Weight Programmers (Paper)
  11. Legendre Memory Units (Paper)
  12. Diagonal State Space Models (Paper)
  13. Differentiable Neural Computers (Paper)

Leaderboard

The leaderboard is available on Papers with Code.

Contributing

Follow the project's code style and make sure the tests pass:

pip install pre-commit
pre-commit install
pytest popgym/tests

Citing

@inproceedings{
morad2023popgym,
title={{POPG}ym: Benchmarking Partially Observable Reinforcement Learning},
author={Steven Morad and Ryan Kortvelesy and Matteo Bettini and Stephan Liwicki and Amanda Prorok},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=chDrutUTs0K}
}