ACER

Actor-critic with experience replay (ACER) [1]. ACER uses batched off-policy updates to improve stability. Trust region updates can be enabled with --trust-region. The current implementation uses the full trust region rather than the "efficient" trust region from [1] (see issue #1).
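To make the trust region update more concrete, here is a minimal sketch of the projection step described in [1]: the gradient g is adjusted so that its component along k, the gradient of the KL divergence from the average policy, does not exceed a threshold delta. It is written in plain Python on lists for illustration and is not taken from this repository's code; the function name, argument names and example values are all assumptions.

```python
from typing import List


def trust_region_project(g: List[float], k: List[float], delta: float) -> List[float]:
    """Return z* = g - max(0, (k.g - delta) / ||k||^2) * k.

    g: gradient of the policy objective (hypothetical flat vector).
    k: gradient of the KL divergence to the average policy (same shape).
    delta: trust region threshold (hypothetical value in the example below).
    """
    k_dot_g = sum(ki * gi for ki, gi in zip(k, g))
    k_norm_sq = sum(ki * ki for ki in k)
    if k_norm_sq == 0.0:
        # No KL gradient, so there is no constraint to enforce.
        return list(g)
    # Scale is zero when the constraint k.g <= delta already holds.
    scale = max(0.0, (k_dot_g - delta) / k_norm_sq)
    return [gi - scale * ki for gi, ki in zip(g, k)]


g = [1.0, 2.0]
k = [1.0, 0.0]
z = trust_region_project(g, k, delta=0.5)
print(z)
```

In this example k.g is 1.0, which exceeds delta, so the component of g along k is shrunk until k.z equals delta; the component orthogonal to k is left unchanged. Whether the vectors are gradients with respect to the policy outputs or the network parameters is what separates the "efficient" and full variants.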

Run with python main.py <options>. To run asynchronous advantage actor-critic (A3C) [2] (but with a Q-value head), use the --on-policy option.
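As an illustration of what a Q-value head makes possible, the sketch below follows the discrete-action formulation in [1], where the state value is the expectation of the Q-values under the policy and the advantage is the difference between the two. It is a plain-Python sketch under that assumption, not this repository's implementation; the function names and example numbers are hypothetical.

```python
from typing import List


def state_value(policy: List[float], q_values: List[float]) -> float:
    """V(s) = sum_a pi(a|s) * Q(s, a) for a discrete action space."""
    return sum(p * q for p, q in zip(policy, q_values))


def advantages(policy: List[float], q_values: List[float]) -> List[float]:
    """A(s, a) = Q(s, a) - V(s) for every action."""
    v = state_value(policy, q_values)
    return [q - v for q in q_values]


policy = [0.25, 0.75]    # hypothetical action probabilities
q_values = [1.0, 3.0]    # hypothetical Q-value head outputs
v = state_value(policy, q_values)
adv = advantages(policy, q_values)
print(v, adv)
```

With this formulation no separate value head is needed, since both V and the advantages are derived from the policy and Q-value outputs.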

Requirements

OpenAI Gym
Plotly
PyTorch

To install all dependencies with Anaconda, run conda env create -f environment.yml, then activate the environment with source activate acer.

Results

[Figure: ACER]

Acknowledgements

@ikostrikov for pytorch-a3c
@apaszke for Reinforcement Learning (DQN) tutorial
@pfnet for ChainerRL

References

[1] Sample Efficient Actor-Critic with Experience Replay
[2] Asynchronous Methods for Deep Reinforcement Learning