Fast Population-Based Reinforcement Learning
This repository contains the code for the InstaDeep paper "Fast Population-Based Reinforcement Learning on a Single Machine" (Flajolet et al., 2022) :computer::zap:.
First-time setup
Install Docker
This code requires Docker to run. To install Docker, please follow the online instructions here. To enable the code to run on a GPU, please also install nvidia-docker as well as the latest NVIDIA driver available for your GPU.
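Before building the project image, you can check that Docker can access the GPU by running NVIDIA's test container; the CUDA image tag below is only an example, and any recent `nvidia/cuda` base image works:

```bash
# Verify that Docker can reach the GPU through the NVIDIA runtime.
# The image tag is illustrative; substitute any recent nvidia/cuda base image.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```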
Build and run a docker image
Once Docker and nvidia-docker are installed, you can build the Docker image with the following command:
make build
and, once the image is built, start the container with:
make dev_container
Inside the container, you can run the `nvidia-smi` command to verify that your GPU is found.
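If you prefer to invoke Docker directly rather than through make, the two targets roughly correspond to the commands sketched below; the image name, tag, and mount options are illustrative assumptions, so refer to the Makefile for the exact invocation.

```bash
# Rough equivalent of `make build` (image name and tag are placeholders).
docker build -t fastpbrl:latest .

# Rough equivalent of `make dev_container`: start an interactive container
# with GPU access and the repository mounted inside it (paths are placeholders).
docker run --rm -it --gpus all -v "$(pwd)":/app fastpbrl:latest /bin/bash
```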
Run preconfigured scripts
Replicate the experiments from the paper
We provide scripts and commands to replicate the experiments discussed in the paper. All these commands are defined in the Makefile at the root of the repository.
To replicate the experiments corresponding to Figure 2 (where we measure the runtime of a population-wide update step with various implementations), run:
make run_timing_sactd3
make run_timing_dqn
To replicate the experiments discussed in Section 5 (which correspond to full training runs), run the following:
make run_td3_cemrl
make run_td3_dvd
make run_td3_pbt
make run_sac_pbt
Note that DvD training runs are unstable and sometimes crash early on due to NaNs.
We use TensorBoard to log metrics during training runs. The TensorBoard command to visualize them is printed when the experiment starts.
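For reference, the printed command has the usual TensorBoard form; the log directory below is a placeholder, so use the exact path printed at startup.

```bash
# Point TensorBoard at the log directory printed when the experiment starts
# (the path below is a placeholder), then open http://localhost:6006.
tensorboard --logdir /path/printed/at/startup --port 6006
```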
Launch a test script
Run the following command to start a short test that validates that the training scripts work as expected.
make test_training_scripts
Contributors
<a href="https://github.com/thomashirtz" title="Thomas Hirtz"><img src="https://github.com/thomashirtz.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/flajolet" title="Arthur Flajolet"><img src="https://github.com/flajolet.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/cibeah" title="Claire Bizon Monroc"><img src="https://github.com/cibeah.png" height="auto" width="50" style="border-radius:50%"></a> <a href="https://github.com/ranzenTom" title="Thomas Pierrot"><img src="https://github.com/ranzenTom.png" height="auto" width="50" style="border-radius:50%"></a>
Citing this work
If you use the code or data in this package, please cite:
@inproceedings{flajolet2022fast,
title={Fast Population-Based Reinforcement Learning on a Single Machine},
author={Flajolet, Arthur and Monroc, Claire Bizon and Beguir, Karim and Pierrot, Thomas},
booktitle={International Conference on Machine Learning},
pages={6533--6547},
year={2022},
organization={PMLR}
}