PPO With Stein Control Variate

In this work, we propose a control variate method, motivated by Stein's identity, to effectively reduce the variance of policy gradient methods.
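
As a quick sketch of the idea (the notation here is ours, not copied verbatim from the paper), Stein's identity states that for a policy \pi_\theta(a \mid s) and any sufficiently smooth function \phi(s, a),

\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\big[ \nabla_a \log \pi_\theta(a \mid s)\, \phi(s, a) + \nabla_a \phi(s, a) \big] = 0 .

Because this expectation vanishes, \phi can act as an action-dependent baseline: subtracting it from the Q estimate and adding back a correction term keeps the policy gradient unbiased. Up to notation, with a reparameterized policy a = f_\theta(s, \xi), the estimator has roughly the form

\nabla_\theta J(\theta) = \mathbb{E}\big[ \nabla_\theta \log \pi_\theta(a \mid s)\, \big( \hat{Q}(s, a) - \phi(s, a) \big) \big] + \mathbb{E}\big[ \nabla_\theta f_\theta(s, \xi)\, \nabla_a \phi(s, a) \big],

and choosing \phi to minimize the variance of this estimator, or to fit \hat{Q}, corresponds to the MinVar and FitQ options used in the commands below.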

This repository contains the code for Proximal Policy Optimization (PPO) with Stein control variates for MuJoCo environments.

The code is based on an excellent existing implementation of PPO.

Dependencies

Running Experiments

You can run the following commands to reproduce our results:

cd optimization

# For MinVar optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po MinVar -p 500 
 
python train.py Ant-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po MinVar -p 500 


# For FitQ optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po FitQ -p 500 

python train.py Ant-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po FitQ -p 500


# For baseline PPO
python train.py HalfCheetah-v1 -b 10000 -ps large -c 0
python train.py Walker2d-v1 -b 10000 -ps large -c 0
python train.py Hopper-v1 -b 10000 -ps large -c 0

python train.py Ant-v1 -b 10000 -ps small -c 0
python train.py Humanoid-v1 -b 10000 -ps small -c 0
python train.py HumanoidStandup-v1 -b 10000 -ps small -c 0
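
To launch the full sweep in one go, a minimal driver like the sketch below can be used. Everything in it beyond the commands above is an assumption: the run helper and the environment lists are hypothetical names we introduce here, and the flag readings in the comments (-b as batch size, -ps as policy network size, -po as the objective used to fit the Stein baseline, -p as its training iterations, -c 0 as disabling the control variate) should be checked against train.py.

import subprocess

# Hypothetical sweep driver; flag semantics are assumptions, not taken from train.py.
LARGE_ENVS = ["HalfCheetah-v1", "Walker2d-v1", "Hopper-v1"]    # run with -ps large
SMALL_ENVS = ["Ant-v1", "Humanoid-v1", "HumanoidStandup-v1"]   # run with -ps small

def run(env, policy_size, extra):
    # -b 10000: samples per batch (assumed); extra carries either the Stein
    # flags (-po/-p) or -c 0 for the plain PPO baseline.
    cmd = ["python", "train.py", env, "-b", "10000", "-ps", policy_size] + extra
    subprocess.run(cmd, check=True)

for method in (["-po", "MinVar", "-p", "500"],
               ["-po", "FitQ", "-p", "500"],
               ["-c", "0"]):
    for env in LARGE_ENVS:
        run(env, "large", method)
    for env in SMALL_ENVS:
        run(env, "small", method)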

The log files are saved in optimization/dartml_data. In addition, we provide two shell scripts in the scripts folder for tuning the hyperparameters of the Stein control variates.
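
The exact log format is not documented here, so the snippet below assumes nothing beyond there being CSV files somewhere under optimization/dartml_data; it simply lists each file and its columns so you can decide what to plot.

import glob
import os

import pandas as pd

# Walk the (assumed) log directory and report what each CSV actually contains.
log_dir = os.path.join("optimization", "dartml_data")
for path in sorted(glob.glob(os.path.join(log_dir, "**", "*.csv"), recursive=True)):
    df = pd.read_csv(path)
    print(path, "->", list(df.columns))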

For the evaluation of PPO with and without the Stein control variate, please see here.

Citations

If you find Stein control variates helpful, please cite the following paper:

Sample-efficient Policy Optimization with Stein Control Variate. Hao Liu*, Yihao Feng*, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu (*: equal contribution). Preprint 2017

Feedbacks

If you have any questions about the code or the paper, please feel free to contact us.