VIN: Value Iteration Networks

This is a PyTorch implementation of Value Iteration Networks (VIN), written to reproduce the paper's results. (A TensorFlow version also exists.)

Architecture of Value Iteration Network

Key idea

Learned Reward Image and Its Value Images for each VI Iteration

| Visualization | Grid world | Reward Image | Value Images |
|---|---|---|---|
| 8x8 | <img src="imgs/grid_8x8.jpeg" width="150"> | <img src="imgs/reward_8x8.png" width="300"> | <img src="imgs/value_function_8x8.gif" width="300"> |
| 16x16 | <img src="imgs/grid_16x16.jpeg" width="150"> | <img src="imgs/reward_16x16.png" width="300"> | <img src="imgs/value_function_16x16.gif" width="300"> |
| 28x28 | <img src="imgs/grid_28x28.jpeg" width="150"> | <img src="imgs/reward_28x28.png" width="300"> | <img src="imgs/value_function_28x28.gif" width="300"> |
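Each value image corresponds to one VI iteration. As a point of comparison, here is a minimal sketch of classical value iteration on a hand-written reward image. It is not the learned VIN module: the function name `value_iteration`, the terminal-goal convention, the 4-connected moves, and `gamma=0.9` are all assumptions made for illustration.

```python
def value_iteration(reward, obstacles, goal, k, gamma=0.9):
    """Run k synchronous sweeps of value iteration on a square grid.

    reward    -- list of rows; reward received for being in each cell
    obstacles -- list of rows of booleans; True marks a blocked cell
    goal      -- (row, col) of the terminal goal cell (assumed convention)
    Returns a list with one value image (list of rows) per sweep.
    """
    n = len(reward)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    value = [[0.0] * n for _ in range(n)]
    images = []
    for _ in range(k):
        new_value = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                if obstacles[r][c]:
                    continue  # blocked cells keep value 0
                if (r, c) == goal:
                    new_value[r][c] = reward[r][c]  # terminal: no future term
                    continue
                # Values of the free neighbours, taken from the previous sweep.
                neighbours = [
                    value[r + dr][c + dc]
                    for dr, dc in moves
                    if 0 <= r + dr < n and 0 <= c + dc < n
                    and not obstacles[r + dr][c + dc]
                ]
                best = max(neighbours, default=0.0)
                new_value[r][c] = reward[r][c] + gamma * best
        value = new_value
        images.append(value)
    return images


# Toy 3x3 world: goal in the top-right corner, a two-cell wall in the middle column.
reward = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
obstacles = [[False, True, False], [False, True, False], [False, False, False]]
images = value_iteration(reward, obstacles, goal=(0, 2), k=7)
```

In this sketch information travels one cell per sweep, so the top-left cell, six moves from the goal, first receives a non-zero value on the seventh sweep. This is one plausible reading of why the training commands use a larger `--k` for larger grids.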

Dependencies

This repository requires the following packages:

Datasets

Each data sample consists of the (x, y) coordinates of the current state in the grid world, followed by an obstacle image and a goal image.
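To make the sample description concrete, the sketch below splits one sample into its three parts. The flat layout `[x, y, obstacle pixels, goal pixels]`, the row-by-row pixel order, and the helper name `split_sample` are assumptions for illustration; the arrays actually stored in the `.npz` files may be organised differently.

```python
def split_sample(sample, imsize):
    """Split one flat sample into state coordinates and two images.

    Assumed (hypothetical) layout:
        [x, y, obstacle pixels..., goal pixels...]
    with each image stored row by row as imsize * imsize values.
    """
    n_pixels = imsize * imsize
    expected = 2 + 2 * n_pixels
    if len(sample) != expected:
        raise ValueError(f"expected {expected} values, got {len(sample)}")

    def to_rows(flat):
        # Reshape a flat pixel list into imsize rows of imsize values.
        return [flat[i:i + imsize] for i in range(0, n_pixels, imsize)]

    x, y = sample[0], sample[1]
    obstacle_image = to_rows(sample[2:2 + n_pixels])
    goal_image = to_rows(sample[2 + n_pixels:])
    return (x, y), obstacle_image, goal_image


# A 2x2 toy sample: state (1, 0), one obstacle pixel, goal in the last pixel.
state, obstacle_image, goal_image = split_sample([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], imsize=2)
```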

| Dataset size | 8x8 | 16x16 | 28x28 |
|---|---|---|---|
| Train set | 77760 | 776440 | 4510695 |
| Test set | 12960 | 129440 | 751905 |

Running Experiment: Training

Grid world 8x8

python run.py --datafile data/gridworld_8x8.npz --imsize 8 --lr 0.005 --epochs 30 --k 10 --batch_size 128

Grid world 16x16

python run.py --datafile data/gridworld_16x16.npz --imsize 16 --lr 0.008 --epochs 30 --k 20 --batch_size 128

Grid world 28x28

python run.py --datafile data/gridworld_28x28.npz --imsize 28 --lr 0.003 --epochs 30 --k 36 --batch_size 128

Flags:
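The training commands above pass six flags. The sketch below is a hypothetical `argparse` parser that accepts exactly those flags; it is not taken from `run.py`, whose real defaults and any additional options are unknown here. The help strings are interpretations: in particular, `--k` is read as the number of value iteration steps, and every flag is marked required because no defaults are assumed.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags used in the training commands."""
    parser = argparse.ArgumentParser(description="Train a VIN on a grid world (sketch)")
    parser.add_argument("--datafile", type=str, required=True,
                        help="path to the .npz dataset")
    parser.add_argument("--imsize", type=int, required=True,
                        help="side length of the square grid world image")
    parser.add_argument("--lr", type=float, required=True,
                        help="learning rate")
    parser.add_argument("--epochs", type=int, required=True,
                        help="number of training epochs")
    parser.add_argument("--k", type=int, required=True,
                        help="number of value iteration steps (interpretation)")
    parser.add_argument("--batch_size", type=int, required=True,
                        help="mini-batch size")
    return parser


# Parse the 8x8 command line from above instead of sys.argv.
command = ("--datafile data/gridworld_8x8.npz --imsize 8 --lr 0.005 "
           "--epochs 30 --k 10 --batch_size 128")
args = build_parser().parse_args(command.split())
```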

Visualization with Visdom

We visualize the learned reward image and its corresponding value images for each VI iteration using Visdom.

First, start the server:

python -m visdom.server

Open Visdom in your browser at http://localhost:8097

Then run the following to visualize the learned reward and value images:

python vis.py --datafile learned_rewards_values_28x28.npz

NOTE: If you would like to produce a GIF animation of the value images on your own, the following ImageMagick command might be useful.

convert -delay 20 -loop 0 *.png value_function.gif

Benchmarks

GPU: TITAN X

Performance: Test Accuracy

NOTE: This is the accuracy on the test set. It differs from the table in the paper, which reports the success rate of rollouts of the learned policy in the environment.

| Test Accuracy | 8x8 | 16x16 | 28x28 |
|---|---|---|---|
| PyTorch | 99.16% | 92.44% | 88.20% |
| TensorFlow | 99.03% | 90.2% | 82% |

Speed with GPU

| Speed per epoch | 8x8 | 16x16 | 28x28 |
|---|---|---|---|
| PyTorch | 3s | 15s | 100s |
| TensorFlow | 4s | 25s | 165s |

Frequently Asked Questions

References

Further Readings