kgsgo-dataset-preprocessor

Dataset preprocessor for the KGS Go dataset, e.g. producing input planes according to Clark and Storkey

#goal

The goal of this project is to take the data from the kgsgo website and make it available in a somewhat generic format that can be fed into any Go-agnostic learning algorithm. The guidelines used for the creation of this project are to be able to somewhat reproduce the experiments in the Clark and Storkey paper, while also somewhat targeting the Maddison et al paper.

#Pre-requisites

v1 vs v2 format

v1 format

##Instructions

These are written for Linux. They may need some slight tweaking for Windows.

Type:

git clone --recursive https://github.com/hughperkins/kgsgo-dataset-preprocessor.git
cd kgsgo-dataset-preprocessor
python kgs_dataset_preprocessor.py

##Results

##Data format of resulting file

#Data processing applied

##MD5sum

When I run it, I get md5sums:

850d2c91b684de45f39a205378fd7967  kgsgo-test.dat
80cfa39797fa1ea32af30191b2fb962c  kgsgo-train10k.dat

If it's different, it doesn't necessarily matter, but if it's the same, it's a good sign :-)
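If you'd rather check the sums from Python than by eye, here is a minimal sketch. It assumes the two `.dat` files are in the current directory (adjust the paths if yours are elsewhere); `md5_of_file` and `matches_expected` are hypothetical helper names, not part of this project.

```python
import hashlib
import os

# Checksums copied from the listing above.
EXPECTED_MD5 = {
    "kgsgo-test.dat": "850d2c91b684de45f39a205378fd7967",
    "kgsgo-train10k.dat": "80cfa39797fa1ea32af30191b2fb962c",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path):
    """Compare a file's MD5 with the listed value for its file name."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected

# Assumption: the files sit in the current directory; missing files are skipped.
for name in EXPECTED_MD5:
    if os.path.exists(name):
        print(name, "matches" if matches_expected(name) else "differs")
```

Reading in chunks keeps memory use flat, which matters more for the larger files than for these two.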

v2 format

v2 format vs v1 format

After writing the v1 format as detailed above, I noticed some things I'd prefer to do differently. The v2 format therefore modifies these things, but without changing anything detailed above: if you continue to use kgs_dataset_preprocessor.py, the data produced will be unchanged. In addition, the filenames produced by v2 do not overwrite those produced by v1.

v2 changes the following:

mlv2-n=347-numplanes=7-imagewidth=19-imageheight=19-datatype=int-bpp=1
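The string above reads like a format tag followed by dash-separated `key=value` fields. Assuming that structure holds, a rough sketch for pulling it apart might look like this. `parse_header` is a hypothetical name, and the sketch says nothing about where or how the string is stored inside the file.

```python
def parse_header(header):
    """Split a v2-style header string into its format tag and a dict of fields.

    Assumes a leading tag followed by dash-separated key=value pairs.
    """
    tag, *fields = header.strip().split("-")
    parsed = {}
    for field in fields:
        key, _, value = field.partition("=")
        # Keep non-numeric values (such as datatype=int) as strings.
        parsed[key] = int(value) if value.isdigit() else value
    return tag, parsed

tag, fields = parse_header(
    "mlv2-n=347-numplanes=7-imagewidth=19-imageheight=19-datatype=int-bpp=1"
)
print(tag)
print(fields)
# Points per plane for a 19x19 board.
print(fields["imagewidth"] * fields["imageheight"])
```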

Running v2 format processor

python kgs_dataset_preprocessor_v2.py

Available options:

python kgs_dataset_preprocessor_v2.py dir=data sets=test,train10k,trainall

(this is the default in fact, if you run with no arguments)
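To make the `key=value` argument style concrete, here is a small sketch of how such arguments could be parsed, with the defaults taken from the command line above. This is an illustration only, not the script's actual implementation; `parse_options` and `DEFAULTS` are hypothetical names.

```python
# Defaults taken from the command line shown above.
DEFAULTS = {"dir": "data", "sets": "test,train10k,trainall"}

def parse_options(argv):
    """Parse key=value arguments, falling back to DEFAULTS for missing keys."""
    options = dict(DEFAULTS)
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in options:
            raise ValueError("unrecognised argument: %r" % arg)
        options[key] = value
    # The sets option is a comma-separated list of dataset names.
    options["sets"] = options["sets"].split(",")
    return options

# No arguments: same as passing the defaults explicitly.
print(parse_options([]))
# Only the test set, written to a different (hypothetical) directory.
print(parse_options(["dir=mydata", "sets=test"]))
```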

md5 sums

When I run this, I get the following md5 sums. If these are different for you, it's not necessarily an issue. If they are the same, this is a good sign :-)

57382be81ef419a5f1b1cf2632a8debf  kgsgo-test-v2.dat
6172e980f348103be3ad06ae7f946b47  kgsgo-train10k-v2.dat
20440801e72452b6714d5dd061673973  kgsgo-trainall-v2.dat

File sizes:

$ ls -lh kgsgo-*v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 5.8M Feb  8 05:58 kgsgo-test-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 601M Feb  8 06:13 kgsgo-train10k-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu  11G Mar  7 15:58 kgsgo-trainall-v2.dat

Example loader

#Third-party libraries used

#Related projects

I'm building a convolutional network library in OpenCL, ClConvolve, with the aim of training it on this dataset.