Awesome

RustyGo, my untrusted bot player project

Development stopped

While I love go and developping an efficient bot, this project also was a pretext to learn a new language, Rust. Unfortunately I spent an unordinate amount of time (3 months) to get anywhere as I was fighting the language more than reinforcement learning which is hard enough. Graphs are especially painful in Rust and Monte-Carlo Tree Search is based on graphs. Furthermore, deep learning based reinforcement learning would be impossible in Rust currently.

I have rebooted a go bot project, Golem Prime in Nim, a young language I'm much more productive in. Nim has a syntax similar to Python, the speed of C and a focus on expressivity. In about 3 days, I made Golem Prime about 200x faster than RustyGo. I'm also writing my own deep learning library Arraymancer with OpenMP, OpenCL and Cuda backends with reinforcement learning as one of the future focus.

Table of Content

RustyGo, my untrusted bot player project

Why a bot in Rust

I'm an amateur dan player hovering between 1d-3d on KGS and European Go ranking. I've been interested in Computer Go since I started back in 2004 and followed eagerly the breakthroughs in the past 10 years.

Back in October 2015, I had the sudden urge to program a go bot.

A bot in Haskell? Interesting but Monads and Functors gave me a headache. Also Haskell is reasonably fast but I had a lot of memory overflow with just prime computations so I don't think it will cut it with storing thousands of game trees.
A bot in C/C++? I don't know C or C++ and it's already been done. Plus I like programming in functional style.

What is important for the bot:

Memory, no leak, fast, compact: Go needs lot of memory operations with lots of possible moves, and board states of up to 400 moves to save in memory.
Lock-free or at least concurrent data structure: One tree of variations, multiple threads that update it
Access to neural networks libraries

So I choose Rust:

The lure of CUDA/OpenCL libraries (Arrayfire, collenchyma)
The lure of machine learning libraries (Leaf, rusty-machine)
The lure of lock-free concurrent datastructure (crossbeam, rayon)
a strong type system
nice pattern matching (not as nice as Haskell though)
function chaining without dozens of parenthesis if only I didn't have to .collect::<Vec<u16>>() every time
no Garbage Collector, control of memory layout
no memory leak (very important)
Memory on the stack by default (the fastest(TM) )
fast ~C like
Good debugguer (LLVM)
Unit testing included
A strong and helpful community
Does the coffee If only it didn't have all those brackets

Other languague considered:

Nim, high performance, very nice syntax, strong typing

History of computer go breakthrough

First was the expert systems, with handcoded logic of "good moves" Too bad those expert system, collapsed in fights and against silly moves the programmer didn't expect.
Second was the statistics system, embodied by the Monte-Carlo Tree Search (MCTS) revolution. Before each move the go program simulate thousands of games and choose the one with best win ratio Several evolutions brought them up to dan level on 9x9 on 2008 CPUs : * UCT : which balances the Monte Carlo Tree exploration of known nodes vs new nodes * RAVE / AMAF : if a move is good later, consider it good also as the first move * Integration of heuristics and patterns that influence the Monte-Carlo Tree Search Thanks to the statistical approach, programs were much stronger in fights.
Third was the DCNNs (Deep Convolutional Neural Network) Those are used in combination with MCTS to direct the search to the most promising moves.

State

RustyGo was started back in October and is currently compilable but buggy as hell, it might as well play random moves. Also everything is in one file because I was in "hack away" mode and didn't learn how to split a projects into modules.

Interesting repos

Easy to dive in Michi (Python) Pony-mcts (Pony and Rust)
MCTS only Libego (C)
Rusty: Iomrascalai
"State of the art"
- Darkforest from Facebook Research, using Torch DCNN (Lua and C)
- Pachi long-time reference open-source engine before Facebook open-sourced theirs, with early Caffe DCNN support! (C)
- Fuego slightly weaker than pachi but very strong engine as well (C++)

Future

Roadmap:

Find time to work on it
Modularize the project
Improve the data structure for concurrency, lock-free
Improve the light playouts (no heuristics) to the fastest speed (cf pony-mcts and libego)
Implement heuristics (ladder, local move preference) for heavy playouts
Create a DCNN policy network to suggest moves
...
...

Huge brain dump alert: optimization to think of in no particular order

Compilation:

cargo rustc --release -- -C target-cpu=native i.e. exploit latest SIMD
LLVM Polly - Polyhedral Optimization

Vectorization

AVX-512 board size

Localization / Branch Prediction / Cache

Reduce Cache Misses
Bitboards (needs AVX-512)
Memoization
HashMap instead of Match/if to reduce branching
reduce if statement inside loop or fold. Use map and loop/fold after
Auto sort MCTS tree according to UCT value (heap tree)
SSE instruction or Xor or a+b-|a-b| or BLAS or collenchyma/Arrayfire to fold compare max(UCT) and reduce branching

Randomization / MCTS optimization

UCT
RAVE, AMAF (All moves as first)
Fast number Gen (rdrand Hardware RNG ?)
CLOP tuning (Confident Local Optimization for Noisy Black-Box Parameter Tuning)
Simulation Balancing (cf Erica bot)
FNV Hashmap algorithm (when it's possible to use a custom hasher)

Data structure research

Need fast delete of Early Nodes, traversal from leaf to start, very fast append, lock-free, parallelizable, cacheable, low memory use, max of O(1) operations Board? Fixed Size Struct? Array by macro? Integer parametrized struct for different board size (not possible yet)

Some interesting things to explore

Trie
Heap Tree
Finger Trees
Splay trees
Treaps
Conc-Tree
Transpo Table with Judy Array
Lock free hashmaps (crossbeam)
Ctrie
Disjoint Set data structure - Union/Find

Optimization that can help

Pointer to child with max UCT value (need to separate in Controller data structure and other trees)
Pointer to last parent in the tree for backpropagation
Do not store a list/Vec of board state but a list of hash with a hash table.

Other stuff

Lock-Free MCTS
Efficient replacement of Hashmap and HashSet
Remove all occurences of SignedInteger
Remove all floating points (komi)
Connected Component
(Enriched) Common Fate Graph
Add a extra color around the 19x19 to reduce the branching (i.e. if on the edge ...)
Planar CRF (Conditional Random Fields)
VecDeque instead of Vec?
Disable Rust Array bound checking

Machine Learning

Linear Classifier http://www.lidi.info.unlp.edu.ar/WorldComp2011-Mirror/ICA8504.pdf
Neural Network & DCNN, link Facebook Research
Genetic Algorithm / Reinforcement learning

Algorithms

FloodFill (Recursive vs Queue vs Stack)
Blob Extraction
Lambda search tree
Analyze-after
All Moves as first
Grandparent Knowledge
Experience tree
nVidia: Aspiration Search, Principal Variation Search, Iterative Deepening
Background History Reply Forest
MCTS with Information Sharing
Benson's Algorithm
Partial Order Evaluation
Last Good reply
Linearly Decreasing Handicap Compensation
Value-Based Situational Compensation
Use symmetry

Consumption (of trees and iterators/vectors)

Keeping Tree nodes
Stream Fusion (for neighbor determination)
Loop Fusion
Use iterators instead of Vec for temporary methods/traits
Implement Map+Fold/mapreduce fusion = mapAccumL et mapAccumR

Parallelism

Multithreading
GPU, GPGPU, Shader programming
PTX/Cuda/OpenCL or ArrayFire GPU
Leaf/Root/Tree Parallelization
Multiblock parallel MCTS
nVidia: Principal Variation Split, Young Brother Wait Concept, Dynamic Tree Splitting
Timely dataflow
Rust libraries: Rayon, crossbeam, collenchyma

Hashing

Zobrist

Heuristics

2 liberties Semeai Solver
Tsumego Solver
Don't fill real eyes
Negascout / Negamax
Alpha Beta pruning
Heat map / Computer Vision
Futility pruning
Razoring
First/2nd lines heuristics (no if no stone around)
Basic pattern shape
Score counting
Ladders
Connection
Legal moves: faire une passe toutes les x simulations opour leur donner une valeur null/négative

Memory

use u8 everywhere (check usize)

Unit tests

Size of Intersection enum

//println!("Size is {}", std::mem::size_of::<Intersection>()); Size of Board array
//TODO: Need to profile which type is more efficient (stack vs heap, spacial locality, L1 and L2 cache ...)
//Note: Dynamic size might be hard for compiler to optimize / vectorize

Board setup:

//TODO: Unit test - make sure s^2+1 = 362 for 19x19 board size
//TODO: Unit test : check for overflow (u8 -> 256)
//TODO: Unit test - make sure numbers are properly cast as chars * // .inspect(|&x| println!("Coord:\t{:?}", x))
- // TODO: Unit test - Check A1, A19, T1, T19 and neighbors
- // TODO: Unit test - Check coordinates on 9x9 and 13x13
//TODO: Check that after each move black & white are properly switching in memory
//TODO: add unreachable
//TODO: don't capture Intersection::Border