ChineseChess-AlphaZero
About
Chinese Chess reinforcement learning by AlphaZero methods.
This project is based on these main resources:
- DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.
- The great Reversi, Chess, and Chinese chess developments of the DeepMind ideas by @mokemokechicken, @Akababa, and @TDteach in their repos: https://github.com/mokemokechicken/reversi-alpha-zero, https://github.com/Akababa/Chess-Zero, and https://github.com/TDteach/AlphaZero_ChineseChess
- A Chinese chess engine with GUI: https://github.com/mm12432/MyChess
Note: The distributed version is in the `distributed` branch.
Note: This repo is still under construction. There is also a slower implementation of the MCTS player (the previous version); see the `old` branch.
Contributing
In order to build a strong Chinese chess AI following the same techniques as AlphaZero, we need to run this as a distributed project, as it requires a huge amount of computation.
If you want to join us to build the best Chinese chess AI in the world:
- For instructions, see the wiki
- For live status, see http://alphazero.52coding.com.cn/
Environment
- Python 3.6.3
- tensorflow-gpu: 1.3.0
- Keras: 2.0.8
Modules
Reinforcement Learning
This AlphaZero implementation consists of two workers: `self` and `opt`.

- `self` is Self-Play: it generates training data by self-play using the BestModel.
- `opt` is Trainer: it trains the model and generates new models.

For the sake of faster training (since I don't have 5000 TPUs), another two workers are involved:

- `sl` is Supervised Learning: it trains the model on data crawled from the Internet.
- `eval` is Evaluator: it evaluates the NextGenerationModel against the current BestModel.
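As a rough illustration of how these workers fit together, here is a minimal, hypothetical sketch of the `self` worker's data-generation loop. The class and helper names (`CChessEnv`, `mcts_search`, and so on) are placeholders for illustration, not the actual `cchess_alphazero` API:

```python
# Hypothetical sketch of the `self` worker; all names below are
# placeholders, not the real cchess_alphazero code.

def self_play_worker(best_model, nb_game_in_file, out_file):
    """Generate one file of training data by self-play with BestModel."""
    records = []
    for _ in range(nb_game_in_file):
        env = CChessEnv()                       # placeholder board environment
        game = []
        while not env.done:
            # MCTS search guided by BestModel's policy/value heads
            policy = mcts_search(env, best_model)
            move = sample_move(policy)          # temperature-based sampling
            game.append((env.observation, policy))
            env.step(move)
        # label every stored position with the final game outcome
        records += [(obs, pi, env.winner_value) for obs, pi in game]
    save_play_data(records, out_file)           # later consumed by `opt`
```

The `opt` worker then samples mini-batches from these data files to update the network, and `eval` decides whether the updated network should replace the BestModel.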
GUI
Requirement: pygame
python cchess_alphazero/run.py play
Screenshots
You can choose different board/piece styles and sides; see Play with human below.
How to use
Setup
Install libraries:
pip install -r requirements.txt
If you want to use CPU only, replace `tensorflow-gpu` with `tensorflow` in `requirements.txt`.
Make sure Keras is using TensorFlow as its backend and you have Python 3.6.3+.
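One quick way to check both (`keras.backend.backend()` prints the name of the active backend):

python -c "import sys, keras; print(sys.version); print(keras.backend.backend())"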
Configuration
PlayDataConfig

- `nb_game_in_file`, `max_file_num`: the maximum number of games kept as training data is `nb_game_in_file * max_file_num`.

PlayConfig, PlayWithHumanConfig

- `simulation_num_per_move`: number of MCTS simulations per move.
- `c_puct`: balance parameter between the value network and the policy network in MCTS.
- `search_threads`: balance parameter between speed and accuracy in MCTS.
- `dirichlet_alpha`: randomness parameter for self-play.
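For orientation, a config along these lines might look like the sketch below. The field names match the ones above, but the values are made-up placeholders rather than the project's actual defaults:

```python
# Placeholder values for illustration only; see cchess_alphazero/configs/
# for the real defaults.

class PlayDataConfig:
    nb_game_in_file = 5    # games stored per data file
    max_file_num = 300     # at most 5 * 300 = 1500 games kept as training data

class PlayConfig:
    simulation_num_per_move = 100  # MCTS simulations per move
    c_puct = 1.5                   # trade-off between prior policy and search value
    search_threads = 10            # more threads: faster search, lower accuracy
    dirichlet_alpha = 0.2          # Dirichlet noise added at the root in self-play
```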
Basic Usage
Self-Play
python cchess_alphazero/run.py self
When executed, Self-Play will start using the BestModel. If the BestModel does not exist, a new random model will be created and become the BestModel.
options

- `--new`: create a new BestModel
- `--type mini`: use the mini config (see `cchess_alphazero/configs/mini.py`)
- `--gpu '1'`: specify which GPU to use
- `--ucci`: play against a UCCI engine rather than self-play (see `cchess_alphazero/worker/play_with_ucci_engine.py`)
- `--distributed`: run self-play in distributed mode, i.e. upload play data to the remote server and download the latest model from it
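For example, to run self-play with the mini config on GPU 1 in distributed mode:

python cchess_alphazero/run.py self --type mini --gpu '1' --distributed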
Trainer
python cchess_alphazero/run.py opt
When executed, training will start. The current BestModel will be loaded. The trained model will be saved every epoch as the new BestModel.
options

- `--type mini`: use the mini config (see `cchess_alphazero/configs/mini.py`)
- `--total-step TOTAL_STEP`: specify the total number of steps (mini-batches); the total step count affects the learning rate
- `--gpu '1'`: specify which GPU to use
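For example, to train with the mini config on GPU 1 (the step count here is just an illustrative value):

python cchess_alphazero/run.py opt --type mini --total-step 100000 --gpu '1'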
View training log in TensorBoard
tensorboard --logdir logs/
And access `http://<The Machine IP>:6006/`.
Play with human
python cchess_alphazero/run.py play
When executed, the BestModel will be loaded to play against a human.
options

- `--ai-move-first`: if set, the AI moves first; otherwise the human moves first
- `--type mini`: use the mini config (see `cchess_alphazero/configs/mini.py`)
- `--gpu '1'`: specify which GPU to use
- `--piece-style WOOD`: choose a piece style; the default is `WOOD`
- `--bg-style CANVAS`: choose a board style; the default is `CANVAS`
- `--cli`: if set, play against the AI in a CLI environment rather than the GUI
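For example, to let the AI move first with the mini config, spelling out the default styles explicitly:

python cchess_alphazero/run.py play --ai-move-first --type mini --piece-style WOOD --bg-style CANVAS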
Note: Before you start, you need to download/find a font file (`.ttc`) and rename it as `PingFang.ttc`, then put it into `cchess_alphazero/play_games`. I have removed the font file from this repo because it's too big, but you can download it from here.
Evaluator
python cchess_alphazero/run.py eval
When executed, it evaluates the NextGenerationModel against the current BestModel. If the NextGenerationModel does not exist, the worker will wait, checking every 5 minutes, until it appears.
options

- `--type mini`: use the mini config (see `cchess_alphazero/configs/mini.py`)
- `--gpu '1'`: specify which GPU to use
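For example:

python cchess_alphazero/run.py eval --type mini --gpu '1'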
Supervised Learning
python cchess_alphazero/run.py sl
When executed, training will start. The current SLBestModel will be loaded. The trained model will be saved every epoch as the new SLBestModel.
About the data
I have two data sources: one is downloaded from https://wx.jcloud.com/market/packet/10479; the other is crawled from http://game.onegreen.net/chess/Index.html (with the `--onegreen` option).
options

- `--type mini`: use the mini config (see `cchess_alphazero/configs/mini.py`)
- `--gpu '1'`: specify which GPU to use
- `--onegreen`: if set, the `sl_onegreen` worker will train on data crawled from `game.onegreen.net`
- `--skip SKIP`: if set, games whose index is less than `SKIP` will not be used for training (only valid when the `onegreen` flag is set)
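For example, to train on the crawled onegreen data while skipping the first 1000 games (the skip value here is just an example):

python cchess_alphazero/run.py sl --onegreen --skip 1000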