This repo contains a Python 3.8 implementation of the Alpha Zero general-purpose reinforcement learning algorithm for the Racing Kings chess variant. The implementation is modular and can easily be adapted to other chess variants supported by the python-chess library.
The goal in this variant is to reach any square on the 8th rank with your king. Whoever gets there first wins the game (unless White reaches it first and Black can reach it on the very next move, in which case the game is a draw). No checks are allowed; consequently, walking into a check is also illegal. The starting board looks like this:
From personal experience, the average length of a Racing Kings game between high-rated players is ~15 moves. The number of legal actions per position is also significantly smaller than in regular chess. This makes Racing Kings a much easier game to master than regular chess.
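The variant is supported out of the box by python-chess. The following minimal sketch (illustrative only, not part of this repo's code) loads the Racing Kings starting position and inspects it:

```python
# Illustrative only (not part of this repo): a quick look at the Racing Kings
# rules through python-chess.
import chess.variant

board = chess.variant.RacingKingsBoard()
print(board)                      # starting position: all pieces on the first two ranks
print(board.fen())                # the Racing Kings starting FEN
print(board.legal_moves.count())  # number of legal moves in the starting position

# A game ends as soon as a king reaches the 8th rank (with the draw exception
# for Black answering immediately); python-chess exposes this through the
# variant-specific termination helpers.
print(board.is_variant_end(), board.is_variant_draw())
```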
AlphaZero is a computer program developed by the artificial intelligence research company DeepMind to master the games of chess, shogi and Go. It uses an algorithm very similar to that of AlphaGo Zero. The original paper by DeepMind can be found here. In this repo, Alpha Zero is implemented as described in the paper, with some modifications to the state and action representations in order to match the Racing Kings variant.
Unless you have access to a large compute cluster, training purely from self-play will be far too slow. The functionality has been implemented, but it is simply not practical without substantial resources. The process takes so long because the agent has to figure out the game entirely on its own, i.e. discover wins (reaching the 8th rank) while initially moving essentially at random.
Since self-play is costly, supervised learning has also been implemented in order to speed up training. This injects human knowledge into the model, which departs from the pure Alpha Zero approach; that is why it is a separate, optional step. The game database used comes from lichess.org, and can be downloaded locally using the download bash script.
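To give an idea of what the supervised data looks like, the sketch below (illustrative only; the function name and tuple format are placeholders, not the repo's actual parser) extracts (position, move, result) samples from a lichess PGN file with python-chess:

```python
# Illustrative sketch, not the repository's actual parser: extract
# (FEN, UCI move, result) tuples from a lichess Racing Kings .pgn file.
import chess.pgn

def parse_pgn(path):
    samples = []
    with open(path) as pgn_file:
        while True:
            game = chess.pgn.read_game(pgn_file)
            if game is None:                          # end of file
                break
            result = game.headers.get("Result", "*")  # "1-0", "0-1" or "1/2-1/2"
            board = game.board()                      # variant board inferred from the headers
            for move in game.mainline_moves():
                samples.append((board.fen(), move.uci(), result))
                board.push(move)
    return samples

# Hypothetical usage (the file name is just an example):
# samples = parse_pgn("../Dataset/racing_kings_games.pgn")
```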
All the configuration files are located in the configurations directory. They are used by the main scripts. They can easily be edited manually, but make sure that the data types of the values remain correct.
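Since they are .ini files, they can typically be read with Python's configparser; the snippet below is only a sketch, and the section/option names in the comments are placeholders rather than the actual ones used by the repo:

```python
# Illustrative only: the section and option names below are placeholders;
# see the files in the configurations directory for the real ones.
import configparser

config = configparser.ConfigParser()
config.read("../configurations/mcts_hyperparams.ini")

# configparser stores everything as strings, hence the typed getters:
# num_simulations = config.getint("MCTS", "num_simulations")
# c_puct = config.getfloat("MCTS", "c_puct")
```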
The source code for the project is located in the src Python package. The code is organized into the following modules:
- Agent: Contains an implementation of the Alpha Zero Chess Agent.
- Datasets: Contains implementations of the PyTorch Dataset classes used during training.
- Environment: This package contains two sub-packages:
- Variants: Contains the base class for a chess environment, along with the implemented variants that inherit from it (so far, only Racing Kings).
- Actions: Contains the base class for representing chess actions (moves), which serves as the interface between the environment and the agent, plus an implementation for every available chess variant (again, so far only Racing Kings).
- Monte Carlo Tree Search: Contains an implementation of the Monte Carlo Tree Search used by Alpha Zero, as described in the paper.
- Neural Network: Contains a PyTorch implementation of the neural network used by Alpha Zero. The architecture can be configured from the configuration files (a rough sketch of such a network is given right after this list).
- Utilities: Contains various utility functions.
- Main scripts: the training script (train.py), the evaluation script (evaluate.py) and the supervised training script (train_supervised.py).
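As an illustration only, an Alpha Zero-style network with a residual tower, a policy head and a value head could look like the sketch below; the block count, channel width and head sizes are placeholders for values normally read from the neural network configuration file, and this is not necessarily the exact architecture implemented in the repo.

```python
# Illustrative Alpha Zero-style network (not necessarily the repo's exact one);
# the sizes are placeholders normally taken from the configuration files.
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                      # skip connection

class AlphaZeroNet(nn.Module):
    def __init__(self, in_planes, channels, num_blocks, num_actions):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        # Policy head: logits over all encoded moves of the variant.
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * 8 * 8, num_actions),
        )
        # Value head: a scalar in [-1, 1] estimating the expected game outcome.
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh(),
        )

    def forward(self, x):
        x = self.tower(self.stem(x))
        return self.policy_head(x), self.value_head(x)
```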
Unit tests have been implemented in the tests directory to make sure that no functionality breaks when new features are added.
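They can be run from the repository root with, for example (assuming pytest is installed; the exact invocation may differ):
$ python3 -m pytest tests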
Activate a virtual environment and install the required Python libraries:
$ conda activate venv
$ pip install -r requirements.txt
In order to train your agent with self-play, use the following commands:
$ cd src
$ python3 train.py --train-config [path_to_train_configuration_file]
--nn-config [path_to_neural_network_configuration_file]
--nn-checkpoints [path_to_directory_to_save_nn_weights]
--mcts-config [path_to_monte_carlo_tree_search_configuration_file]
--device [cpu | cuda]
Note: [path_to_directory_to_save_nn_weights] must point to an existing directory.
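If it does not exist yet, create it first, e.g. for the example below:
$ mkdir -p ../models/checkpoints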
An example of running the train script can be:
$ python3 train.py --train-config ../configurations/training_hyperparams.ini
--nn-config ../configurations/neural_network_architecture.ini
--nn-checkpoints ../models/checkpoints
--mcts-config ../configurations/mcts_hyperparams.ini
--device cpu
In order to play against the agent, you must first have trained it and stored the neural network weights in a file. To play against it, use the command:
$ cd src
$ python3 evaluate.py --nn-config [path_to_neural_network_configuration_file]
--pre-trained-weights [path_to_file_with_NN_weights]
--mcts-config [path_to_monte_carlo_tree_search_configuration_file]
--device [cpu | cuda]
--white
The last flag (--white) determines the user's color. If specified, the user plays with the white pieces; if omitted, the user plays with the black pieces.
Note: For the visualization of the board, the Python chess-board library is used. It has some minor issues that can easily be solved by following the steps in the docstring of the Base Chess Environment script. If you do not wish to use a display (and therefore deal with this), you can specify the --no-display parameter.
An example of running the script is:
$ python3 evaluate.py --nn-config ../configurations/neural_network_architecture.ini
--pre-trained-weights ../models/checkpoints/iteration_0_weights.pth
--mcts-config ../configurations/mcts_hyperparams.ini
--device cpu
--white
--no-display
In order to train the agent using supervised learning, you first have to download the data files (.pgn) from lichess. You can use the download_racing_kings_data bash script to do so. After the data has been downloaded (say, into the ./Dataset directory), train the agent (first with supervised learning, then with self-play) by running:
$ cd src
$ python3 train_supervised.py
--train-config [path_to_train_configuration_file]
--nn-config [path_to_neural_network_configuration_file]
--nn-checkpoints [path_to_directory_to_save_nn_weights]
--supervised-train-config [path_to_supervised_train_configuration_file]
--data-root-directory [path_to_the_root_directory_containing_data]
--parsed-data-destination-file [path_to_store_data_once_parsed]
--mcts-config [path_to_monte_carlo_tree_search_configuration_file]
--device [cpu | cuda]
An example:
$ python3 train_supervised.py
--train-config ../configurations/training_hyperparams.ini
--nn-config ../configurations/neural_network_architecture.ini
--nn-checkpoints ../models/checkpoints
--supervised-train-config ../configurations/supervised_training_hyperparams.ini
--data-root-directory ../Dataset
--parsed-data-destination-file ../Dataset/parsed_data.pickle
--mcts-config ../configurations/mcts_hyperparams.ini
--device cpu
Note: If the data has already been parsed once (parsing takes ~2 hours), then instead of re-parsing it every time the supervised training script is run, you can specify --parsed-data [path_to_parsed_data_file] to load the parsed data directly, like so:
$ python3 train_supervised.py
--train-config ../configurations/training_hyperparams.ini
--nn-config ../configurations/neural_network_architecture.ini
--nn-checkpoints ../models/checkpoints
--supervised-train-config ../configurations/supervised_training_hyperparams.ini
--parsed-data ../Dataset/parsed_data.pickle
--mcts-config ../configurations/mcts_hyperparams.ini
--device cpu
As future work, the following ideas could be implemented to further improve the algorithm:
- Optimize the code by parallelizing it (simultaneously execute the self-play episodes of the same iteration; a rough sketch is given right after this list).
- Add a curriculum learning mechanism: sample endgame positions (they could be random, but preferably taken from human data) and have the agent start by playing from these positions, so that it discovers positive rewards (i.e. wins) earlier, thereby speeding up learning. The paper describing this idea can be found here.
- Implement Monte Carlo Graph Search instead of the regular tree search. In this approach, the search tree is generalized to an acyclic graph, grouping together similar positions and hence significantly reducing the size of the search space. The paper describing this approach can be found here.
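For the first idea, a very rough sketch of parallel self-play with Python's multiprocessing module is shown below; play_episode is a hypothetical stand-in for whatever function the train script uses to generate a single self-play game.

```python
# Illustrative only: play_episode is a hypothetical placeholder for the function
# that runs one self-play game (MCTS + network) and returns its training samples.
from multiprocessing import Pool

def play_episode(seed):
    # Placeholder: would return a list of (state, policy, value) samples.
    return []

def run_iteration(num_episodes, num_workers=4):
    with Pool(processes=num_workers) as pool:
        per_episode_samples = pool.map(play_episode, range(num_episodes))
    # Flatten the per-episode sample lists into a single training set.
    return [sample for episode in per_episode_samples for sample in episode]

if __name__ == "__main__":   # guard required for multiprocessing on some platforms
    samples = run_iteration(num_episodes=8)
```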
The code is very flexible. The Alpha Zero agent and the Monte Carlo Tree Search classes have been implemented to be compatible with any chess environment that inherits from the Base Chess Environment and any action translator that inherits from the Move Translator. Thus, to add a new variant, follow these three steps:
- Create a wrapper class for that variant that uses the python-chess library, like the one already implemented for Racing Kings here.
- Create a MoveTranslator class for that variant, like the one implemented here.
- Adjust the main scripts (train.py, evaluate.py and train_supervised.py) to use the classes of the variant you just implemented in the previous two steps (a rough skeleton of the first two steps is sketched below).
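As a rough, non-authoritative illustration of the first two steps, the skeleton below uses the King of the Hill variant from python-chess; the class and method names are placeholders, and in practice these classes should subclass the actual base classes from the Environment package.

```python
# Rough skeleton for steps 1 and 2. Class and method names are placeholders;
# in the real code they should inherit from the base classes in the Environment
# package (Base Chess Environment and Move Translator).
import chess.variant

class KingOfTheHillEnv:
    """Hypothetical wrapper (step 1) around python-chess for King of the Hill."""

    def __init__(self):
        self.board = chess.variant.KingOfTheHillBoard()

    def reset(self):
        self.board.reset()

    def play(self, move_uci):
        self.board.push_uci(move_uci)

    def is_finished(self):
        return self.board.is_variant_end() or self.board.is_game_over()

class KingOfTheHillMoveTranslator:
    """Hypothetical translator (step 2) between moves and the agent's action indices."""

    def move_to_action(self, move):
        raise NotImplementedError   # encode a chess.Move as an action index

    def action_to_move(self, action):
        raise NotImplementedError   # decode an action index back into a chess.Move
```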
Feel free to contribute.