MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as gym). You only need to add a game file containing the hyperparameters and the game class. Please refer to the documentation and the example.
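As a rough illustration of what such a game file might contain, here is a minimal sketch pairing a hyperparameter class with a game class. Every name in it (`MuZeroConfig`, `Game`, `reset`, `step`, `legal_actions`, and all attribute names and values) is an assumption made for this example. The interface the repository actually expects may differ, so treat the files in the games folder as the reference.

```python
import random


class MuZeroConfig:
    """Hyperparameters for one game (names and values are illustrative only)."""

    def __init__(self):
        self.observation_shape = (1, 1, 1)  # shape of one observation
        self.action_space = [0, 1]          # every action the agent may choose
        self.num_simulations = 25           # tree-search simulations per move
        self.discount = 0.997               # reward discount factor
        self.lr_init = 0.01                 # initial learning rate


class Game:
    """Toy environment: guess a hidden bit within a fixed number of steps."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.max_steps = 5
        self.reset()

    def reset(self):
        """Start a new episode and return the first observation."""
        self.target = self.rng.choice([0, 1])
        self.steps = 0
        return [[[0.0]]]

    def legal_actions(self):
        """Return the actions that are allowed in the current state."""
        return [0, 1]

    def step(self, action):
        """Apply an action and return (observation, reward, done)."""
        self.steps += 1
        reward = 1.0 if action == self.target else 0.0
        done = reward == 1.0 or self.steps >= self.max_steps
        # The observation simply echoes whether the last guess was right.
        return [[[reward]]], reward, done
```

A random agent can then play one episode with `game = Game(seed=0)`, `game.reset()`, and repeated calls to `game.step(random.choice(game.legal_actions()))` until `done` is true.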

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it has no knowledge of the environment's underlying dynamics. Instead, MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also close to Value Prediction Networks. See How it works.
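To make the idea of a learned model more concrete, the sketch below wires together the three functions described in the MuZero paper: a representation function (observation to hidden state), a dynamics function (hidden state and action to next hidden state and reward), and a prediction function (hidden state to policy and value). It is a toy built from untrained random linear maps in plain Python, not the networks used in this repository. The class `ToyMuZeroModel`, the helper names, and the sizes are all assumptions made for the example.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMuZeroModel:
    """Untrained stand-in for the three learned functions (hypothetical)."""

    def __init__(self, obs_size, hidden_size, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_actions = num_actions
        self.w_repr = make_matrix(hidden_size, obs_size, rng)
        self.w_dyn = make_matrix(hidden_size, hidden_size + num_actions, rng)
        self.w_reward = make_matrix(1, hidden_size + num_actions, rng)
        self.w_policy = make_matrix(num_actions, hidden_size, rng)
        self.w_value = make_matrix(1, hidden_size, rng)

    def representation(self, observation):
        """Observation -> hidden state."""
        return [math.tanh(v) for v in matvec(self.w_repr, observation)]

    def dynamics(self, hidden, action):
        """(hidden state, action) -> (next hidden state, predicted reward)."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.num_actions)]
        joined = hidden + one_hot
        next_hidden = [math.tanh(v) for v in matvec(self.w_dyn, joined)]
        reward = matvec(self.w_reward, joined)[0]
        return next_hidden, reward

    def prediction(self, hidden):
        """Hidden state -> (policy, value)."""
        policy = softmax(matvec(self.w_policy, hidden))
        value = matvec(self.w_value, hidden)[0]
        return policy, value


def unroll(model, observation, actions):
    """Roll the model forward over actions without stepping a real environment."""
    hidden = model.representation(observation)
    outputs = []
    for action in actions:
        hidden, reward = model.dynamics(hidden, action)
        policy, value = model.prediction(hidden)
        outputs.append((reward, policy, value))
    return outputs
```

Calling `unroll(ToyMuZeroModel(4, 8, 3), [0.1, 0.2, 0.3, 0.4], [0, 2, 1])` returns one `(reward, policy, value)` tuple per imagined action. After the first observation, the environment itself is never consulted, which is what lets MuZero plan without knowing the true dynamics.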

Features

Further improvements

These improvements are areas of active research. They are personal ideas and go beyond the MuZero paper. We are open to contributions and other ideas.

Hyperparameter search
Continuous action space
Tool to understand the learned model
Support for stochastic environments
Support for games with more than two players
RL tricks (Never Give Up, Adaptive Exploration, ...)

Demo

All performance metrics are tracked and displayed in real time in TensorBoard:

Testing Lunar Lander:

Games already implemented

Cartpole (Tested with the fully connected network)
Lunar Lander (Tested in deterministic mode with the fully connected network)
Gridworld (Tested with the fully connected network)
Tic-tac-toe (Tested with the fully connected network and the residual network)
Connect4 (Slightly tested with the residual network)
Gomoku
Twenty-One / Blackjack (Tested with the residual network)
Atari Breakout

Tests are run on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050Ti Max-Q. We check that the agent makes progress and reaches a level showing that it has learned, but we do not systematically reach human level. For certain environments, we notice a regression after a certain time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.

Code structure

Network summary:

Getting started

Installation

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

Authors

Werner Duvaud
Aurèle Hainaut
Paul Lenoir
Contributors

Please use this bibtex if you want to cite this repository (master branch) in your publications:

@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}

Getting involved

GitHub Issues: For reporting bugs.
Pull Requests: For submitting code contributions.
Discord server: For discussions about development or any general questions.
