Skip to content

Latest commit

 

History

History
47 lines (36 loc) · 2.37 KB

README.md

File metadata and controls

47 lines (36 loc) · 2.37 KB

MuZero Reanalyze Implementation And Investigation

This a reimplementation of MuZero Reanalyze using the open-source version of MuZero Duvaud, Werner; Hainaut, Aurèle and Lenoir, Paul (see README_MuZeroGeneral).

The implementation was tested on Cartpole-v1 from OpenAi Gym and the implementation of Tic Tac Toe from the same authors of MuZero General. Two implementations were tested:

  • A synchronous one that uses multiple worker to push batches on a queue while updating the target values and policies. The trainer process pulls one batch at a time for training. This implementation stays true to the original descirption in Appendix H of the original MuZero paper.
  • A completely asynchronous one that updates samples directly in the replay buffer. This is much faster but does not faithfully reproduce the process described in the original paper.

Installation

git clone https://github.com/PhilippeMarcotte/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Command for reproducing the results

python muzero.py --game_name <configuration name> --action "Train" --logger tensorboard --seed <seed>

The configuration used are located in the games fodler. However, only the name is required. Here are all the configurations used for the experiments:

  • basic_tictactoe_ratio_0_5

  • true_reanalyze_tictactoe_ratio_0_5

  • fast_reanalyze_tictactoe_ratio_0_5

  • basic_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_1 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_2 (seed=[0,10,20,30,40])

  • true_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,50,60])

  • true_reanalyzebasic_cartpole_75_ratio_0_5 (seed=[0,10,20,50,60])

  • true_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,50,60])

  • true_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,50,60])

  • fast_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,30,40])

Example

python muzero.py --game_name basic_cartpole_75_ratio_0_25 --action "Train" --logger tensorboard --seed 0

For further information please see the original README.