Different approaches for MCTS instancing in Self Play and Arena #24
Replies: 10 comments
-
During training, according to the paper, each game starts from a fresh MCTS search tree. During Arena, the paper doesn't explicitly say what is done, so I thought it might be useful to maintain the tree across the games.
-
I see, but there are two particular considerations.
-
I agree with both your points. However, if we clear the tree after each Arena game, it is possible that all the games will be identical. This is because temperature is set to 0 during Arena, so there is no randomness anymore. Any idea what can be done about this?
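To make the determinism concrete, here's a minimal sketch of temperature-based action selection from MCTS visit counts. `select_action` and the raw counts are illustrative, not the repo's exact API:

```python
import numpy as np

def select_action(counts, temperature):
    """Pick a move from MCTS visit counts (illustrative helper).

    temperature > 0: sample from counts raised to 1/temperature.
    temperature == 0: deterministic argmax, so repeated Arena games
    between two fixed networks play out identically.
    """
    counts = np.asarray(counts, dtype=float)
    if temperature == 0:
        return int(np.argmax(counts))  # no randomness left
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))
```

With `temperature == 0`, `select_action([10, 30, 5], 0)` returns `1` every time, which is why clearing the tree between Arena games can make every game identical.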
-
Maybe introduce Dirichlet noise, as was done in the paper?
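For reference, a sketch of the noise injection at the root. AlphaGo Zero uses P(s, a) = (1 − ε)·p_a + ε·η_a with η ~ Dir(0.03) and ε = 0.25 for Go; the `alpha=0.3` default here is a placeholder that would need tuning per game:

```python
import numpy as np

def add_dirichlet_noise(priors, alpha=0.3, eps=0.25):
    """Mix Dirichlet noise into the root priors.

    alpha is game-dependent (0.03 in AlphaGo Zero for Go);
    eps = 0.25 matches the paper. Returns a valid distribution
    since it is a convex combination of two distributions.
    """
    priors = np.asarray(priors, dtype=float)
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1.0 - eps) * priors + eps * noise
```

Note this is normally applied only at the root node of each search, and (per the paper) only during self-play.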
-
According to the paper, the noise is used during self-play only.
-
Yes, there is some ambiguity about the Arena stage. I think the current solution is suitable, since the games usually use <= 50 MCTS simulations per step, so reusing the tree is useful.
-
@suragnair Therefore, clearing the tree is necessary. Don't worry about the lack of randomness: there is randomness if we randomly rotate or flip the board before the policy and value are predicted by the neural network.
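A sketch of that idea: apply a random rotation/reflection before the network call and map the policy back. This assumes a square board and a policy with one entry per square; `predict_with_random_symmetry` and `net_predict` are illustrative names, not the repo's actual API:

```python
import numpy as np

def predict_with_random_symmetry(net_predict, board, rng=np.random):
    """Evaluate the network on a randomly transformed board, then
    apply the inverse transform to the returned policy so the move
    probabilities line up with the original board orientation."""
    k = rng.randint(4)            # number of quarter-turns
    flip = rng.randint(2) == 1    # whether to also mirror the board
    b = np.rot90(board, k)
    if flip:
        b = np.fliplr(b)
    pi, v = net_predict(b)        # network sees the transformed board
    pi = pi.reshape(board.shape)
    if flip:
        pi = np.fliplr(pi)        # undo the mirror on the policy
    pi = np.rot90(pi, -k)         # undo the rotation on the policy
    return pi.ravel(), v
```

As noted below, though, this only injects randomness for games whose rules are actually symmetry-invariant.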
-
This would work for games like Go, where symmetry is a thing. For games like Chess or 4-in-a-row, this probably won't work. Quote from the AlphaZero paper:
I don't know how they get different games in their evaluation, though. Edit: they skip the evaluation phase and the selection of a best player in the paper. Maybe the games it would play against itself are deterministic, and the published games against different players differ because randomness is introduced at the start? Or maybe they did something else altogether for openings against other players to showcase AlphaZero.
-
@TimYuenior
-
It's worth highlighting that in the AlphaZero paper, the step of having two networks fight each other is removed: there is a single neural net that is continuously updated. This is different from AlphaGo Zero. DeepMind probably made this change because there are no symmetry guarantees in the general case.
-
alpha-zero-general/Coach.py
Line 77 in a6077c9
Why do you create a new instance of MCTS for every episode in self-play, while in the pit phase on Arena you use two instances (one per player) for all episodes?
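The two patterns being contrasted can be sketched like this. The `MCTS` class here is a stand-in for the repo's, and the episode/game counts are illustrative; in the real Coach.py the arena instances are typically called `pmcts`/`nmcts`:

```python
class MCTS:
    """Stand-in for the repo's MCTS: one instance = one search tree
    whose visit statistics persist until the instance is dropped."""
    def __init__(self):
        self.stats = {}

created = []

def fresh_tree():
    tree = MCTS()
    created.append(tree)
    return tree

# Self-play pattern: a fresh tree for every episode,
# so no search statistics leak between training games.
for _ in range(3):                     # 3 illustrative episodes
    mcts = fresh_tree()
    # ... one self-play episode would run here with this mcts ...

# Arena pattern: one persistent tree per player,
# reused across all games of the match.
pmcts, nmcts = fresh_tree(), fresh_tree()
for _ in range(3):                     # 3 illustrative arena games
    pass                               # ... both players keep their trees ...
```

The self-play loop above creates one tree per episode, while the arena loop creates exactly two trees for the whole match, which is the asymmetry the question is about.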