Different approaches for MCTS instancing in Self Play and Arena #24
Replies: 10 comments
-
During training, according to the paper, each game starts from a fresh MCTS search tree. During Arena, the paper doesn't explicitly say what is done, so I thought it might be useful to maintain the tree across the games.
-
I see, but there are two particular considerations.
-
I agree with both your points. However, if we clear the tree after each Arena game, it is possible that all the games will be identical. This is because temperature is set to 0 during Arena, so there is no randomness anymore. Any idea what can be done about this?
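To make the determinism concrete, here's a minimal sketch of temperature-based action selection from MCTS visit counts. `select_action` and the raw counts are illustrative, not the repo's exact API:

```python
import numpy as np

def select_action(counts, temperature):
    """Pick a move from MCTS visit counts (illustrative helper).

    temperature > 0: sample from counts raised to 1/temperature.
    temperature == 0: deterministic argmax, so repeated Arena games
    between two fixed networks play out identically.
    """
    counts = np.asarray(counts, dtype=float)
    if temperature == 0:
        return int(np.argmax(counts))  # no randomness left
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))
```

With `temperature == 0`, `select_action([10, 30, 5], 0)` returns `1` every time, which is why clearing the tree between Arena games can make every game identical.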
-
Maybe introduce Dirichlet noise, as was done in the paper?
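For reference, a sketch of the noise injection at the root. AlphaGo Zero uses P(s, a) = (1 − ε)·p_a + ε·η_a with η ~ Dir(0.03) and ε = 0.25 for Go; the `alpha=0.3` default here is a placeholder that would need tuning per game:

```python
import numpy as np

def add_dirichlet_noise(priors, alpha=0.3, eps=0.25):
    """Mix Dirichlet noise into the root priors.

    alpha is game-dependent (0.03 in AlphaGo Zero for Go);
    eps = 0.25 matches the paper. Returns a valid distribution
    since it is a convex combination of two distributions.
    """
    priors = np.asarray(priors, dtype=float)
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1.0 - eps) * priors + eps * noise
```

Note this is normally applied only at the root node of each search, and (per the paper) only during self-play.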
-
According to the paper, the noise is used during self-play only.
-
Yes, there is some ambiguity about the Arena stage. I think the current solution is suitable, since the games usually use <= 50 MCTS simulations per step, so reusing the tree is useful.
-
@suragnair Therefore, clearing the tree is necessary. Don't worry about the lack of randomness: there is randomness if we randomly rotate or flip the board before the policy and value are predicted by the neural network.
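A sketch of that idea: apply a random rotation/reflection before the network call and map the policy back. This assumes a square board and a policy with one entry per square; `predict_with_random_symmetry` and `net_predict` are illustrative names, not the repo's actual API:

```python
import numpy as np

def predict_with_random_symmetry(net_predict, board, rng=np.random):
    """Evaluate the network on a randomly transformed board, then
    apply the inverse transform to the returned policy so the move
    probabilities line up with the original board orientation."""
    k = rng.randint(4)            # number of quarter-turns
    flip = rng.randint(2) == 1    # whether to also mirror the board
    b = np.rot90(board, k)
    if flip:
        b = np.fliplr(b)
    pi, v = net_predict(b)        # network sees the transformed board
    pi = pi.reshape(board.shape)
    if flip:
        pi = np.fliplr(pi)        # undo the mirror on the policy
    pi = np.rot90(pi, -k)         # undo the rotation on the policy
    return pi.ravel(), v
```

As noted below, though, this only injects randomness for games whose rules are actually symmetry-invariant.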
-
This would work for games like Go, where symmetry is a thing. For games like Chess or 4-in-a-row, this probably won't work. Quote from the AlphaZero paper:
I don't know how they get different games in their evaluation, though. Edit: they skip the evaluation phase and the selection of a best player in the paper. Maybe the games it would play against itself are deterministic, and the published games against different players differ because randomness is introduced at the start? Or maybe they did something else altogether for openings against other players to showcase AlphaZero.
-
@TimYuenior
-
It's worth highlighting that in the AlphaZero paper, the step of having two networks fight each other is removed: there is a single neural net that is continuously updated. This is different from AlphaGo Zero. DeepMind probably made this change because there are no symmetry guarantees in the general case.
-
alpha-zero-general/Coach.py
Line 77 in a6077c9
Why do you create a new instance of MCTS for every episode in self-play, while in the pit phase on Arena you use two instances (one per player) for all episodes?
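The two patterns being contrasted can be sketched like this. The `MCTS` class here is a stand-in for the repo's, and the episode/game counts are illustrative; in the real Coach.py the arena instances are typically called `pmcts`/`nmcts`:

```python
class MCTS:
    """Stand-in for the repo's MCTS: one instance = one search tree
    whose visit statistics persist until the instance is dropped."""
    def __init__(self):
        self.stats = {}

created = []

def fresh_tree():
    tree = MCTS()
    created.append(tree)
    return tree

# Self-play pattern: a fresh tree for every episode,
# so no search statistics leak between training games.
for _ in range(3):                     # 3 illustrative episodes
    mcts = fresh_tree()
    # ... one self-play episode would run here with this mcts ...

# Arena pattern: one persistent tree per player,
# reused across all games of the match.
pmcts, nmcts = fresh_tree(), fresh_tree()
for _ in range(3):                     # 3 illustrative arena games
    pass                               # ... both players keep their trees ...
```

The self-play loop above creates one tree per episode, while the arena loop creates exactly two trees for the whole match, which is the asymmetry the question is about.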