Back propagation in MCTS tree #47
Replies: 2 comments
-
I thought about this a bit and it seems like Blackmoor is correct. Assume three states I think this is fixable and there is a valid optimization to salvage here:
This causes the values used at all steps of the search to be the same as the values used in the tree version, but saves some nn evaluations. (This optimization relies more strongly on the assumption that To avoid the second concern, games with rules about repetitions and such would have to include enough information about repetitions and such in their string representations to avoid this sort of state aliasing. In the extreme case, including the entire move history would work and make this MCTS behave like a tree-based one. |
Beta Was this translation helpful? Give feedback.
-
After applying my proposed optimization someone might reasonably object, "Why should we make decisions based on a There is some useful information that I am still failing to take advantage of, but it seems annoying to do the bookkeeping required to take advantage of it. |
Beta Was this translation helpful? Give feedback.
-
Many games allow the same game state to be reached by different sets of moves, or the same set of moves performed in a different order.
Because the current MCTS implements its 'tree' as a 'set' we can end up finding that the next state is already in the 'tree' if it has been reached via another set of moves. The current back propagation code will update the states we traversed to get here but potentially leave the 'tree' in an incomplete state as some branches will suddenly have a child those updated values have not been back propagated
An additional problem is that is it not always safe to assume that identical games state reached via different moves are equivalent (take 'go' for example) so to be a generic game playing engine we probably have to treat them as different states and implement the MCTS as a 'tree' and not as a 'set' (python list indexed on the game state).
Beta Was this translation helpful? Give feedback.
All reactions