MCTS infinite loop #256

callix2 · 2021-11-01T18:51:03Z

Hi @ALL,
I ran into an infinite loop with MCTS. In my game you lose if you have no more pieces in your reserve. You can get your pieces back from the board by placing them in a row of four. (This is also how you remove your opponent's pieces, if they form a direct extension of those four.) But when your own reserve is running low, you can stay alive indefinitely with this tactic.

So the program places four pieces in a row, one after the other, and gets them back. Now the board looks exactly as it did four rounds earlier, and the corresponding node in the search tree is used again, and again, and again...

Do you guys have any ideas on how to avoid this behavior?

goshawk22 · 2021-11-01T19:15:30Z

Well, if I understand the rules of the game correctly, this is the best tactic to win - the agent will never lose. It might be that the game has no solution and the agent has worked out it can't win, and therefore simply tries not to lose in the simplest way.
It could also be that there is an issue with your code - maybe the rules don't detect a win properly or the scoring isn't correct.
It might help if you share a link to your code.

3 replies

callix2 Nov 1, 2021
Author

Yes, he will never lose, but he won't win either, because he has to remove the opponent's pieces to do so. I don't think the game is unsolvable in this situation, since he could build his row as an extension of an opponent's piece to capture it, instead of in an empty area. I uploaded my code here: https://github.com/callix2/alphaZero-gipf

goshawk22 Nov 1, 2021

I see each player has a 'reserve' here and that it is used to determine if a player has lost (here).
Why does this determine if the player has won or lost?

callix2 Nov 1, 2021
Author

That's the game :D Each player starts with 3 pieces on the board and 5 in reserve (in the mini version). In each round you have to play one piece. If you form a row of 3, your own pieces are returned to your reserve and any opponent's pieces that extend the row are removed from the game. If it's your turn and you can't play because your reserve is empty, you lose.
Here is the rulebook in case my explanation wasn't clear: https://gesellschaftsspiele.spielen.de/uploads/files/2870/57b4a7ec876ee.pdf

pavolkacej · 2021-12-11T03:44:06Z

Hi @callix2, are you still experiencing this issue? You could set a breakpoint in the MCTS search method. Add an int variable depth that is incremented on every nested call of search, and stop the program when depth exceeds, say, 10.
I had a similar issue and found that the players kept repeating the same 2 moves forever. Maybe you could limit this behavior in the game logic. Feel free to ask if you need help.
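
One way to limit repetition in the game logic is to count how often each position has occurred and declare a draw after a fixed number of repeats. This is only a minimal sketch: `RepetitionTracker`, `max_repeats` and the placeholder position keys are hypothetical, and in alpha-zero-general the counts would have to travel with the board state, since the Game methods receive the board as an argument.

```python
from collections import Counter


class RepetitionTracker:
    """Counts how often each position has occurred in the current game."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats  # assumed limit, similar to threefold repetition
        self.counts = Counter()

    def record(self, state_key):
        """Register a position; return True once it has occurred max_repeats times."""
        self.counts[state_key] += 1
        return self.counts[state_key] >= self.max_repeats


# Simulate a game that cycles through the same four positions forever.
tracker = RepetitionTracker(max_repeats=3)
cycle = ["A", "B", "C", "D"]  # stand-ins for hashable board keys
draw_at = None
for move_number in range(1, 50):
    key = cycle[(move_number - 1) % len(cycle)]
    if tracker.record(key):
        draw_at = move_number  # the game logic would report a draw here
        break
```

In a real game the key could be any hashable encoding of the full position, including the reserves and the player to move.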

1 reply

callix2 Dec 13, 2021
Author

Hi @pavolkacej, thank you for your reply. The problem was that after six moves the same state as before was reached, and these six moves were repeated again and again. I have no idea how to prevent this behavior in the game logic, because they are basically legitimate moves and can also make sense in this order. They just don't lead to victory but prevent losing.

Currently I check the depth using len(inspect.stack()) and return a reward v=-1 as soon as it reaches 100. I propagate this value unchanged back up the entire search path (so that the sign is not flipped).
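
The same cutoff can be sketched with an explicit depth argument instead of len(inspect.stack()), which measures the whole call stack, including frames outside the search. This is a toy recursion, not the MCTS code from the repository: `search`, `next_state`, `is_terminal`, `MAX_DEPTH` and `CUTOFF_VALUE` are hypothetical names, and the terminal value of 1.0 is just a placeholder.

```python
MAX_DEPTH = 100       # assumed cutoff, matching the value mentioned above
CUTOFF_VALUE = -1.0   # reward handed back when the cutoff is hit


def search(state, next_state, is_terminal, depth=0):
    """Toy recursive descent; returns (value, cutoff_hit)."""
    if is_terminal(state):
        return 1.0, False
    if depth >= MAX_DEPTH:
        return CUTOFF_VALUE, True
    value, cutoff_hit = search(next_state(state), next_state, is_terminal, depth + 1)
    # A normal backup flips the sign at each ply; a cutoff value is passed up unchanged.
    return (value if cutoff_hit else -value), cutoff_hit


# A game that cycles through six states and never ends hits the cutoff.
cyclic_result = search(0, lambda s: (s + 1) % 6, lambda s: False)

# A game that ends after three plies backs up its value with alternating sign.
finite_result = search(0, lambda s: s + 1, lambda s: s == 3)
```

Returning a flag alongside the value makes the "do not flip the sign" rule explicit at every level of the recursion.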

However, now I have another problem, possibly related to this. When I train the agent, the first or at the latest the second run results in a new, better model. After that, no better model is found. Each run ends with 20 to 20 (or 21 to 19) victories against the old model. I tested with numIters=10, numEps=100, tempThreshold=15, updateThreshold=0.6, numMCTSSims=25 and arenaCompare=40.

Any idea?

pavolkacej · 2021-12-13T15:19:12Z

Hi @callix2
Nice, I implemented something similar. My chess implementation declares a draw after 100 moves (without a pawn move or capture), and I return a small non-zero value. I did not want to penalize the player who makes that last move, as it is not their fault; sometimes the 100th move is unavoidable in the final position.
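
A move-limit draw of this kind could look roughly like the sketch below. It assumes the convention of 0 for "not ended", 1 or -1 for a win or loss from the given player's perspective, and a small non-zero value for a draw. `game_ended`, `MOVE_LIMIT`, `DRAW_VALUE` and the `moves_without_progress` counter are hypothetical names for illustration.

```python
MOVE_LIMIT = 100    # assumed limit on moves without progress
DRAW_VALUE = 1e-4   # small non-zero value that marks a draw


def game_ended(winner, player, moves_without_progress):
    """Return 0 if the game continues, +1/-1 for a win/loss, DRAW_VALUE for a draw.

    winner is 1, -1, or None if nobody has won yet; player is the side
    whose perspective the result is reported from.
    """
    if winner is not None:
        return 1 if winner == player else -1
    if moves_without_progress >= MOVE_LIMIT:
        # Neither side is penalized for reaching the limit.
        return DRAW_VALUE
    return 0
```

Because the draw value is non-zero, the caller can still tell "game over" apart from "game continues" with a simple check against 0.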

Maybe you should check how the games end. Print the result after each game: not only whether 1 or -1 won, but also the reason. Maybe a player wins because the opponent reached the 100th move. Add this print-out so that the reason is printed after every game played in Arena (or add it to Coach too if you want).
Before this line, for example: https://github.com/suragnair/alpha-zero-general/blob/master/Arena.py#L63

If you see 20/20 wins, is it only White winning? If the first 20 games are won by White (1st model) and the second 20 games are also won by White (2nd model), this may tell you that the learning process only works for White, and you have to change something.

Keep in mind that, as I just said, the models swap sides after the first half of the games. For example, the first model plays 20 games as White and then 20 games as Black.
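
To see whether wins follow the color rather than the model, the results can be tallied both ways. The sketch below is only an illustration: the `tally` helper and the (white_model, black_model, winner_side) record format are hypothetical, and the sample data reproduces the suspicious 20/20 pattern described above.

```python
from collections import Counter


def tally(results):
    """Count wins per side and per model from (white_model, black_model, winner_side) records."""
    by_side = Counter()
    by_model = Counter()
    for white_model, black_model, winner_side in results:
        by_side[winner_side] += 1
        if winner_side == "white":
            by_model[white_model] += 1
        elif winner_side == "black":
            by_model[black_model] += 1
    return by_side, by_model


# First half: the new model plays White; second half: the models swap sides.
results = [("new", "old", "white")] * 20 + [("old", "new", "white")] * 20
by_side, by_model = tally(results)
```

If by_side shows almost all wins for one color while by_model is split evenly, the 20 to 20 score says more about the side than about either model.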

Be aware that the result is sometimes requested from Black's perspective and sometimes from White's (the player parameter of getGameEnded).
I think this line was not working for me, so I removed the multiplication here. But maybe it works for you, I don't know; I find this whole code tricky :D
https://github.com/suragnair/alpha-zero-general/blob/master/Arena.py#L63

0 replies
