A reinforcement-learning-based app to play the dice game Kniffel (Yahtzee)
The final goal of this project is to build an app that plays the game Kniffel (English: Yahtzee) automatically with super-human performance. More about the game can be found here. The project contains two main parts: the training of a Kniffel agent using reinforcement learning, and the implementation of an iOS app that plays Kniffel with physical dice based on the trained agent. The training is based on OpenAI Gym and PyTorch. The iOS app will be implemented using SwiftUI.
The plan is to test two different reinforcement learning algorithms. One is the deep Q-network (DQN), which was applied here to the game of Yahtzee. The other is the advantage actor-critic algorithm (A2C), whose performance for the game of Yahtzee was reported here.
- We only optimize the agent for single-player mode. Although multi-player mode is potentially helpful for training, it could make it difficult for the agent to learn the strategy (see discussion here).
- An augmented input feature similar to this paper is used. The 112-dimensional input feature encodes the current round of the game, the current dice roll, the sum of the dice, the availability of score categories, and the current upper-section score (for the bonus); a sketch of the idea follows this list.
- Invalid action masking is used to prevent the agent from choosing invalid actions in the game (e.g. choosing an already used score category); see the sketch after this list.
- Double DQN is used to improve performance (its target computation is also sketched after this list).
- NN structure: 2 linear hidden layers of 128 units with ReLU activation.
- Tricks for convergence:
  - a low target network update frequency (around 2000 steps) to stabilize training
  - linear decay of epsilon, which requires less tuning and works well
  - auxiliary features to help the agent understand the game (e.g. that the order of the dice is irrelevant)
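As a rough illustration of the augmented-feature idea (not the project's actual encoding: the function name `encode_state`, the component sizes, and the normalizations below are hypothetical, and the real 112-dimensional feature is richer):

```python
import numpy as np

def encode_state(round_idx, dice, category_open, upper_score):
    """Hypothetical sketch of an augmented state encoding.

    round_idx:     current round, 0..12 (13 rounds in Kniffel)
    dice:          list of 5 die values in 1..6
    category_open: 13 booleans, True if the category is still available
    upper_score:   current upper-section score (relevant for the 35-point bonus)
    """
    round_onehot = np.eye(13)[round_idx]                      # 13 dims
    # Counting how often each face shows up makes the encoding invariant
    # to the order of the dice (the "auxiliary feature" mentioned above).
    face_counts = np.bincount(dice, minlength=7)[1:] / 5.0    # 6 dims
    dice_sum = np.array([sum(dice) / 30.0])                   # 1 dim
    open_mask = np.asarray(category_open, dtype=np.float32)   # 13 dims
    upper = np.array([min(upper_score, 63) / 63.0])           # 1 dim
    return np.concatenate([round_onehot, face_counts, dice_sum,
                           open_mask, upper]).astype(np.float32)
```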
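A minimal sketch of the network described above, combined with invalid action masking during greedy action selection; the action count `N_ACTIONS` and the helper `greedy_action` are placeholders, not the project's actual code:

```python
import torch
import torch.nn as nn

N_FEATURES, N_ACTIONS = 112, 44   # action count is a placeholder

q_net = nn.Sequential(            # 2 hidden layers of 128 units, ReLU
    nn.Linear(N_FEATURES, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, N_ACTIONS),
)

def greedy_action(state, valid_mask):
    """Pick the best *valid* action.

    valid_mask: bool tensor of shape (N_ACTIONS,), False for invalid
    actions (e.g. an already used score category).
    """
    with torch.no_grad():
        q = q_net(state.unsqueeze(0)).squeeze(0)
    q[~valid_mask] = float("-inf")   # masked actions can never be the argmax
    return int(q.argmax())
```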
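A sketch of the double-DQN target together with the two training tricks (infrequent target network syncs and linear epsilon decay); all hyperparameter values are illustrative, and masking of invalid next-state actions is omitted for brevity:

```python
import copy
import torch

gamma = 0.99
target_net = copy.deepcopy(q_net)   # synced only every ~2000 steps

def double_dqn_targets(reward, next_state, done):
    """Double DQN: the online net picks the action, the target net rates it."""
    with torch.no_grad():
        next_a = q_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_a).squeeze(1)
    return reward + gamma * next_q * (1.0 - done)

def epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=200_000):
    """Linear epsilon decay; needs less tuning than exponential schedules."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# every ~2000 environment steps:
# target_net.load_state_dict(q_net.state_dict())
```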
The Gym environment is mostly taken from this repository, with some modifications to the joker rules and some debugging. All experimental results are available on wandb. For simplicity, all training is done on a MacBook Pro with an M1 Pro chip, using the MPS backend of PyTorch; a typical device-selection pattern is shown below.
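Selecting the MPS device in PyTorch usually looks like this, falling back to the CPU when MPS is unavailable:

```python
import torch

# Use Apple's Metal Performance Shaders backend if the build supports it.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = q_net.to(device)
```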
| Agent | Average score over 3000 games |
|---|---|
| random | 46.5 |
| greedy | TODO |
| DQN | 106.5 |
| DQN in paper | 77.8 |
| A2C | TODO |
| A2C in article | 239.7 |
- Detection and recognition of the dice roll, as in this example, using the Vision framework
- Correct feature conversion and forward pass of the PyTorch model
- Understandable instructions for the user to roll the dice based on the agent's decision
- Recognizing and handling situations where the instructions are not followed
- Automatic calculation of the score