DQN is a technology to realize reinforcement learning, first proposed by Deep Mind in NIPS13(paper in arxiv), whose input is raw pixels and whose output is a value function estimating future rewards. Using Experience Replay, they overcame the problem of network training.
This demo is about using DQN to train a convolutional neural network to play flappy bird game. It is a practice when I learned reinforcement learning and partly reused songrotek's code, especially the game engine and basic idea. Thanks for sharing, thanks to the spirit and community of open source.
A video of the demo can be found on YouTube or 优酷 Youku if you don't have access to YouTube.
PyTorch is an elegant framework published by Facebook. I implemented the neural network and training/testing procedure using PyTorch. So you need install PyTorch to run this demo. Besides, pygame package is needed by the game engine.
At the beginning, you can play the game with a pretrained model by me. You can download the pretrained model from Google Drive (or Baidu Netdisk if Google Drive is not available) and use the following commands to play the game. Make sure that the pretrained model is in the root directory of this project.
chmod +x play.sh
./play.sh
For more detail infomation about the meaning of the arguments of the program, run python main.py --help
or refer to the code in main.py
.
You can use the following commands to train the model from scrach or finetuning(if pretrained weight file is provided).
chmod +x train.sh
./train.sh # please see `main.py` for detail info about the variables
Some tips for training:
-
Do not set
memory_size
too large(or too small). It depends on available memory in your computer. -
It takes a long time to complete training. I finetuned the model several times and change
epsilon
of ϵ-greedy exploration manually every time. -
When choose action randomly in training, I prefer to 'DO Nothing' compared to 'UP'. I think it can accelarate convergence. See
get_action_randomly
method inBrainDQN.py
for detail.
This work is based on the repo songrotek/DRL-FlappyBird and yenchenlin1994/DeepLearningFlappyBird. Thanks two authors!