🤖 TicTacToe agent based on Reinforcement learning

This project was made as part of the Intelligent Systems course at the Faculty of Organization and Informatics.

🥅 Project goals

The goal of this project was simple: to make an agent learn to play Tic-tac-toe!

And to showcase it with a slick tkinter GUI!

⚒️ Features

Train Reinforcement learning Tic-Tac-Toe agents.
Set training parameters (number of iterations and learning strategy).
Save/Load trained agents.
Keep game score.
Switch players each game.
No dependencies.

Install

Clone the repo or download ZIP

$ git clone https://github.com/marzekan/tictactoe-rl-agent.git

Run main.py

$ cd tictactoe-rl-agent

$ python3 main.py

📜 Docs

The next sections are meant to be quick, straight-to-the-point documentation.

The documentation begins with a demo and a step-by-step explanation of it.

Train-Save-Load-Play Demo

The following demo shows:

Training the agent
Saving trained agent
Loading trained agent
Playing loaded agent

The next subsections contain step-by-step explanations of the actions performed in the demo gif.

Training the agent

Click the TRAIN AGENT button to open the Agent training window.
Select 'Q Agent vs. Q Agent' from the Pick agent strategy dropdown.
Enter 20000 as the number of training iterations in the Num. iter textbox.
Click the green Train! button.
This starts agent training. When training is done, the File explorer window opens.

Saving trained agent

When agent training finishes, the File explorer window opens.
Name the file 'Another Bot' and click Save.

Loading trained agent

After saving, the app tells you that the agent is trained and that you need to load it to play.
Press LOAD AGENT to load the trained agent.
This opens the File explorer window, where you can choose the agent's save folder.
Select the 'Another Bot' folder and press Select Folder.
After loading, the app instructs you to restart the game to play against the loaded agent.

Playing loaded agent

Press RESET GAME and play against the trained agent.

Game rules

This chapter explains the rules of engagement.

You always play first (as 'X') when a new game is started.
When the game starts, the opposing agent IS NOT trained; you need to load a trained agent from a file to play against it.
EACH game, you and the agent switch signs.
Whoever plays as 'X' plays first.
You need to get three of your signs in a line (row, column, or diagonal) to win a single game.
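
The win condition above can be sketched in a few lines. This is an illustration only: the flat 9-cell board layout, the blank-cell marker " ", and the names WIN_LINES and winner are assumptions, not taken from the project's rules.py.

```python
# All eight winning lines on a 3x3 board whose cells are indexed 0..8, row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that sign fills a line, 'draw' if the board is full, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    if " " not in board:
        return "draw"
    return None
```

A board here is any sequence of nine one-character strings, for example list("XXXOO    ").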

Scoring

The score is kept across games.
You gain 1 point if you win; the agent gains 1 point if it wins.
Nobody gets a point when a game ends in a draw.
The score is kept only while the app runs. Scores ARE NOT saved.
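
As a minimal sketch of these scoring rules, assuming the score lives in a plain in-memory dict (the function name update_score and the keys 'human' and 'agent' are hypothetical):

```python
def update_score(score, result):
    """Return a new score dict after one game.

    `result` is 'human', 'agent' or 'draw'; a draw changes nothing.
    """
    new_score = dict(score)  # copy so the caller's dict is left untouched
    if result in ("human", "agent"):
        new_score[result] += 1
    return new_score


# The score exists only in memory, so it is gone once the program exits.
score = {"human": 0, "agent": 0}
score = update_score(score, "human")
score = update_score(score, "draw")
```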

Agent training

Agents are trained in the Agent training window, which is opened with the TRAIN AGENT button.

There are 2 parameters you can set for agent training:

Learning strategy: Q-Learning or random.
Number of training iterations.

Learning strategies

The main learning strategy is, of course, the now famous Q-Learning. When this strategy is selected, the agent plays against itself for the given number of iterations. While training, it fills its Q-table with score values for each move. This table is later used to determine the best move to play against you.
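
This README does not spell out the update rule the agent uses. As a rough sketch, the standard tabular Q-learning update looks like the following. The name q_update, the constants ALPHA and GAMMA with their values, and the (state, action) key layout are assumptions made for illustration, not taken from the project's code.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(q_table, state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + ALPHA * (reward + GAMMA * max_a' Q(s', a') - Q(s, a))
    """
    # A terminal next state has no actions, so its best value counts as 0.
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    old = q_table[(state, action)]
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q_table[(state, action)]


# Unseen (state, action) pairs start with a score of 0.0.
q_table = defaultdict(float)
```

With this layout, "the best move" in a state is simply the free cell with the highest stored value.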

While training, one instance of the Q-Learning agent always plays as 'O' and the other as 'X'. In other words, when training is done there are 2 agents, each of which has learned to play as a different sign. This way, we can save both of their Q-tables as features of a SINGLE agent that knows how to play both as 'O' and as 'X'.

The second possible strategy is for the agent to fill its Q-table by just making random moves and trying to learn that way.
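
The difference between the two strategies comes down to how a move is picked during training. A minimal sketch, assuming a Q-table keyed by (state, cell) pairs; the function name choose_move and the strategy labels "q" and "random" are hypothetical:

```python
import random


def choose_move(q_table, state, free_cells, strategy="q", rng=random):
    """Pick a cell: uniformly at random, or the one with the highest Q-value."""
    if strategy == "random":
        return rng.choice(free_cells)
    # Cells never seen in this state count as 0.0; ties go to the first such cell.
    return max(free_cells, key=lambda cell: q_table.get((state, cell), 0.0))
```

Practical Q-learning agents usually mix the two by exploring with a random move some fraction of the time. Whether this project does so is not stated here.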

Training iterations

The number of iterations is the number of games the agent plays against itself before playing you. As with other reinforcement learning bots, the meat of its knowledge comes from the large number of games played.

We found that an agent trained with the Q-Learning strategy for 200,000 iterations is quite hard to beat. Feel free to train it even longer, there is no limit ;)
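
To make "one iteration = one self-play game" concrete, here is a compact self-play loop. It is a sketch under several assumptions and not the project's train.py: it uses a simplified end-of-game value update (each visited move is nudged toward the final reward) rather than the full Q-learning rule, and the names train, alpha and epsilon and their default values are made up for illustration.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X', 'O', 'draw', or None while the game is still running."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if " " not in board else None


def train(n_iter, alpha=0.1, epsilon=0.2, seed=0):
    """Play n_iter self-play games and return one value table per sign."""
    rng = random.Random(seed)
    q = {"X": {}, "O": {}}
    for _ in range(n_iter):
        board = [" "] * 9
        history = {"X": [], "O": []}  # (state, move) pairs made by each sign
        sign, result = "X", None      # 'X' always moves first
        while result is None:
            state = "".join(board)
            free = [i for i, cell in enumerate(board) if cell == " "]
            if rng.random() < epsilon:
                move = rng.choice(free)  # explore
            else:
                move = max(free, key=lambda m: q[sign].get((state, m), 0.0))  # exploit
            history[sign].append((state, move))
            board[move] = sign
            result = winner(board)
            sign = "O" if sign == "X" else "X"
        # End of game: +1 for the winner, -1 for the loser, 0 for a draw.
        for s in ("X", "O"):
            reward = 0.0 if result == "draw" else (1.0 if result == s else -1.0)
            for key in history[s]:
                old = q[s].get(key, 0.0)
                q[s][key] = old + alpha * (reward - old)
    return q
```

Calling train(20000) plays 20,000 games. More games mean more board positions visited and more reliable values, which is why longer training tends to give a stronger agent.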

Saving agents

You can save the agent's save folder anywhere on your disk.
As mentioned, by saving a trained agent you really just save its Q-table values for 'X' and 'O' moves.
After saving, if you look in the save directory you will see 2 files, one starting with trained_O_<name of save folder> and the other with trained_X_<name of save folder>. These contain the Q-tables.
Files are saved as pickle files (.pkl).
If you don't provide a name for your save folder, the app gives it the DEFAULT name: a directory named save_<datetime>.
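
The save layout described above can be reproduced with the standard pickle module. This is a sketch rather than the app's actual code: the function name save_agent, its arguments, and the exact timestamp format of the default name are assumptions.

```python
import pickle
from datetime import datetime
from pathlib import Path


def save_agent(parent_dir, q_table_x, q_table_o, name=None):
    """Write both Q-tables into a save folder and return the folder path."""
    # Fall back to a timestamped default name when none is given.
    # The timestamp format is an assumption.
    name = name or "save_" + datetime.now().strftime("%Y%m%d_%H%M%S")
    folder = Path(parent_dir) / name
    folder.mkdir(parents=True, exist_ok=True)
    for sign, table in (("X", q_table_x), ("O", q_table_o)):
        with open(folder / f"trained_{sign}_{name}.pkl", "wb") as fh:
            pickle.dump(table, fh)
    return folder
```

For example, save_agent(".", table_x, table_o, name="Another Bot") would create an 'Another Bot' folder holding trained_X_Another Bot.pkl and trained_O_Another Bot.pkl.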

Loading agents

When loading an agent, you need to select the save FOLDER containing the saved agent files, not the save files themselves.
After loading, you need to press the RESET GAME button to play against the loaded agent.
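
Selecting the folder rather than a single file makes sense because both Q-tables are needed, and their file names can be derived from the folder name. A minimal sketch of that lookup, assuming the file naming described under Saving agents (the function name load_agent is hypothetical):

```python
import pickle
from pathlib import Path


def load_agent(folder):
    """Read both Q-tables from a save folder and return them keyed by sign."""
    folder = Path(folder)
    tables = {}
    for sign in ("X", "O"):
        # File names are built from the save folder's own name.
        path = folder / f"trained_{sign}_{folder.name}.pkl"
        with open(path, "rb") as fh:
            tables[sign] = pickle.load(fh)
    return tables
```

Note that unpickling can execute arbitrary code, so only load save folders that you created yourself or that come from a source you trust.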

🔗 Credits

This project was greatly inspired by the following sources:

[1] How to use reinforcement learning to play tic-tac-toe by Rickard Karlsson

[2] Reinforcement Learning Tic Tac Toe Python Implementation by Marius Borcan

Thank you for doing great work!
