Toebot

Toebot is an AI for palying ultimate tic tac toe game in the theaigames.com competition. This bot won first place using a mix of traditional mini max search and reinforcement learning.

Progress

Varios notes about the progress of the bot development. Not necessarily very useful.

v35

Trying out te heuristic where we add the value of the current board multiplied by 0.3.

Base(v34):279 Test(candidate-3):346 Draws:375 Total:1000 Score:53.3±2.4% ScoreNoDraw:55.4±3.9% Rating:+23
Base(v34):342 Test(candidate-4):263 Draws:386 Total:991 Score:46.0±2.4% ScoreNoDraw:43.5±4.0% Rating:-28
Base(v34):442 Test(candidate-2):394 Draws:164 Total:1000 Score:47.6±2.8% ScoreNoDraw:47.1±3.4% Rating:-17

v34

Enable opening table by default.

Base(v33):265 Test(candidate):634 Draws:101 Total:1000 Score:68.5±2.7% ScoreNoDraw:70.5±3.0% Rating:+135

v33

Fix pondering

v31

Experimenting with pondering and sorting:

Base(v30):86 Test(sorting):74 Draws:130 Total:290 Score:47.9±4.3% ScoreNoDraw:46.2±7.7% Rating:-14
Base(v30):74 Test(pondering):104 Draws:114 Total:292 Score:55.1±4.4% ScoreNoDraw:58.4±7.2% Rating:+36
Base(v30):121 Test(pondering-sorting):102 Draws:68 Total:291 Score:46.7±5.0% ScoreNoDraw:45.7±6.5% Rating:-23

Experimenting with conservative hash:

Base(toebot-aggressive):252 Test(toebot-pondering-sorting-conservative):496 Draws:252 Total:1000 Score:62.2±2.6% ScoreNoDraw:66.3±3.4% Rating:+87

Base(v30):301 Test(v31):433 Draws:266 Total:1000 Score:56.6±2.6% ScoreNoDraw:59.0±3.6% Rating:+46
Base(v26):143 Test(v31):487 Draws:370 Total:1000 Score:67.2±2.2% ScoreNoDraw:77.3±3.3% Rating:+125
Base(v31):201 Test(candidate):352 Draws:447 Total:1000 Score:57.6±2.3% ScoreNoDraw:63.7±4.0% Rating:+53
Base(v30):295 Test(candidate):375 Draws:330 Total:1000 Score:54.0±2.5% ScoreNoDraw:56.0±3.8% Rating:+28
Base(v26):141 Test(candidate):504 Draws:355 Total:1000 Score:68.2±2.2% ScoreNoDraw:78.1±3.2% Rating:+132

Pinned both bots to the same cpu:

Base(v24):45 Test(candidate):605 Draws:350 Total:1000 Score:78.0±1.8% ScoreNoDraw:93.1±2.0% Rating:+220
Base(v25):220 Test(candidate):587 Draws:193 Total:1000 Score:68.3±2.5% ScoreNoDraw:72.7±3.1% Rating:+134
Base(v26):204 Test(candidate):670 Draws:126 Total:1000 Score:73.3±2.5% ScoreNoDraw:76.7±2.8% Rating:+175
Base(v30):204 Test(candidate):413 Draws:383 Total:1000 Score:60.5±2.3% ScoreNoDraw:66.9±3.7% Rating:+74
Base(v31):112 Test(candidate):534 Draws:354 Total:1000 Score:71.1±2.1% ScoreNoDraw:82.7±2.9% Rating:+156

v27 Candidate

I decided to explore pondering a little more, this is what I get comparing pondering vs pondering with effectively one cpu per game:

Base(./toebot):180 Test(./toebot --ponder):409 Draws:411 Total:1000 Score:61.5±2.3% Rating:+81

Which is a notable improvement.

Next I need to run the same experiment but with two cpus per game. So far it looks like the bot perform worse

Now, moving on to use some RL, here I add one feature for each cell board on top of the current heuristic as a single features. And perform a linear combination of all features. I trained the weights through RL and obtained the following results after about 2 days of training:

Base(./toebot-v24):77 Test(./toebot-candidate):576 Draws:347 Total:1000 Score:75.0±2.0% Rating:+190
Base(./toebot-v25):165 Test(./toebot-candidate):613 Draws:222 Total:1000 Score:72.4±2.4% Rating:+168
Base(./toebot-v26):192 Test(./toebot-candidate):309 Draws:499 Total:1000 Score:55.9±2.2% Rating:+41

it was a small improvement over v26, but it doesn't perform as well against v25. maybe v26 was exploiting some of v25 week points. I should tune the depth shortening heuristic a bit after this.

Base(./toebot-v27):370 Test(./toebot-v27 --shortening-threshold 4000):198 Draws:432 Total:1000 Score:41.4±2.3% Rating:-60
Base(./toebot-v27):504 Test(./toebot-v27 --shortening-threshold 3000):348 Draws:148 Total:1000 Score:42.2±2.8% Rating:-55
Base(./toebot-v27):406 Test(./toebot-v27 --shortening-threshold 6000):184 Draws:410 Total:1000 Score:38.9±2.3% Rating:-78
Base(./toebot-v27):83 Test(./toebot-v27 --shortening-threshold 7000):175 Draws:742 Total:1000 Score:54.6±1.5% Rating:+32

Base(./toebot-v27):15 Test(./toebot-v27 --shortening-threshold 8000):14 Draws:44 Total:73 Score:49.3±7.2% ScoreNoDraw:48.3±0.2% Rating:-5
Base(./toebot-v27):35 Test(./toebot-v27 --shortening-threshold 9000):19 Draws:19 Total:73 Score:39.0±9.5% ScoreNoDraw:35.2±0.1% Rating:-77
Base(./toebot-v26):34 Test(./toebot-v27 --shortening-threshold 7000):16 Draws:22 Total:72 Score:37.5±9.2% ScoreNoDraw:32.0±0.1% Rating:-89
Base(./toebot-v25):21 Test(./toebot-v27 --shortening-threshold 7000):31 Draws:22 Total:74 Score:56.8±9.4% ScoreNoDraw:59.6±0.1% Rating:+47
Base(./toebot-v24):14 Test(./toebot-v27 --shortening-threshold 7000):49 Draws:10 Total:73 Score:74.0±9.1% ScoreNoDraw:77.8±0.1% Rating:+181

Base(v26):98 Test(v27-40000):194 Draws:216 Total:508 Score:59.4±3.2% ScoreNoDraw:66.4±5.4% Rating:+66
Base(v26):89 Test(v27-30000):168 Draws:251 Total:508 Score:57.8±3.0% ScoreNoDraw:65.4±5.8% Rating:+54
Base(v26):109 Test(v27-50000):152 Draws:247 Total:508 Score:54.2±3.1% ScoreNoDraw:58.2±6.0% Rating:+29
Base(v25):119 Test(v27-40000):205 Draws:184 Total:508 Score:58.5±3.4% ScoreNoDraw:63.3±5.2% Rating:+59
Base(v25):134 Test(v27-30000):180 Draws:194 Total:508 Score:54.5±3.4% ScoreNoDraw:57.3±5.5% Rating:+32
Base(v25):83 Test(v27-50000):293 Draws:132 Total:508 Score:70.7±3.3% ScoreNoDraw:77.9±4.2% Rating:+153
Base(v24):60 Test(v27-40000):353 Draws:95 Total:508 Score:78.8±3.0% ScoreNoDraw:85.5±3.4% Rating:+228
Base(v24):56 Test(v27-30000):347 Draws:105 Total:508 Score:78.6±3.0% ScoreNoDraw:86.1±3.4% Rating:+226
Base(v24):30 Test(v27-50000):292 Draws:185 Total:507 Score:75.8±2.6% ScoreNoDraw:90.7±3.2% Rating:+199

Looks like v27 is the winner.

v26 candidate

base(./toebot-v25):195 test(./toebot-v26):684 draws:1121 total:2000 ratio:77.82±2.75%

base(./toebot-v24):263 test(./toebot-v26):1212 draws:525 total:2000 ratio:82.17±1.95%

base(./toebot-v23):542 test(./toebot-v26):753 draws:705 total:2000 ratio:58.15±2.69%

experiment with shortening threshold:

base(./toebot-v23):236 test(./toebot-v26):692 draws:72 total:1000 ratio:74.57±2.80%
base(./toebot-v23):153 test(./toebot-v26 --shortening-threshold 30000 --depth-shortening 4):734 draws:113 total:1000 ratio:82.75±2.49%
base(./toebot-v23):107 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):639 draws:254 total:1000 ratio:85.66±2.52%
base(./toebot-v26):154 test(./toebot-v26 --shortening-threshold 30000 --depth-shortening 4):322 draws:524 total:1000 ratio:67.65±4.20%
base(./toebot-v26):6 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):440 draws:554 total:1000 ratio:98.65±1.07%
base(./toebot-v26 --shortening-threshold 30000 --depth-shortening 4):159 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):381 draws:460 total:1000 ratio:70.56±3.84%

experiment with depth shortening:

base(./toebot-v23):34 test(./toebot-v26):147 draws:51 total:232 ratio:81.22±5.69%
base(./toebot-v23):20 test(./toebot-v26 --depth-shortening 6):175 draws:37 total:232 ratio:89.74±4.26%
base(./toebot-v23):20 test(./toebot-v26 --depth-shortening 8):170 draws:42 total:232 ratio:89.47±4.36%
base(./toebot-v26):58 test(./toebot-v26 --depth-shortening 6):43 draws:130 total:231 ratio:42.57±9.64%
base(./toebot-v26):98 test(./toebot-v26 --depth-shortening 8):39 draws:94 total:231 ratio:28.47±7.56%
base(./toebot-v26 --depth-shortening 6):44 test(./toebot-v26 --depth-shortening 8):46 draws:142 total:232 ratio:51.11±10.33%

experiment with shortening threshold no 2:

base(./toebot-v23):105 test(./toebot-v26 --shortening-threshold 40000):663 draws:232 total:1000 ratio:86.33±2.43%
base(./toebot-v23):59 test(./toebot-v26 --shortening-threshold 50000):807 draws:134 total:1000 ratio:93.19±1.68%
base(./toebot-v23):43 test(./toebot-v26 --shortening-threshold 60000):859 draws:98 total:1000 ratio:95.23±1.39%
base(./toebot-v26 --shortening-threshold 40000):65 test(./toebot-v26 --shortening-threshold 50000):237 draws:698 total:1000 ratio:78.48±4.64%
base(./toebot-v26 --shortening-threshold 40000):217 test(./toebot-v26 --shortening-threshold 60000):149 draws:634 total:1000 ratio:40.71±5.03%
base(./toebot-v26 --shortening-threshold 50000):201 test(./toebot-v26 --shortening-threshold 60000):79 draws:720 total:1000 ratio:28.21±5.27%

partial result:

base(./toebot-v23):43 test(./toebot-v24):97 draws:97 total:237 ratio:69.29±7.64%
base(./toebot-v23):36 test(./toebot-v25):161 draws:39 total:236 ratio:81.73±5.40%
base(./toebot-v23):11 test(./toebot-v26):188 draws:38 total:237 ratio:94.47±3.18%
base(./toebot-v24):91 test(./toebot-v25):115 draws:30 total:236 ratio:55.83±6.78%
base(./toebot-v24):35 test(./toebot-v26):131 draws:70 total:236 ratio:78.92±6.21%
base(./toebot-v25):26 test(./toebot-v26):109 draws:101 total:236 ratio:80.74±6.65%

a partial experiment comparing v26 with v25, strangely, v26 with depth-shortening=0 should be equivalent to v25, but they seem to be fairly different:

base(./toebot-v25):78 test(./toebot-v26 --depth-shortening 0):40 draws:129 total:247 ratio:33.90±8.54%
base(./toebot-v25):15 test(./toebot-v26 --shortening-threshold 20000 --depth-shortening 4):94 draws:138 total:247 ratio:86.24±6.47%
base(./toebot-v25):1 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):126 draws:120 total:247 ratio:99.21±1.54%
base(./toebot-v25):13 test(./toebot-v26 --shortening-threshold 50000 --depth-shortening 4):113 draws:122 total:248 ratio:89.68±5.31%
base(./toebot-v26 --depth-shortening 0):21 test(./toebot-v26 --shortening-threshold 20000 --depth-shortening 4):87 draws:139 total:247 ratio:80.56±7.46%
base(./toebot-v26 --depth-shortening 0):1 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):101 draws:145 total:247 ratio:99.02±1.91%
base(./toebot-v26 --depth-shortening 0):2 test(./toebot-v26 --shortening-threshold 50000 --depth-shortening 4):124 draws:122 total:248 ratio:98.41±2.18%
base(./toebot-v26 --shortening-threshold 20000 --depth-shortening 4):54 test(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):80 draws:113 total:247 ratio:59.70±8.31%
base(./toebot-v26 --shortening-threshold 20000 --depth-shortening 4):62 test(./toebot-v26 --shortening-threshold 50000 --depth-shortening 4):91 draws:93 total:246 ratio:59.48±7.78%
base(./toebot-v26 --shortening-threshold 40000 --depth-shortening 4):19 test(./toebot-v26 --shortening-threshold 50000 --depth-shortening 4):65 draws:163 total:247 ratio:77.38±8.95%

final experiment:

base(./toebot-v23):199 test(./toebot-v24):413 draws:388 total:1000 ratio:67.48±3.71%
base(./toebot-v23):171 test(./toebot-v25):641 draws:188 total:1000 ratio:78.94±2.80%
base(./toebot-v23):60 test(./toebot-v26):799 draws:141 total:1000 ratio:93.02±1.70%
base(./toebot-v24):368 test(./toebot-v25):486 draws:146 total:1000 ratio:56.91±3.32%
base(./toebot-v24):185 test(./toebot-v26):558 draws:257 total:1000 ratio:75.10±3.11%
base(./toebot-v25):45 test(./toebot-v26):471 draws:484 total:1000 ratio:91.28±2.43%

from this experiment, v26 should have +158 more elo points than v25.

time experiment:

base(./toebot-v26 --default-time-limit 100):174 test(./toebot-v26 --default-time-limit 250):379 draws:351 total:904 score:61.3±2.4% rating:+80
base(./toebot-v26 --default-time-limit 100):85 test(./toebot-v26 --default-time-limit 500):341 draws:476 total:902 score:64.2±2.0% rating:+101
base(./toebot-v26 --default-time-limit 100):45 test(./toebot-v26 --default-time-limit 800):497 draws:361 total:903 score:75.0±1.9% rating:+191
base(./toebot-v26 --default-time-limit 250):311 test(./toebot-v26 --default-time-limit 500):234 draws:359 total:904 score:45.7±2.5% rating:-30
base(./toebot-v26 --default-time-limit 250):82 test(./toebot-v26 --default-time-limit 800):412 draws:408 total:902 score:68.3±2.1% rating:+133
base(./toebot-v26 --default-time-limit 500):71 test(./toebot-v26 --default-time-limit 800):158 draws:675 total:904 score:54.8±1.6% rating:+34

suprprisingly, the one with 500ms was beaten by the one with 250ms.

v25

base(./toebot-v24):676 test(./toebot-v25):972 draws:352 total:2000 ratio:58.98±2.37%

base(./toebot-v23):376 test(./toebot-v25):1255 draws:369 total:2000 ratio:76.95±2.04%

v24

base(./toebot-v23):377 test(./toebot-v24):834 draws:789 total:2000 ratio:68.87±2.61%

Benchmark

./bench.sh
Player 1 won
Rounds: 53 Total Nodes: 75,574,130
Command took 8238.84 milliseconds.

Name		Name	Last commit message	Last commit date
Latest commit History 101 Commits
learning		learning
samples		samples
spider		spider
third_party		third_party
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
ai.cpp		ai.cpp
ai.h		ai.h
bench.sh		bench.sh
board.cpp		board.cpp
board.h		board.h
board_test.cpp		board_test.cpp
cmd_args.cpp		cmd_args.cpp
cmd_args.h		cmd_args.h
eval.cfg		eval.cfg
eval.sh		eval.sh
flags.cpp		flags.cpp
flags.h		flags.h
generated_opening_table.cpp		generated_opening_table.cpp
generated_opening_table.h		generated_opening_table.h
hash.cpp		hash.cpp
hash.h		hash.h
hash_table.cpp		hash_table.cpp
hash_table.h		hash_table.h
interruption.cpp		interruption.cpp
interruption.h		interruption.h
line_reader.cpp		line_reader.cpp
line_reader.h		line_reader.h
main.cpp		main.cpp
main.h		main.h
model.cpp		model.cpp
notes.md		notes.md
opening_table.cpp		opening_table.cpp
opening_table.h		opening_table.h
pack.sh		pack.sh
random.cpp		random.cpp
random.h		random.h
score_table.cpp		score_table.cpp
score_table.h		score_table.h
search_tree_printer.cpp		search_tree_printer.cpp
search_tree_printer.h		search_tree_printer.h
self_play.golden		self_play.golden
state.h		state.h
test.cpp		test.cpp
train.sh		train.sh
util.cpp		util.cpp
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Toebot

Progress

Benchmark

About

Releases

Packages

Languages

rahenri/toebot

Folders and files

Latest commit

History

Repository files navigation

Toebot

Progress

Benchmark

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages