Adversarial Flow Networks (AFN)

[Paper]

PyTorch implementation of Adversarial Flow Networks (AFN). Currently supports self-play training on tic-tact-toe and Connect-4

Setup

Tested using Python 3.10.12.

pip install -r requirements.txt

Quick start

# For tic-tac-toe
python train.py -cn=ttt

❯ python train.py -cn=ttt
Generating initial buffer
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:04<00:00,  1.34it/s]
Train:  10%|████████▊                                                                               | 100/1000 [00:03<00:24, 37.17it/s]
----------  Eval  ----------

{'WINS ratio': 0.655, 'DRAWS ratio': 0.095, 'LOSSES ratio': 0.25, '% Optimal Moves': 0.6798672566371682, 'step': 100}
Saved checkpoint at checkpoints-TTT/z98hbwmw/ckpt-00000100.pt
----------  Regen  ----------
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.43it/s]
Train:  20%|█████████████████▍                                                                      | 198/1000 [00:08<00:21, 37.93it/s]
----------  Eval  ----------

{'WINS ratio': 0.695, 'DRAWS ratio': 0.065, 'LOSSES ratio': 0.24, '% Optimal Moves': 0.6884955752212389, 'step': 200}
Saved checkpoint at checkpoints-TTT/z98hbwmw/ckpt-00000200.pt
----------  Regen  ----------
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.43it/s]
Train:  30%|██████████████████████████▍                                                             | 300/1000 [00:13<00:17, 39.35it/s]
----------  Eval  ----------

{'WINS ratio': 0.89, 'DRAWS ratio': 0.06, 'LOSSES ratio': 0.05, '% Optimal Moves': 0.9165929203539823, 'step': 300}
Saved checkpoint at checkpoints-TTT/z98hbwmw/ckpt-00000300.pt
...

# For Connect-4
python train.py -cn=connect4

Generating initial buffer
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 245/245 [08:56<00:00,  2.19s/it]
Train:   2%|█▋                                                                                  | 500/25000 [14:05<11:12:13,  1.65s/it]
----------  Eval  ----------

{'WINS ratio': 0.9, 'DRAWS ratio': 0.0, 'LOSSES ratio': 0.1, '% Optimal Moves': 0.4146972596645355, 'step': 500}
Saved checkpoint at checkpoints-Connect4/cyofxiu8/ckpt-00000500.pt
----------  Regen  ----------
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:22<00:00,  2.21s/it]
Train:   4%|███▎                                                                               | 1000/25000 [28:43<11:32:20,  1.73s/it]
----------  Eval  ----------

{'WINS ratio': 0.97, 'DRAWS ratio': 0.0, 'LOSSES ratio': 0.03, '% Optimal Moves': 0.53759765625, 'step': 1000}
Saved checkpoint at checkpoints-Connect4/cyofxiu8/ckpt-00001000.pt
----------  Regen  ----------
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:21<00:00,  2.16s/it]
Train:   6%|████▊                                                                              | 1460/25000 [42:08<10:55:04,  1.67s/it]

Config

Training configurations can be found in configs/. See ttt.yaml for an annotated version of all the options.

To enable wandb tracking, set mode: enabled under wandb.

Playing against trained agents

After training an agent, it can be played against as follows:

python play_against_ai.py /path/to/ckpt.pt
python play_against_ai.py /path/to/ckpt.pt --human-first # To play first

Citing

If you find this repository useful, please consider citing it:

@misc{jiralerspong2023expected,
      title={Expected flow networks in stochastic environments and two-player zero-sum games},
      author={Marco Jiralerspong and Bilun Sun and Danilo Vucetic and Tianyu Zhang and Yoshua Bengio and Gauthier Gidel and Nikolay Malkin},
      year={2023},
      eprint={2310.02779},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Adversarial Flow Networks (AFN)

Setup

Quick start

Config

Playing against trained agents

Citing

Files

README.md

Latest commit

History

README.md

File metadata and controls

Adversarial Flow Networks (AFN)

Setup

Quick start

Config

Playing against trained agents

Citing