Render history #149

egafni · 2021-04-15T20:15:33Z

Provides a way to store the rendered history for visualization in Jupyter notebooks. If this looks reasonable to you, I can clean it up for other games.
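To make the idea concrete, here is a minimal sketch of recording one rendered frame per step so the frames can be replayed later. It is not the code from this PR: `RenderHistory`, `CountdownGame`, and `play_and_record` are hypothetical names invented for illustration, and the toy game returns text frames instead of images.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class RenderHistory:
    """Hypothetical container holding one rendered frame per step."""

    frames: List[Any] = field(default_factory=list)

    def record(self, render_fn: Callable[[], Any]) -> None:
        # Call the game's render function and keep whatever it returns
        # (a string here; it could equally be an image array).
        self.frames.append(render_fn())

    def __len__(self) -> int:
        return len(self.frames)


class CountdownGame:
    """Toy stand-in for a game wrapper; not part of the repository."""

    def __init__(self, start: int = 3) -> None:
        self.state = start

    def step(self) -> bool:
        # Advance the game by one move; return True once the episode ends.
        self.state -= 1
        return self.state <= 0

    def render(self) -> str:
        return f"state={self.state}"


def play_and_record(game: CountdownGame) -> RenderHistory:
    """Play one episode and store the frame seen before and after each step."""
    history = RenderHistory()
    history.record(game.render)  # initial position
    done = False
    while not done:
        done = game.step()
        history.record(game.render)
    return history


history = play_and_record(CountdownGame(3))
for index, frame in enumerate(history.frames):
    print(index, frame)
```

In a notebook, the stored frames could then be stepped through cell by cell or shown one after another, without re-running the game.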

…d#105) Added a small sub-routine that scans for previously saved models. These can be loaded by simply selecting the menu option instead of providing the full path. The menu is presented in reverse chronological order and the final option "Specify paths manually" preserves the ability to provide a full path for the checkpoint and replay buffer as was done previously.

werner-duvaud and others added 30 commits February 22, 2020 23:36

Merge pull request werner-duvaud#9 from littleV/gomoku

6534061

Add Gomoku Game

Add hyperparameters to TensorBoard and update menu

cd3db2f

Update readme

a06b9fb

Added Tic-Tac-Toe game.

a850762

Merge pull request werner-duvaud#14 from fidel-schaposnik/tictactoe

697435e

Add Tic-Tac-Toe game

Improve ResNet pooling and add abstract game

1454f56

Add test again random agent

9f3f25f

Add parameters for board games

7d3acbc

Add random agent results to tensorboard

18be118

Convergence of tic-tac-toe with fully connected network

b16e4ae

Type error

4aad737

It seems the support variable is a long not a float

Merge pull request werner-duvaud#17 from manuel-delverme/patch-1

0d110d0

Convert support variable to float

Add stack action to stacked observations

f2f0f6f

Fix MCTS

1442e20

Fix MCTS and typo

1864750

Co-authored-by: ahainaut <[email protected]>

Fix OverflowError and Add Conv 1x1

c16e2ec

Improve lunarlander hyperparameters and scale encoded state gradients

7b861c4

add PER support

10b7fda

add PER support

58af304

Improve cartpole hyperparameters and fix typo

f51d2cb

Merge pull request werner-duvaud#23 from xuxiyang1993/master

976aa01

Add PER support

update PER support

5c2e4d4

Merge pull request werner-duvaud#25 from xuxiyang1993/master

bdbf703

update PER support

Improve loss scaling and fix MCTS

027ecc5

IS weights for prioritized replay

43688f7

Merge pull request werner-duvaud#29 from xuxiyang1993/master

4a20f90

IS weights for prioritized replay

Change batch aggregation, fix value in replay buffer and prepare merge

b62bcce

Merge branch 'master' into prioritized_replay

2d02fc3

Merge pull request werner-duvaud#30 from werner-duvaud/prioritized_re…

5c61323

…play Prioritized replay and fix target value

Add notebook and fix merge

ee1b333

werner-duvaud and others added 30 commits August 14, 2020 21:53

Improve CPU/GPU management

4c4422f

Update replay_buffer.py

710a33f

Add resume training and improve training exit

e46b500

Co-authored-by: ahainaut <[email protected]>

Fix string formatting (werner-duvaud#77)

4cdcae0

* Update trainer.py

Improve docstring and fix load replay buffer werner-duvaud#75

ffbd4b3

Fix tic tac toe action to string

cd2c7a4

Fix reanalyse and format

2600e3e

Add lunarlander checkpoint

b7a7665

Renames window_size to replay_buffer_size (werner-duvaud#83)

ab3ffe6

Update TicTacToe configuration

2bc3534

Fix badge typo (werner-duvaud#85)

4b500ff

Fix Pytorch 1.7

f752af6

Parallelize get_batch in trainer (werner-duvaud#86)

ca40525

Fix GPU availability in actors with PyTorch 1.7

1583ad9

Fix werner-duvaud#89 (residual block)

f350a08

Update README.md

8fba3da

Fix reanalyse on GPU

dccac8a

Typo fix werner-duvaud#100

bdafa68

Explicitly remove special lunarlander import fix werner-duvaud#106

469c327

Change spaces to underscores to fix SummaryWriter error messages (wer…

a4bf9a7

…ner-duvaud#104) Updated strings passted to SummaryWriter.add_scalar removing spaces and replacing with underscores. Fixes werner-duvaud#92.

Fix gomoku hyperparameters and format

fb9c6ca

fix reward sign in compute_target_value()

1e5c3d0

because player[i-1] got reward[i] in GameHistory

sample N games at one time in replay_buffer

cf320f0

Merge pull request werner-duvaud#108 from mokemokechicken/fix_reward_…

e864f59

…sign_in_compute_target_value fix reward sign in compute_target_value()

Update comments

14fe80d

Fix werner-duvaud#114

a3e899f

Merge pull request werner-duvaud#117 from mokemokechicken/feature/sam…

2f3e3cb

…ple_n_games_at_one_time_in_get_batch sample N games at one time in replay_buffer

Merge branch 'master' of github.com:egafni/muzero-general

342e785

render hist

3592fc9
