Releases: tambetm/pommerman-baselines

Models trained on Single sample per episode 600K dataset

16 Jul 22:27

These models are pre-trained on the Single sample per episode 600K dataset.

Single sample per episode 600K dataset

16 Jul 22:18

This dataset contains 600K observations, actions, and state values recorded using a one-sample-per-episode scheme. This scheme increases dataset diversity and makes it possible to learn the value function successfully. Because we ran four SimpleAgents against each other, the dataset was actually collected from 150K different episodes: from each episode we used one random sample from each of the four agents. There are two dataset versions: one with discount rate 0.9 and one with 0.99.
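
The sampling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the episode layout (a dict mapping agent id to a list of `(observation, action, reward)` tuples), the function names, and the choice of the discounted return as the recorded state value are all assumptions made for the example.

```python
import random


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sample_one_per_agent(episode, gamma, rng):
    """Pick one random (observation, action, value) sample per agent.

    `episode` is a hypothetical layout: {agent_id: [(obs, action, reward), ...]}.
    The state value is assumed here to be the discounted return from that step.
    """
    samples = []
    for agent_id, steps in episode.items():
        values = discounted_returns([r for _, _, r in steps], gamma)
        t = rng.randrange(len(steps))  # one random timestep for this agent
        obs, action, _ = steps[t]
        samples.append((obs, action, values[t]))
    return samples
```

Under these assumptions, calling `sample_one_per_agent` on each of 150K four-agent episodes yields 4 samples per episode, or 600K samples in total, with `gamma` set to 0.9 or 0.99 for the two dataset versions.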

Models trained on SimpleAgent 600K dataset

20 Jun 18:47

These models are pre-trained on the SimpleAgent 600K dataset.

SimpleAgent 600K dataset

20 Jun 18:39

Samples collected from four SimpleAgents playing against each other. The dataset contains 600 episodes (~600K samples) in the training set and 100 episodes (~100K samples) in the validation set. There are two versions: one with rewards calculated using a 0.99 discount and one with no discount (discount rate 1). In the cleaned version, samples are removed wherever three consecutive actions and four consecutive observations did not change.
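
The cleaning rule can be read in more than one way, so the sketch below shows just one possible interpretation rather than the filter actually applied. It assumes a single agent's trajectory with one action per observation, observations that can be compared with `==` (for example tuples), and that the three samples at the start of each unchanged window are the ones dropped. The function name `clean_episode` is hypothetical.

```python
def clean_episode(observations, actions):
    """Drop samples in windows where nothing changed.

    A window starting at step t covers observations t..t+3 and actions t..t+2.
    If all four observations are equal and all three actions are equal,
    samples t..t+2 are removed (an assumed reading of the rule).
    """
    n = len(actions)
    drop = [False] * n
    for t in range(n - 3):
        same_obs = all(observations[t + k] == observations[t] for k in range(1, 4))
        same_act = all(actions[t + k] == actions[t] for k in range(1, 3))
        if same_obs and same_act:
            for k in range(3):
                drop[t + k] = True
    # Keep only the (observation, action) pairs that were not marked.
    return [(o, a) for o, a, d in zip(observations, actions, drop) if not d]
```

A filter like this mainly removes stretches where an agent repeats an action that has no effect on what it observes, such as walking into a wall.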