AI Players
All AI players can be found in the package players. If they need to evaluate states, the players may use heuristics to do so. All players extend the players.Player class and may implement their own constructor (keeping a default one that receives a random seed and a playerID is recommended, although others may be added).
The actions available to the agents are defined in utils.Types.ACTIONS:

```java
ACTION_STOP(0), ACTION_UP(1), ACTION_DOWN(2), ACTION_LEFT(3), ACTION_RIGHT(4), ACTION_BOMB(5);
```
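For reference, a new agent only needs to extend players.Player and return one of these actions from its act method. The sketch below is illustrative only: it assumes the base class declares an act(GameState) method and a (seed, playerID) constructor, as the built-in players described below do, and the class name MyAgent is hypothetical.

```java
package players;

import core.GameState;   // package locations assumed; adjust to the project layout
import utils.Types;

import java.util.Random;

// Minimal sketch of a custom agent (the class name MyAgent is hypothetical).
// It assumes players.Player exposes a (seed, playerID) constructor and an
// abstract act(GameState) method returning a Types.ACTIONS value; any other
// abstract methods declared by Player would also need to be implemented.
public class MyAgent extends Player {

    private Random rnd;

    public MyAgent(long seed, int playerID) {
        super(seed, playerID);
        this.rnd = new Random(seed);
    }

    @Override
    public Types.ACTIONS act(GameState gs) {
        // A real agent would evaluate gs here (e.g. with a heuristic);
        // this sketch just returns a random available action.
        Types.ACTIONS[] actions = Types.ACTIONS.values();
        return actions[rnd.nextInt(actions.length)];
    }
}
```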
Current AI Player options are:
HumanPlayer(KeyController, playerID)
This player can be used to play the game as a human. It receives keyboard inputs (via the KeyController object passed) and translates them into in-game actions. Controls are either the primary keys (arrow keys + space bar for bombs) or the secondary keys (WASD + right shift for bombs). Having two sets of keys allows two human players to play against two AI players.
DoNothingPlayer(playerID)
This player does nothing (it always returns the action Types.ACTIONS.ACTION_STOP).
RandomPlayer(seed, playerID)
This player simply picks an action uniformly at random out of those available.
SimplePlayer(seed, playerID)
This agent is a rule-based system that mimics the Simple Agent of the original Pommerman implementation (here). At each state, the agent uses Dijkstra's algorithm, with a maximum depth of 10, to compute distances to the different game objects in the current game state. Then, it executes the following logic (sketched in code after the list):
- If there are upcoming bomb explosions, escape.
- If adjacent to an enemy, lay a bomb.
- If there is an enemy within 3 steps, move towards it.
- If there is a power-up within 2 steps, move towards it.
- If adjacent to a wooden box, lay a bomb.
- If there is a wooden box tile within 2 steps and a bomb could be laid, move towards it.
- Otherwise, move at random to a position that has not been visited recently.
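The priorities above can be read as a simple decision cascade. The sketch below is purely illustrative: the helper methods are hypothetical placeholders for the Dijkstra-based checks, not the actual SimplePlayer code.

```java
// Illustrative rule cascade only; the helper methods are hypothetical placeholders.
Types.ACTIONS decide(GameState gs) {
    if (bombExplosionUpcoming(gs)) return escapeMove(gs);                // 1. escape upcoming blasts
    if (adjacentToEnemy(gs)) return Types.ACTIONS.ACTION_BOMB;           // 2. bomb an adjacent enemy
    if (enemyWithin(gs, 3)) return moveTowardsNearestEnemy(gs);          // 3. approach a nearby enemy
    if (powerUpWithin(gs, 2)) return moveTowardsNearestPowerUp(gs);      // 4. collect a nearby power-up
    if (adjacentToWood(gs)) return Types.ACTIONS.ACTION_BOMB;            // 5. bomb an adjacent wooden box
    if (woodWithin(gs, 2) && canLayBomb(gs)) return moveTowardsWood(gs); // 6. approach a wooden box
    return randomMoveToUnvisitedPosition(gs);                            // 7. otherwise explore
}
```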
OSLAPlayer(seed, playerID)
This agent uses the Forward Model (FM) to simulate the effects of all 6 actions available from the current state. Each of these future states is evaluated with a heuristic, which provides a numerical score for it. The action that leads to the highest score is applied in the game, with ties broken uniformly at random.
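A rough sketch of this one-step look-ahead loop is below. It assumes GameState offers a copy() method and that GameState.next() takes one action per player; only GameState.next() is mentioned on this page, so the rest, including the heuristic() evaluator, is a placeholder.

```java
// One-step look-ahead sketch (not the actual OSLAPlayer code).
// Assumes a four-player game, a GameState.copy() method and a
// GameState.next(Types.ACTIONS[]) call taking one action per player.
// heuristic() is a placeholder for the state evaluation function.
Types.ACTIONS oneStepLookAhead(GameState gs, int myIndex, java.util.Random rnd) {
    Types.ACTIONS best = Types.ACTIONS.ACTION_STOP;
    double bestScore = -Double.MAX_VALUE;
    for (Types.ACTIONS action : Types.ACTIONS.values()) {
        GameState next = gs.copy();
        Types.ACTIONS[] joint = new Types.ACTIONS[4];
        java.util.Arrays.fill(joint, Types.ACTIONS.ACTION_STOP);   // opponents modelled as idle
        joint[myIndex] = action;
        next.next(joint);                                          // advance the forward model one step
        double score = heuristic(next) + rnd.nextDouble() * 1e-6;  // tiny noise breaks ties at random
        if (score > bestScore) {
            bestScore = score;
            best = action;
        }
    }
    return best;
}
```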
MCTSPlayer(seed, playerID)
MCTSPlayer(seed, playerID, MCTSParams)
MCTS is a highly selective best-first tree search method. This iterative algorithm balances between exploitation and exploration of the best moves found so far and those that require further investigation, respectively. On each iteration of the algorithm, the standard MCTS performs 4 steps:
- selection uses a 'tree policy' to navigate the tree until finding a node that does not have all its children expanded;
- expansion adds a new node to the tree;
- simulation performs moves according to a 'default policy' until a terminal node or a depth D is reached. The game state reached at the end of this rollout is evaluated using a heuristic; and
- backpropagation updates the visit counts of all nodes traversed in this iteration, as well as the accumulated reward obtained from them.

At the end of the allowed budget, MCTS returns the action taken most often from the root. Visit counts and average rewards are used to inform the tree policy, which normally takes the form of Upper Confidence Bounds for trees (UCB; see the sketch below). This policy has a parameter K that balances exploration and exploitation during the selection step. The default policy used in our implementation picks actions uniformly at random for the simulation step.
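For reference, the UCB tree policy scores each child as its average reward plus an exploration bonus weighted by K; a minimal sketch of that computation (variable names are illustrative) is shown below.

```java
// UCB1 value of a child node as used by the tree policy in the selection step.
// Variable names are illustrative; 1e-6 avoids division by zero for unvisited children.
double ucb1(double childAccumulatedReward, int childVisits, int parentVisits, double K) {
    double exploitation = childAccumulatedReward / (childVisits + 1e-6);                    // average reward so far
    double exploration = K * Math.sqrt(Math.log(parentVisits + 1) / (childVisits + 1e-6));  // uncertainty bonus
    return exploitation + exploration;                                                      // selection picks the child maximising this
}
```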
In our implementation, MCTS has 3 parameters exposed:
```java
public double K = Math.sqrt(2);
public int rollout_depth = 12;
public int heuristic_method = CUSTOM_HEURISTIC;
```
Its heuristic options are CustomHeuristic or AdvancedHeuristic. Its budget (stop_type variable) can be controlled through either time (num_time if stop_type=STOP_TIME), number of iterations (num_iterations if stop_type=STOP_ITERATIONS), or number of calls to the GameState.next() method (num_fmcalls if stop_type=STOP_FMCALLS).
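For example, an MCTS player limited by forward-model calls rather than time could be configured roughly as follows. Field and constant names are taken from the listing above, but the exact MCTSParams/MCTSPlayer API (and where the STOP_* constants live) should be checked against the source.

```java
// Sketch: configure an MCTS player with a forward-model-call budget.
// Field and constant names follow the parameter listing above; their exact
// location in the code base is assumed, and all values are illustrative.
long seed = 42;
int playerID = 0;

MCTSParams params = new MCTSParams();
params.rollout_depth = 10;
params.heuristic_method = MCTSParams.CUSTOM_HEURISTIC;
params.stop_type = MCTSParams.STOP_FMCALLS;   // budget counted in GameState.next() calls
params.num_fmcalls = 2000;                    // illustrative budget

Player mctsPlayer = new MCTSPlayer(seed, playerID, params);
```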
RHEAPlayer(seed, playerID)
RHEAPlayer(seed, playerID, RHEAParams)
RHEA [1] [2] [3] [4] [5] is a family of algorithms that uses evolution in real time to recommend, on each turn, an action for the player to take. In its standard form, RHEA evolves sequences of L actions. The fitness of each sequence is calculated by first using the FM to roll the state L steps ahead, and then evaluating the state reached at the end of the sequence of actions. RHEA uses a population of N individuals, and the regular evolutionary operators (i.e. mutation, crossover and elitism) apply. At the end of the computational budget, RHEA returns the first action of the best sequence (the one with the highest fitness) found during evolution, to be played in the game.

One of the main improvements in RHEA is the shift buffer. Once the best individual is selected at one game tick, its first action is executed in the game and removed from the individual. Then, the rest of the action sequence is shifted, so the next individual starts with the second action of the best sequence from the previous step. A new action is created, uniformly at random, for the end of the sequence, keeping the individual length at L (see the sketch below). This seeding mechanism allows RHEA to retain good solutions from previous game ticks. Each shifted solution is re-evaluated in the following game ticks to account for inaccuracies in the opponent model or the FM.
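A minimal sketch of the shift itself is shown below, assuming an individual stores its L actions as an int array (the real implementation may represent genes differently).

```java
// Shift-buffer sketch: drop the action that was just executed, shift the rest
// of the sequence left by one, and append a fresh random action so the
// individual keeps its length L.
int[] shiftBuffer(int[] actions, java.util.Random rnd, int numActions) {
    int[] shifted = new int[actions.length];
    System.arraycopy(actions, 1, shifted, 0, actions.length - 1);  // shift left by one position
    shifted[actions.length - 1] = rnd.nextInt(numActions);         // new random action at the tail
    return shifted;
}
```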
In our implementation, RHEA has 28 parameters exposed (with dependencies):
```java
// EA structure modules settings
public int genetic_operator = MUTATION_AND_CROSSOVER;
public int mutation_type = MUTATION_UNIFORM;
public int selection_type = SELECT_TOURNAMENT;
public int crossover_type = CROSS_ONE_POINT;
public int init_type = INIT_RANDOM;
public boolean elitism = false;
public boolean keep_parents_next_gen = true;
private double mcts_budget_perc = 0.5;

// Efficiency settings
public int frame_skip = 0;
public int frame_skip_type = SKIP_SEQUENCE;

// EA parameters
public int population_size = 1;
public int individual_length = 12;
public int mcts_depth = 12;
public int gene_size = 1;  // Should be a divisor of individual_length. A gene may contain more than 1 action.
public int offspring_count = 1;
public int no_elites = 1;
private double tournament_size_perc = 0.4;  // Percent of population that takes part in tournament selection
public int mutation_gene_count = 1;
public double mutation_rate = 0.3;

// Evaluation settings
public int evaluate_act = EVALUATE_ACT_LAST;
public int evaluate_update = EVALUATE_UPDATE_AVERAGE;
public double evaluate_discount = 0.99;
public int heurisic_type = CUSTOM_HEURISTIC;
public boolean reevaluate_pop = true;

// Shift settings
public boolean shift_buffer = true;

// MC Rollouts settings
public boolean mc_rollouts = false;
private double mc_rollouts_length_perc = 0.5;
public int mc_rollouts_repeat = 1;
```
- Its heuristic options are WinScoreHeuristic, PlayerCountHeuristic, CustomHeuristic or AdvancedHeuristic.
- Its budget (budget_type variable) can be controlled through either time (time_budget if budget_type=TIME_BUDGET), number of iterations (iteration_budget if budget_type=ITERATION_BUDGET), or number of calls to the GameState.next() method (fm_budget if budget_type=FM_BUDGET).
- genetic_operator options are: mutation + crossover, mutation only, crossover only.
- mutation_type options are: uniform, one-bit, biased mutation (softmax).
- selection_type options are: rank, tournament, roulette.
- crossover_type options are: uniform, 1-point, 2-point.
- init_type options are: random, OSLA, MCTS.
- frame_skip_type options are: repeat (repeats the action N times), null (play the action, then play null N times), random (play the action, then play random actions N times) and sequence (play the first N actions of the best sequence evolved). Frame skipping can be used to gain thinking time over multiple game ticks: freshly computed actions are only returned in certain ticks, controlled by the frame_skip parameter, while simple actions from a buffer are played the rest of the time.
- evaluate_act options are: last (only use the last state value), delta (use the difference between the last state value and the root state value), average (use the average of all values of states visited during rollout evaluation), minimum (use the minimum of all values), maximum (use the maximum of all values) and discount (use all values, discounted so that earlier rewards are valued more).
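As with MCTS, a non-default configuration is passed to the player through the params object. The sketch below uses field and constant names from the listings above; the exact RHEAParams/RHEAPlayer API is assumed and the values are illustrative.

```java
// Sketch: configure an RHEA player with a larger population and a time budget.
// Field and constant names follow the listings above; their exact location in
// the code base is assumed, and all values are illustrative.
long seed = 42;
int playerID = 0;

RHEAParams params = new RHEAParams();
params.population_size = 10;
params.individual_length = 12;
params.genetic_operator = RHEAParams.MUTATION_AND_CROSSOVER;
params.heurisic_type = RHEAParams.CUSTOM_HEURISTIC;   // note: spelled 'heurisic_type' in the listing above
params.budget_type = RHEAParams.TIME_BUDGET;
params.time_budget = 40;                              // assumed to be milliseconds per game tick

Player rheaPlayer = new RHEAPlayer(seed, playerID, params);
```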
SimpleEvoAgent(seed, playerID)
This is another implementation of a simple RHEA. Its parameters are:
```java
public boolean flipAtLeastOneValue = true;  // At least one gene is mutated in the individual.
public double mutProb = 0.4;                // Mutation probability
public int sequenceLength = 20;             // Length of sequence
public boolean useShiftBuffer = true;       // If the shift buffer should be used or not
public Double discountFactor = 0.99;        // Discount factor for evaluating sequences with discounted rewards.
public int nEvals = 120;                    // Number of iterations for the algorithm (budget control)
```
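Since these parameters are public fields, they can be adjusted directly after construction; a minimal usage sketch (with illustrative values) is below.

```java
// Sketch: adjust SimpleEvoAgent's public parameters after construction (values illustrative).
long seed = 42;
int playerID = 0;

SimpleEvoAgent evoPlayer = new SimpleEvoAgent(seed, playerID);
evoPlayer.sequenceLength = 15;    // shorter action sequences
evoPlayer.nEvals = 200;           // larger evaluation budget
evoPlayer.useShiftBuffer = true;  // keep the shift buffer enabled
```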