AI Players

All AI players can be found in the package players. If they need to evaluate states, the players may use heuristics to do so. All players extend the players.Player class and may implement their own constructors (keeping a default one receiving a random seed and the player ID is recommended, although others may be added).

The actions available to the agents are defined in utils.Types.ACTIONS:

    ACTION_STOP(0),
    ACTION_UP(1),
    ACTION_DOWN(2),
    ACTION_LEFT(3),
    ACTION_RIGHT(4),
    ACTION_BOMB(5);
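
For reference, a custom player is written by extending players.Player. The sketch below is only illustrative: it assumes the base class exposes an act(GameState) method returning a Types.ACTIONS value and that GameState lives in the core package; check players.Player for the exact abstract methods (e.g. a copy() method may also be required).

package players;

import core.GameState;       // assumed package for GameState
import utils.Types;
import java.util.Random;

public class MyPlayer extends Player {

    private Random random;

    // Default constructor receiving a random seed and the player ID, as recommended above.
    public MyPlayer(long seed, int pId) {
        super(seed, pId);
        random = new Random(seed);
    }

    @Override
    public Types.ACTIONS act(GameState gs) {
        // Pick uniformly at random among the 6 available actions.
        Types.ACTIONS[] actions = Types.ACTIONS.values();
        return actions[random.nextInt(actions.length)];
    }
}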

Current AI Player options are:

Human Player

HumanPlayer(KeyController, playerID)

This player can be used to play the game as a human. It receives keyboard inputs (via the KeyController object passed in) and translates them into in-game actions. Controls are either the primary keys (arrows + space bar for bombs) or the secondary keys (WASD + right shift for bombs). Having two sets of keys allows two human players to play against 2 AI players.
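
A minimal sketch of wiring up the two key sets (the boolean selecting the primary vs. secondary keys is an assumption; check the KeyController constructor in the repository):

// Illustrative only: the KeyController constructor argument and the player IDs are assumed.
KeyController arrowsAndSpace = new KeyController(true);   // primary keys
KeyController wasdAndShift   = new KeyController(false);  // secondary keys

Player human1 = new HumanPlayer(arrowsAndSpace, 0);
Player human2 = new HumanPlayer(wasdAndShift, 1);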

Do Nothing

DoNothingPlayer(playerID)

This player does nothing (always returns action Types.ACTIONS.ACTION_STOP).

Random

RandomPlayer(seed, playerID)

This player simply picks an action uniformly at random out of those available.

Rule-based

SimplePlayer(seed, playerID)

This agent is a rule-based system that mimics the Simple Agent of the original Pommerman implementation. At each state, the agent uses Dijkstra's algorithm, with a maximum depth of 10, to compute distances to the different game objects in the current game state. Then, it executes the following logic (a sketch of this decision ladder follows the list):

  • If there are upcoming bomb explosions, escape.
  • If adjacent to an enemy, lay a bomb.
  • If there is an enemy within 3 steps, move towards it.
  • If there is a power-up within 2 steps, move towards it.
  • If adjacent to a wooden box, lay a bomb.
  • If there is a wooden box tile within 2 steps and a bomb could be laid, move towards it.
  • Move randomly to a not recently visited position.
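
The decision ladder above could be summarised roughly as follows. This is only illustrative pseudocode: every helper method is a hypothetical stand-in for the corresponding check, computed from the Dijkstra distances, and not the actual SimplePlayer API.

// Illustrative pseudocode of the rule ladder; helper methods are hypothetical.
Types.ACTIONS decide(GameState gs) {
    if (inBombRange(gs))                     return escapeAction(gs);           // avoid upcoming explosions
    if (adjacentToEnemy(gs))                 return Types.ACTIONS.ACTION_BOMB;  // lay a bomb next to an enemy
    if (distanceToNearestEnemy(gs) <= 3)     return stepTowardsEnemy(gs);
    if (distanceToNearestPowerUp(gs) <= 2)   return stepTowardsPowerUp(gs);
    if (adjacentToWood(gs))                  return Types.ACTIONS.ACTION_BOMB;  // clear wooden boxes
    if (distanceToNearestWood(gs) <= 2 && canLayBomb(gs)) return stepTowardsWood(gs);
    return randomMoveToUnvisitedTile(gs);    // otherwise, wander to a not recently visited position
}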

One Step Look Ahead

OSLAPlayer(seed, playerID)

This agent uses the Forward Model (FM) to simulate the effects of all 6 actions available from the current state. Each of these future states is evaluated with a heuristic, which provides a numerical score for it. The action that led to the highest score is applied in the game, with ties broken uniformly at random.
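
A rough sketch of this loop is shown below. GameState.next() is the Forward Model call mentioned in the budget settings further down; the copy() call, the StateHeuristic name, and the exact signature of next() (single action vs. joint actions for all players) are assumptions.

// One Step Look Ahead sketch; names marked above as assumptions may differ in the code.
Types.ACTIONS oneStepLookAhead(GameState gs, StateHeuristic heuristic, Random random) {
    Types.ACTIONS bestAction = null;
    double bestValue = Double.NEGATIVE_INFINITY;

    for (Types.ACTIONS action : Types.ACTIONS.values()) {
        GameState copy = gs.copy();              // simulate on a copy, not the real state
        copy.next(action);                       // advance one step with the Forward Model
        // Tiny random noise breaks ties between equally valued actions uniformly at random.
        double value = heuristic.evaluateState(copy) + random.nextDouble() * 1e-6;
        if (value > bestValue) {
            bestValue = value;
            bestAction = action;
        }
    }
    return bestAction;
}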

Monte Carlo Tree Search

MCTSPlayer(seed, playerID)
MCTSPlayer(seed, playerID, MCTSParams)

MCTS is a highly selective best-first tree search method. This iterative algorithm balances exploitation of the best moves found so far against exploration of those that require further investigation. On each iteration, standard MCTS performs 4 steps:

  1. selection uses a 'tree policy' to navigate the tree until reaching a node that does not have all its children expanded;
  2. expansion adds a new node to the tree;
  3. simulation performs moves according to a 'default policy' until a terminal node or a depth D is reached. The game state reached at the end of this rollout is evaluated using a heuristic; and
  4. backpropagation updates the visit counts of all nodes traversed in this iteration, as well as the accumulated reward obtained from them.

At the end of the allowed budget, MCTS returns the action taken most often from the root. Visit counts and average rewards are used to inform the tree policy, which normally takes the form of Upper Confidence Bounds for trees (UCB), sketched below. This policy has a parameter K that balances exploration and exploitation during the selection step. The default policy used in our implementation picks actions uniformly at random for the simulation step.
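
The UCB value used to pick a child during the selection step typically has the following form (variable names are illustrative):

// UCB1 value of a child node. K balances exploitation (first term) and
// exploration (second term); the default K = sqrt(2) is set in MCTSParams below.
double ucbValue(double childTotalReward, int childVisits, int parentVisits, double K) {
    double exploitation = childTotalReward / (childVisits + 1e-6);    // average reward so far
    double exploration  = K * Math.sqrt(Math.log(parentVisits + 1) / (childVisits + 1e-6));
    return exploitation + exploration;
}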

MCTSParams

In our implementation, MCTS has 3 parameters exposed:

public double K = Math.sqrt(2);
public int rollout_depth = 12;
public int heuristic_method = CUSTOM_HEURISTIC;

Its heuristic options are CustomHeuristic or AdvancedHeuristic. Its budget (stop_type variable) can be controlled through either time (num_time if stop_type=STOP_TIME), number of iterations (num_iterations if stop_type=STOP_ITERATIONS), or number of calls to the GameState.next() method (num_fmcalls if stop_type=STOP_FMCALLS).
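
A possible way of creating an MCTS player with a custom budget, using the MCTSPlayer(seed, playerID, MCTSParams) constructor listed above (the exact names and locations of the stop-type and heuristic constants are assumptions; check MCTSParams):

MCTSParams params = new MCTSParams();
params.rollout_depth = 8;
params.heuristic_method = MCTSParams.CUSTOM_HEURISTIC;  // constant location assumed
params.stop_type = MCTSParams.STOP_ITERATIONS;          // constant location assumed
params.num_iterations = 200;

Player mctsPlayer = new MCTSPlayer(seed, playerID, params);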

Rolling Horizon Evolutionary Algorithms

RHEAPlayer(seed, playerID)
RHEAPlayer(seed, playerID, RHEAParams)

RHEA [1] [2] [3] [4] [5] is a family of algorithms that use evolution in real time to recommend an action for the player to take on each turn. In its standard form, RHEA evolves sequences of L actions. The fitness of each sequence is calculated by first using the FM to roll the state L steps ahead, and then evaluating the state reached at the end of the sequence of actions. RHEA uses a population of N individuals, and the regular evolutionary operators (i.e. mutation, crossover and elitism) apply. At the end of the computational budget, RHEA returns the first action of the best sequence (the one with the highest fitness) found during evolution, to be played in the game.

One of the main improvements to RHEA is the shift buffer, sketched below. Once the best individual is selected at one game tick, its first action is executed in the game and removed from the individual. The rest of the action sequence is then shifted, so that the next individual starts with the second action of the best sequence from the previous step. A new action is created, uniformly at random, for the end of the sequence, keeping the individual length at L. This seeding mechanism allows RHEA to retain good solutions from previous game ticks. Each shifted solution is re-evaluated in the following game ticks to account for inaccuracies in the opponent model or the FM.
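
The shift buffer operation itself is small; a sketch, with action sequences represented as plain arrays and names illustrative:

// Shift-buffer sketch: drop the first (already executed) action, shift the rest
// forward, and append a new uniformly random action to keep the length at L.
Types.ACTIONS[] shift(Types.ACTIONS[] bestSequence, Random random) {
    int L = bestSequence.length;
    Types.ACTIONS[] shifted = new Types.ACTIONS[L];
    System.arraycopy(bestSequence, 1, shifted, 0, L - 1);  // actions 1..L-1 move to 0..L-2
    Types.ACTIONS[] all = Types.ACTIONS.values();
    shifted[L - 1] = all[random.nextInt(all.length)];      // new random action at the end
    return shifted;
}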

RHEAParams

In our implementation, RHEA has 28 parameters exposed (with dependencies); a usage sketch follows the option descriptions below:

// EA structure modules settings
public int genetic_operator = MUTATION_AND_CROSSOVER;
public int mutation_type = MUTATION_UNIFORM;
public int selection_type = SELECT_TOURNAMENT;
public int crossover_type = CROSS_ONE_POINT;
public int init_type = INIT_RANDOM;
public boolean elitism = false;
public boolean keep_parents_next_gen = true;
private double mcts_budget_perc = 0.5;
// Efficiency settings
public int frame_skip = 0;
public int frame_skip_type = SKIP_SEQUENCE;
// EA parameters
public int population_size = 1;
public int individual_length = 12;
public int mcts_depth = 12;
public int gene_size = 1;  // Should be a divisor of individual_length. A gene may contain more than 1 action.
public int offspring_count = 1;
public int no_elites = 1;
private double tournament_size_perc = 0.4;  // Percent of population that would take part in tournament selection
public int mutation_gene_count = 1;
public double mutation_rate = 0.3;
// Evaluation settings
public int evaluate_act = EVALUATE_ACT_LAST;
public int evaluate_update = EVALUATE_UPDATE_AVERAGE;
public double evaluate_discount = 0.99;
public int heurisic_type = CUSTOM_HEURISTIC;
public boolean reevaluate_pop = true;
// Shift settings
public boolean shift_buffer = true;
// MC Rollouts settings
public boolean mc_rollouts = false;
private double mc_rollouts_length_perc = 0.5;
public int mc_rollouts_repeat = 1;

  • Its heuristic options are WinScoreHeuristic, PlayerCountHeuristic, CustomHeuristic or AdvancedHeuristic.
  • Its budget (budget_type variable) can be controlled through either time (time_budget if budget_type=TIME_BUDGET), number of iterations (iteration_budget if budget_type=ITERATION_BUDGET), or number of calls to the GameState.next() method (fm_budget if budget_type=FM_BUDGET).
  • genetic_operator options are: mutation + crossover, mutation only, crossover only
  • mutation_type options are: uniform, one-bit, biased mutation (softmax)
  • selection_type options are: rank, tournament, roulette
  • crossover_type options are: uniform, 1-point, 2-point
  • init_type options are: random, OSLA, MCTS
  • frame_skip_type options are: repeat (repeats the action N times), null (play the action, then play null N times), random (play the action, then play random actions N times) and sequence (play the first N actions of the best sequence evolved). This can be used to spread computation over multiple game ticks: the agent only computes a fresh action on certain ticks (controlled by the frame_skip parameter), while returning simple actions from a buffer on the remaining ticks to gain more thinking time.
  • evaluate_act options are: last (only use last state value), delta (use difference between last state value and root state value), average (use average of all values of states visited during rollout evaluation), minimum (use minimum of all values), maximum (use maximum of all values) and discount (use all values, discounted such that earlier rewards are valued more).
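
As with MCTS, parameters can be adjusted before constructing the player via the RHEAPlayer(seed, playerID, RHEAParams) constructor listed above. A sketch (the exact names and locations of the budget and heuristic constants are assumptions; check RHEAParams):

RHEAParams params = new RHEAParams();
params.population_size = 10;
params.individual_length = 12;
params.shift_buffer = true;
params.heurisic_type = RHEAParams.CUSTOM_HEURISTIC;   // field name as spelled in the listing above
params.budget_type = RHEAParams.ITERATION_BUDGET;     // constant location assumed
params.iteration_budget = 200;

Player rheaPlayer = new RHEAPlayer(seed, playerID, params);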

SimpleEvoAgent

SimpleEvoAgent(seed, playerID)

This is another implementation of a simple RHEA. Its parameters are:

public boolean flipAtLeastOneValue = true;  // At least one gene is mutated in the individual.
public double mutProb = 0.4;  // Mutation probability
public int sequenceLength = 20;  // Length of sequence
public boolean useShiftBuffer = true;  // If shift buffer should be used or not
public Double discountFactor = 0.99;  // Discount factor for evaluating sequences with discounted rewards.
public int nEvals = 120;  // Number of iterations for the algorithm (budget control)
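
A short usage sketch, assuming the fields above are public members of the agent and can be set directly after construction:

SimpleEvoAgent evoAgent = new SimpleEvoAgent(seed, playerID);
evoAgent.sequenceLength = 15;      // evolve sequences of 15 actions
evoAgent.mutProb = 0.3;            // mutation probability
evoAgent.nEvals = 200;             // budget: number of iterations
evoAgent.useShiftBuffer = true;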