
Implementation of a Blackjack playing agent

Introduction

Motivation

The goal of this project is to develop an agent that can play Blackjack with a positive net return (i.e. "beat the house") using reinforcement learning. To this end, different policies are tested in two different action spaces. Furthermore, we present a novel dynamic betting algorithm that lets the policy additionally adapt the betting amount in each round. For more information, see our project report.

Methods

We use two different action spaces for our static betting (i.e. betting the same amount each round) policies; a minimal sketch of both action sets follows the list:

  1. Limited action space: Only two of the allowed actions in Blackjack are used: hit and stand
  2. Full action space: All allowed actions of Blackjack are used: hit, stand, double, split and insurance
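For illustration, the two action sets could be encoded as follows. This is only a sketch; the enum names and the integer encoding are assumptions, not the repository's actual interface.

```python
from enum import IntEnum

class LimitedAction(IntEnum):
    STAND = 0
    HIT = 1

class FullAction(IntEnum):
    STAND = 0
    HIT = 1
    DOUBLE = 2     # double the bet, then draw exactly one more card
    SPLIT = 3      # only legal when the first two cards form a pair
    INSURANCE = 4  # only legal when the dealer shows an ace
```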

Limited action space

In this action space, the following (static betting) policies have been tested; a Q-learning sketch follows the list:

  1. Value iteration
  2. Monte Carlo learning
  3. Q-learning
  4. Double Q-learning
  5. SARSA learning
  6. Deep Q-network
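As an illustration of the tabular methods above, here is a minimal Q-learning sketch for the limited action space. The environment interface (reset/step), the (player sum, dealer upcard, usable ace) state representation, and the hyperparameters are assumptions and may differ from the actual implementation.

```python
import random
from collections import defaultdict

# Hypothetical environment interface: env.reset() -> state, env.step(a) -> (state, reward, done).
# States are assumed to be (player_sum, dealer_upcard, usable_ace) tuples; actions are 0 = stand, 1 = hit.
def q_learning(env, episodes=500_000, alpha=0.05, gamma=1.0, epsilon=0.1):
    Q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # off-policy TD target: bootstrap from the greedy value of the next state
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```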

Full action space

In this action space, the following (static betting) policies have been tested; a sketch of the extra bookkeeping this space requires follows the list:

  1. Q-learning
  2. SARSA learning
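The main complication of the full action space is that double, split, and insurance are only legal in certain states. Below is a minimal sketch of such a legality check, together with the on-policy SARSA update used in this space; the helper name, the hand encoding, and the simplified rule details are assumptions.

```python
# Assumes the hypothetical FullAction enum from the sketch above; hands are lists of
# card values 1-10 (1 = ace, 10 also covers face cards). Rule details are simplified.
def legal_actions(hand, dealer_upcard, already_split=False):
    actions = [FullAction.STAND, FullAction.HIT]
    if len(hand) == 2:
        actions.append(FullAction.DOUBLE)
        if hand[0] == hand[1] and not already_split:
            actions.append(FullAction.SPLIT)
        if dealer_upcard == 1:  # dealer shows an ace
            actions.append(FullAction.INSURANCE)
    return actions

def sarsa_update(Q, state, action, reward, next_state, next_action, done,
                 alpha=0.05, gamma=1.0):
    # On-policy target: uses the action actually selected in next_state,
    # unlike Q-learning, which bootstraps from the greedy action.
    target = reward + (0.0 if done else gamma * Q[next_state][next_action])
    Q[state][action] += alpha * (target - Q[state][action])
```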

Exploration policies

During training of the aforementioned algorithms, the following exploration policies have been used to trade off exploration and exploitation; sketches of three of these selection rules follow the list:

  1. Random policy
  2. Greedy policy
  3. Epsilon-greedy policy
  4. Upper confidence bound (UCB) policy
  5. Boltzmann policy
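For illustration, the epsilon-greedy, UCB, and Boltzmann selection rules could look as follows. This is a generic sketch assuming the Q-values of a state are given as a list; the exploration constants and tie-breaking are not taken from the repository.

```python
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def ucb(q_values, counts, total, c=1.0):
    # Prefer actions with high value estimates or low visit counts.
    return max(range(len(q_values)),
               key=lambda a: q_values[a] + c * math.sqrt(math.log(total + 1) / (counts[a] + 1e-8)))

def boltzmann(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / temperature).
    weights = [math.exp(q / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]
```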

Dynamic betting

In this setting, a static betting policy from one of the action spaces above is augmented with our RL dynamic betting policy, which uses card counting (among other information) to adapt the bet size in each round; see our project report for more detail.
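For context, a conventional card-counting bet-sizing heuristic (Hi-Lo true count) is sketched below. This is not the RL betting algorithm from the report; it only illustrates the kind of counting signal a dynamic betting policy can condition on, and the card encoding and bet limits are assumptions.

```python
# Hi-Lo count values per card rank; 1 = ace, 10 also covers face cards.
HI_LO = {2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: -1, 1: -1}

def true_count(seen_cards, decks_remaining):
    running = sum(HI_LO[c] for c in seen_cards)
    return running / max(decks_remaining, 0.5)

def bet_size(seen_cards, decks_remaining, base_bet=1, max_bet=10):
    tc = true_count(seen_cards, decks_remaining)
    # Bet more when the remaining shoe favours the player (high true count).
    return min(max_bet, max(base_bet, base_bet * round(tc)))
```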

Results

For smaller decks (which are advantageous when using card counting), our method achieves a large positive net return, i.e. it "beats the house", and also surpasses conventional strategies used by professional Blackjack players. For larger decks, our method has not yet achieved a positive net return. We believe the cause lies in the (relatively) poor convergence of the static betting policies when trained in the full action space.