
ReadMe
Made a proper ReadMe file to introduce new users to the environment.
Also removed unused files.
Updated jar.
BlueDi committed Jul 22, 2019
1 parent 23def91 commit d205bb0
Showing 9 changed files with 113 additions and 148 deletions.
99 changes: 98 additions & 1 deletion README.md
@@ -1,2 +1,99 @@
# DeepDip
DQN agent that plays Diplomacy in BANDANA
**DeepDip** is an agent designed to play **no-press Diplomacy** using **Deep Reinforcement Learning** to decide its Orders.

## gym-diplomacy
This repository offers a **new OpenAI Gym environment** to play **no-press Diplomacy**.
* It uses the **BANDANA** framework to set up the rules of the game.
* It follows the **OpenAI Gym** interface, making it standard and immediately familiar to any developer (see the sketch below).
* It is compatible with the existing agents from **Baselines**.
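
For a quick feel of the interface, here is a minimal sketch of the usual Gym interaction loop. It assumes the package is importable as `gym_diplomacy` (importing it registers the environment) and uses the `Diplomacy_Strategy-v0` id that the training script also uses; it is an illustration, not part of the setup steps below.
```python
import gym
import gym_diplomacy  # assumed import name; importing it registers the Diplomacy environments

# 'Diplomacy_Strategy-v0' is the environment id used by deepdip_stable-baselines.py
env = gym.make('Diplomacy_Strategy-v0')

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random orders, just to exercise the API
    obs, reward, done, info = env.step(action)   # standard Gym step signature
env.close()
```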

## Variants
This environment provides three different variants.
* **Standard** - the original Diplomacy map for seven players.
* **Three** - a three-player variant to study how the agent behaves in a multiplayer game.
* **Small** - a two-player variant to understand how the environment works and study the performance of the agent.
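
For reference, the sketch below mirrors the constants defined in `diplomacy_strategy_env.py` (see the diff further down), which map each variant to its number of players and regions; a `mini` map also exists in the code.
```python
# Variant sizes, as defined in gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
PLAYERS = {'mini': 2, 'small': 2, 'three': 3, 'standard': 7}       # players per variant
REGIONS = {'mini': 10, 'small': 19, 'three': 37, 'standard': 121}  # regions per variant
```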

## Setup
### Dependencies
Java JDK, Python 2, Python 3, Pipenv.

### Setup Parlance
**Parlance**, **BANDANA**'s game engine, is written in **Python 2**.
The repository provides a **custom version of Parlance** with additional variants.
The following command installs **Parlance** from the version provided in this repository.
```bash
pip2 install -e parlance
```

### Setup BANDANA
**BANDANA** is written in **Java**.
As such, its agents and game engine need to be compiled using **Java JDK**.
It is recommended to compile with **Maven** using the provided `bandana/pom.xml` file.
The following command compiles the **Java** files.
```bash
mvn -f bandana clean install
```

### Setup Python packages
This project uses **Pipenv** to manage its **Python 3** packages.
First, set up the **new Gym environment** by adding **gym-diplomacy** to the **Baselines** setup with the following script.
```bash
echo 'import gym_diplomacy' > temp__init__.py
cat __init__.py >> temp__init__.py
mv temp__init__.py __init__.py
```
Then, to install all the necessary packages, make sure **Pipenv** is set up properly and run the following command to install the packages listed in the `Pipfile`.
```bash
pipenv install
```

### Setup Protobuf
**Protobuf** handles the *communication* between **BANDANA**'s **Java** code and the **Gym** environment's **Python** code.
You can set it up with the following commands.
```bash
cd protobuf
make
cd ..
```

## Usage
### Run the agent
The following script will start the training process of the agent.
```bash
cd deepdip
pipenv run python deepdip_stable-baselines.py
```
It will create a `deepdip-results` folder.
* In `pickles`, it stores the *most recent* and the *best* model.
* The folders named after the algorithm (*PPO2*) store the **TensorBoard** files.
* The `monitor.csv` files record the training progress of the Gym environment.

DeepDip uses the *most recent* model to train, and the *best* model to evaluate itself.
The trained model can be reused by replacing the `deepdip-results` folder with one of the provided ones.

At the end of the `deepdip_stable-baselines.py` script, a graph with the evolution of the rewards is generated.
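
For orientation, the following is a minimal sketch of what the training script does, assuming the Stable Baselines `PPO2` implementation and the registered `Diplomacy_Strategy-v0` id; file names here are illustrative, and `deepdip_stable-baselines.py` remains the authoritative version with its own callback, saving, and plotting logic.
```python
import os

import gym
import gym_diplomacy  # assumed import name; registers Diplomacy_Strategy-v0
from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import DummyVecEnv

log_dir = "./deepdip-results/"
os.makedirs(log_dir, exist_ok=True)

def make_env():
    # Monitor writes the monitor.csv files mentioned above
    return Monitor(gym.make('Diplomacy_Strategy-v0'), log_dir, allow_early_resets=True)

env = DummyVecEnv([make_env])

# tensorboard_log produces the algorithm-named folders with the TensorBoard files
model = PPO2('MlpPolicy', env, verbose=1, tensorboard_log=log_dir)
model.learn(total_timesteps=int(1e6))
model.save(os.path.join(log_dir, "deepdip_model"))  # illustrative file name

# Quick evaluation of the trained model
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, infos = env.step(action)
env.close()
```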

### Choose Variant Map
The code uses the **Three** variant by default.

To change the variant in use, two changes must be made to the code.
1. In the **Java** code, in `TournamentRunner.java`, change the variable `GAME_MAP` to 'standard', 'small', or 'three'.
2. In the **Python** code, in `diplomacy_strategy_env.py`, change the variable `CURRENT_MAP` to 'standard', 'small', or 'three', or to the desired index of the `MAPS` list.

After these changes, rerun **Maven** to apply them.
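
As an illustration, the Python-side change amounts to picking a different entry of the `MAPS` list (the lines below match the constants shown in the `diplomacy_strategy_env.py` diff further down); remember to keep the Java `GAME_MAP` value consistent and to rebuild with Maven.
```python
# gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
MAPS = ['mini', 'small', 'three', 'standard']
CURRENT_MAP = MAPS[1]   # 'small' -- or assign the variant name directly, e.g. 'standard'
```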

## Citation
This work was developed as **my Master's thesis**.

To cite this work in publications:
```
@article{cruz_strategicdiplomacy_2019,
author = {Cruz, Diogo and {Lopes Cardoso}, Henrique},
title = {{{Deep Reinforcement Learning}} in {{Strategic Multi-Agent Games}}: the case of No-Press {{Diplomacy}}},
shorttitle = {{{Deep Reinforcement Learning}} in {{Strategic Multi-Agent Games}}},
month = jul,
year = {2019},
copyright = {openAccess}
}
```

Binary file modified bandana/TournamentRunner.jar
Binary file modified bandana/agents/DeepDip.jar
Binary file modified bandana/agents/DumbBot.jar
30 changes: 14 additions & 16 deletions deepdip/deepdip_stable-baselines.py
@@ -25,10 +25,9 @@
 gym_env_id = 'Diplomacy_Strategy-v0'
 algorithm = 'ppo2'
 total_timesteps = 1e6
-saving_interval = 8 #1 interval = 128 steps
-steps_to_calculate_mean = saving_interval * 128
+saving_interval = 20
 evaluate_timesteps = 1e4
-best_mean_reward, n_steps = 0, 0
+best_mean_reward, n_episodes = 0, 0
 
 current_time_string = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
 log_dir = "./deepdip-results/"
@@ -128,13 +127,13 @@ def callback(_locals, _globals):
     :param _locals: (dict)
     :param _globals: (dict)
     """
-    global best_mean_reward, n_steps, saving_interval
+    global best_mean_reward, n_episodes, saving_interval
 
-    n_steps += 1
-    if n_steps % saving_interval == 0:
-        x, y = ts2xy(load_results(log_dir), 'timesteps')
+    n_episodes += 1
+    if n_episodes % saving_interval == 0:
+        x, y = ts2xy(load_results(log_dir), 'episodes')
         if len(x) > 0:
-            mean_reward = np.mean(y[-steps_to_calculate_mean:])
+            mean_reward = np.mean(y[-int(saving_interval):])
             logger.info("{}: Best mean reward: {:.2f} - Last mean reward per episode: {:.2f}\n".format(x[-1], best_mean_reward, mean_reward))
 
             with open("mean_reward.txt", "a") as text_file:
@@ -149,7 +148,7 @@ def callback(_locals, _globals):
     return True
 
 
-def evaluate(env, num_steps=1e3):
+def evaluate(env, num_steps=1e4):
     """
     Evaluate a RL agent
     :param model: (BaseRLModel object) the RL Agent
@@ -191,14 +190,13 @@ def plot_results(log_folder, title='Learning Curve'):
     :param log_folder: (str) the save location of the results to plot
     :param title: (str) the title of the task to plot
     """
-    x, y = ts2xy(load_results(log_folder), 'timesteps')
+    x, y = ts2xy(load_results(log_folder), 'episodes')
     y = moving_average(y, window=1)
     # Truncate x
     x = x[len(x) - len(y):]
 
     fig = plt.figure(title)
     plt.plot(x, y)
-    plt.xlabel('Number of Timesteps')
+    plt.xlabel('Number of Episodes')
     plt.ylabel('Rewards')
     plt.title(title + " Smoothed")
     plt.show()
@@ -215,7 +213,7 @@ def plot_rewards():
             steps.append(float(step))
             rewards.append(float(reward))
 
-    plt.plot(steps, rewards, label='y=3^x')
+    plt.plot(steps, rewards)
     plt.xlabel('Number of Timesteps')
     plt.ylabel('Rewards')
     plt.title("Learning Curve")
@@ -226,9 +224,9 @@
 
 if __name__ == '__main__':
     env = make_env(gym_env_id)
-    train(env, total_timesteps)
-    evaluate(env, evaluate_timesteps)
-    plot_results(log_dir)
+    #train(env, total_timesteps)
+    #evaluate(env, evaluate_timesteps)
+    #plot_results(log_dir)
     plot_rewards()
     env.close()

34 changes: 0 additions & 34 deletions gym-diplomacy/README.md

This file was deleted.

75 changes: 0 additions & 75 deletions gym-diplomacy/dip_q_brain.py

This file was deleted.

2 changes: 1 addition & 1 deletion gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
@@ -26,7 +26,7 @@
 ### CONSTANTS
 NUMBER_OF_ACTIONS = 3
 MAPS = ['mini', 'small', 'three', 'standard']
-CURRENT_MAP = MAPS[2]
+CURRENT_MAP = MAPS[3]
 PLAYERS = {'mini':2, 'small':2, 'three':3, 'standard':7}
 NUMBER_OF_PLAYERS = PLAYERS[CURRENT_MAP]
 REGIONS = {'mini':10, 'small':19, 'three':37, 'standard':121}
21 changes: 0 additions & 21 deletions gym-diplomacy/tests/spaces.py

This file was deleted.
