
ReadMe
Made a proper ReadMe file to introduce new users to the environment.
Also removed unused files.
Updated jar.
BlueDi committed Jul 22, 2019
1 parent 23def91 commit d205bb0
Showing 9 changed files with 113 additions and 148 deletions.
99 changes: 98 additions & 1 deletion README.md
@@ -1,2 +1,99 @@
# DeepDip
DQN agent that plays Diplomacy in BANDANA
**DeepDip** is an agent designed to play **no-press Diplomacy** using **Deep Reinforcement Learning** to decide its Orders.

## gym-diplomacy
This repository offers a **new OpenAI Gym environment** to play **no-press Diplomacy**.
* It uses the **BANDANA** framework to set up the rules of the game.
* It follows the **OpenAI Gym** interface, making it standard and immediately familiar to any developer (see the sketch below).
* It is compatible with the existing agents from **Baselines**.
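
For a quick feel of the interface, here is a minimal sketch of the usual Gym interaction loop. It assumes the package is importable as `gym_diplomacy` (importing it registers the environment) and uses the `Diplomacy_Strategy-v0` id that the training script also uses; it is an illustration, not part of the setup steps below.
```python
import gym
import gym_diplomacy  # assumed import name; importing it registers the Diplomacy environments

# 'Diplomacy_Strategy-v0' is the environment id used by deepdip_stable-baselines.py
env = gym.make('Diplomacy_Strategy-v0')

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random orders, just to exercise the API
    obs, reward, done, info = env.step(action)   # standard Gym step signature
env.close()
```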

## Variants
This environment provides three different variants.
* **Standard** - the original Diplomacy map for seven players.
* **Three** - a three-player variant to study how the agent behaves in a multiplayer game.
* **Small** - a two-player variant to understand how the environment works and study the performance of the agent.
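
For reference, the sketch below mirrors the constants defined in `diplomacy_strategy_env.py` (see the diff further down), which map each variant to its number of players and regions; a `mini` map also exists in the code.
```python
# Variant sizes, as defined in gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
PLAYERS = {'mini': 2, 'small': 2, 'three': 3, 'standard': 7}       # players per variant
REGIONS = {'mini': 10, 'small': 19, 'three': 37, 'standard': 121}  # regions per variant
```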

## Setup
### Dependencies
Java JDK, Python 2, Python 3, Pipenv.

### Setup Parlance
**Parlance**, **BANDANA**'s game engine, is written in **Python 2**.
The repository provides a **custom version of Parlance** with additional variants.
The following command installs **Parlance** from the version provided in this repository.
```bash
pip2 install -e parlance
```

### Setup BANDANA
**BANDANA** is written in **Java**.
As such, its agents and game engine need to be compiled using **Java JDK**.
It is recommended to compile with **Maven** using the provided `bandana/pom.xml` file.
The following command compiles the **Java** files.
```bash
mvn -f bandana clean install
```

### Setup Python packages
This project uses **Pipenv** to manage its **Python 3** packages.
First, set up the **new Gym environment** by adding **gym-diplomacy** to the **Baselines** setup with the following script.
```bash
echo 'import gym_diplomacy' > temp__init__.py
cat __init__.py >> temp__init__.py
mv temp__init__.py __init__.py
```
Then, to install all the necessary packages, make sure **Pipenv** is set up properly and run the following command to install the packages listed in the `Pipfile`.
```bash
pipenv install
```

### Setup Protobuf
**Protobuf** handles the *communication* between **BANDANA**'s **Java** code and the **Gym** environment's **Python** code.
You can set it up with the following commands.
```bash
cd protobuf
make
cd ..
```

## Usage
### Run the agent
The following script will start the training process of the agent.
```bash
cd deepdip
pipenv run python deepdip_stable-baselines.py
```
It will create a `deepdip-results` folder.
* In `pickles`, it stores the *most recent* and the *best* model.
* The folders named after the algorithm (*PPO2*) store the **TensorBoard** files.
* The `monitor.csv` files record the training progress of the Gym environment.

DeepDip uses the *most recent* model to train, and the *best* model to evaluate itself.
The trained model can be reused by replacing the `deepdip-results` folder with one of the provided ones.

At the end of the `deepdip_stable-baselines.py` script, a graph with the evolution of the rewards is generated.
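
For orientation, the following is a minimal sketch of what the training script does, assuming the Stable Baselines `PPO2` implementation and the registered `Diplomacy_Strategy-v0` id; file names here are illustrative, and `deepdip_stable-baselines.py` remains the authoritative version with its own callback, saving, and plotting logic.
```python
import os

import gym
import gym_diplomacy  # assumed import name; registers Diplomacy_Strategy-v0
from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import DummyVecEnv

log_dir = "./deepdip-results/"
os.makedirs(log_dir, exist_ok=True)

def make_env():
    # Monitor writes the monitor.csv files mentioned above
    return Monitor(gym.make('Diplomacy_Strategy-v0'), log_dir, allow_early_resets=True)

env = DummyVecEnv([make_env])

# tensorboard_log produces the algorithm-named folders with the TensorBoard files
model = PPO2('MlpPolicy', env, verbose=1, tensorboard_log=log_dir)
model.learn(total_timesteps=int(1e6))
model.save(os.path.join(log_dir, "deepdip_model"))  # illustrative file name

# Quick evaluation of the trained model
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, infos = env.step(action)
env.close()
```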

### Choose Variant Map
The code uses the **Three** variant by default.

To change the variant in use, two changes must be made to the code.
1. In the **Java** code, in `TournamentRunner.java`, change the variable `GAME_MAP` to 'standard', 'small', or 'three'.
2. In the **Python** code, in `diplomacy_strategy_env.py`, change the variable `CURRENT_MAP` to 'standard', 'small', or 'three', or to the desired index of the `MAPS` list.

After these changes, rerun **Maven** to apply them.
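
As an illustration, the Python-side change amounts to picking a different entry of the `MAPS` list (the lines below match the constants shown in the `diplomacy_strategy_env.py` diff further down); remember to keep the Java `GAME_MAP` value consistent and to rebuild with Maven.
```python
# gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
MAPS = ['mini', 'small', 'three', 'standard']
CURRENT_MAP = MAPS[1]   # 'small' -- or assign the variant name directly, e.g. 'standard'
```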

## Citation
This work was developed as **my Master's thesis**.

To cite this work in publications:
```
@article{cruz_strategicdiplomacy_2019,
author = {Cruz, Diogo and {Lopes Cardoso}, Henrique},
title = {{{Deep Reinforcement Learning}} in {{Strategic Multi-Agent Games}}: the case of No-Press {{Diplomacy}}},
shorttitle = {{{Deep Reinforcement Learning}} in {{Strategic Multi-Agent Games}}},
month = jul,
year = {2019},
copyright = {openAccess}
}
```

Binary file modified bandana/TournamentRunner.jar
Binary file modified bandana/agents/DeepDip.jar
Binary file modified bandana/agents/DumbBot.jar
30 changes: 14 additions & 16 deletions deepdip/deepdip_stable-baselines.py
@@ -25,10 +25,9 @@
 gym_env_id = 'Diplomacy_Strategy-v0'
 algorithm = 'ppo2'
 total_timesteps = 1e6
-saving_interval = 8 #1 interval = 128 steps
-steps_to_calculate_mean = saving_interval * 128
+saving_interval = 20
 evaluate_timesteps = 1e4
-best_mean_reward, n_steps = 0, 0
+best_mean_reward, n_episodes = 0, 0
 
 current_time_string = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
 log_dir = "./deepdip-results/"
@@ -128,13 +127,13 @@ def callback(_locals, _globals):
     :param _locals: (dict)
     :param _globals: (dict)
     """
-    global best_mean_reward, n_steps, saving_interval
+    global best_mean_reward, n_episodes, saving_interval
 
-    n_steps += 1
-    if n_steps % saving_interval == 0:
-        x, y = ts2xy(load_results(log_dir), 'timesteps')
+    n_episodes += 1
+    if n_episodes % saving_interval == 0:
+        x, y = ts2xy(load_results(log_dir), 'episodes')
         if len(x) > 0:
-            mean_reward = np.mean(y[-steps_to_calculate_mean:])
+            mean_reward = np.mean(y[-int(saving_interval):])
             logger.info("{}: Best mean reward: {:.2f} - Last mean reward per episode: {:.2f}\n".format(x[-1], best_mean_reward, mean_reward))
 
             with open("mean_reward.txt", "a") as text_file:
@@ -149,7 +148,7 @@ def callback(_locals, _globals):
     return True
 
 
-def evaluate(env, num_steps=1e3):
+def evaluate(env, num_steps=1e4):
     """
     Evaluate a RL agent
     :param model: (BaseRLModel object) the RL Agent
@@ -191,14 +190,13 @@ def plot_results(log_folder, title='Learning Curve'):
     :param log_folder: (str) the save location of the results to plot
     :param title: (str) the title of the task to plot
     """
-    x, y = ts2xy(load_results(log_folder), 'timesteps')
+    x, y = ts2xy(load_results(log_folder), 'episodes')
     y = moving_average(y, window=1)
     # Truncate x
     x = x[len(x) - len(y):]
 
     fig = plt.figure(title)
     plt.plot(x, y)
-    plt.xlabel('Number of Timesteps')
+    plt.xlabel('Number of Episodes')
     plt.ylabel('Rewards')
     plt.title(title + " Smoothed")
     plt.show()
@@ -215,7 +213,7 @@ def plot_rewards():
             steps.append(float(step))
             rewards.append(float(reward))
 
-    plt.plot(steps, rewards, label='y=3^x')
+    plt.plot(steps, rewards)
     plt.xlabel('Number of Timesteps')
     plt.ylabel('Rewards')
     plt.title("Learning Curve")
@@ -226,9 +224,9 @@
 
 if __name__ == '__main__':
     env = make_env(gym_env_id)
-    train(env, total_timesteps)
-    evaluate(env, evaluate_timesteps)
-    plot_results(log_dir)
+    #train(env, total_timesteps)
+    #evaluate(env, evaluate_timesteps)
+    #plot_results(log_dir)
     plot_rewards()
     env.close()

34 changes: 0 additions & 34 deletions gym-diplomacy/README.md

This file was deleted.

75 changes: 0 additions & 75 deletions gym-diplomacy/dip_q_brain.py

This file was deleted.

2 changes: 1 addition & 1 deletion gym-diplomacy/gym_diplomacy/envs/diplomacy_strategy_env.py
@@ -26,7 +26,7 @@
 ### CONSTANTS
 NUMBER_OF_ACTIONS = 3
 MAPS = ['mini', 'small', 'three', 'standard']
-CURRENT_MAP = MAPS[2]
+CURRENT_MAP = MAPS[3]
 PLAYERS = {'mini':2, 'small':2, 'three':3, 'standard':7}
 NUMBER_OF_PLAYERS = PLAYERS[CURRENT_MAP]
 REGIONS = {'mini':10, 'small':19, 'three':37, 'standard':121}
21 changes: 0 additions & 21 deletions gym-diplomacy/tests/spaces.py

This file was deleted.
