feat: pacman environment (#186)

Co-authored-by: Clément Bonnet <[email protected]>
Co-authored-by: Sasha <[email protected]>
instadeepai · Jan 29, 2024 · 0ba80dc · 0ba80dc
1 parent 8168c5c
commit 0ba80dc
Showing 26 changed files with 2,359 additions and 38 deletions.
diff --git a/README.md b/README.md
@@ -45,6 +45,9 @@
     <img src="docs/env_anim/tetris.gif" alt="Tetris" width="16%">
     <img src="docs/env_anim/tsp.gif" alt="Tetris" width="16%">
   </div>
+    <div class="row" align="center">
+    <img src="docs/env_anim/pac_man.gif" alt="PacMan" width="16%">
+  </div>
 </div>
 
 
@@ -108,6 +111,7 @@ problems.
 | 🐍 Snake                                       | Routing  | `Snake-v1`                                           | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/snake/)     | [doc](https://instadeepai.github.io/jumanji/environments/snake/)       |
 | 📬 TSP (Travelling Salesman Problem)           | Routing  | `TSP-v1`                                             | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/tsp/)       | [doc](https://instadeepai.github.io/jumanji/environments/tsp/)         |
 | Multi Minimum Spanning Tree Problem | Routing  | `MMST-v0`                                | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/mmst)    | [doc](https://instadeepai.github.io/jumanji/environments/mmst/)    |
+| ᗧ•••ᗣ•• PacMan   | Routing  | `PacMan-v0`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/pac_man/)      | [doc](https://instadeepai.github.io/jumanji/environments/pac_man/) |
 
 <h2 name="install" id="install">Installation 🎬</h2>
 

diff --git a/docs/api/environments/pac_man.md b/docs/api/environments/pac_man.md
@@ -0,0 +1,9 @@
+::: jumanji.environments.routing.pac_man.env.PacMan
+    selection:
+      members:
+        - __init__
+        - observation_spec
+        - action_spec
+        - reset
+        - step
+        - render
diff --git a/docs/env_anim/pac_man.gif b/docs/env_anim/pac_man.gif
diff --git a/docs/env_img/pac_man.png b/docs/env_img/pac_man.png
diff --git a/docs/environments/pac_man.md b/docs/environments/pac_man.md
@@ -0,0 +1,65 @@
+# PacMan Environment
+
+<p align="center">
+        <img src="../env_anim/pac_man.gif" width="600"/>
+</p>
+
+We provide here a minimal JAX JIT-able implementation of the game [PAC-MAN](https://pacman.com/en/history/). The game is played in a 2D matrix where each cell is a free space (black), a wall (dark blue), PacMan (yellow) or a ghost.
+
+
+The goal is for the agent (yellow) to collect all of the pellets (small pink blocks) on the map without touching any of the ghosts. The agent receives a reward of +10 when collecting a pellet for the first time and pellets are removed from the map after being collected.
+
+The power-ups (large pink blocks) trigger a 'scatter mode', which turns the ghosts dark blue for 30 in-game steps. While the ghosts are in this state, the player can touch them, which sends them back to the center of the map. This gives a reward of +200 for each unique ghost.
+
+The agent selects an action at each timestep (up, left, right, down, no-op), which determines the direction it will travel for that step. If the selected direction is invalid (for example, blocked by a wall), the action is still accepted as input but the player remains stationary. If the no-op action is used, the player does not stop but instead repeats the last action that was selected.
+
+The game takes place on a fixed map and the same map is generated on each reset. The generator can be used to generate new maps based on an ASCII representation of the desired map. This ASCII generator is deterministic and will always initialise to the same state as long as the same ASCII diagram is in use.
+
+## Observation
+As an observation, the agent has access to the current maze configuration in the array named
+`grid`. It also has access to its current position `player_locations`, the ghosts' locations
+`ghost_locations`, the power-pellet locations `power_up_locations`, the time left for the scatter state `frightened_state_time`, the pellet locations `pellet_locations` and the action
+mask `action_mask`.
+
+- `player_locations`: Position(row, col) (int32) each of shape `()`, agent position in the maze.
+
+- `ghost_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each ghost
+
+- `power_up_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each power-pellet
+
+- `pellet_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each pellet
+
+- `frightened_state_time`: jax array (int32) of shape `()`, number of steps left of the scatter state.
+
+- `action_mask`: jax array (bool) of shape `(5,)`, binary values denoting whether each action is
+possible.
+
+- `score`: (int32), the total points accumulated since the last reset.
+
+An example 5x5 observation `grid` array is shown below. 1 represents a wall, and 0 represents free
+space.
+
+```
+[0, 1, 0, 0, 0],
+[0, 1, 0, 1, 1],
+[0, 1, 0, 0, 0],
+[0, 0, 0, 1, 1],
+[0, 0, 0, 0, 0]
+```
+
+
+## Action
+The action space is a `DiscreteArray` of integer values in the range of [0, 4], i.e. the agent can
+take one of five actions: up (`0`), right (`1`), down (`2`), left (`3`) or no-op (`4`). If an invalid action is
+taken, or an action is blocked by a wall, the agent's position remains
+unchanged. If the no-op action is selected, the agent repeats the last directional action used.
+
+
+## Reward
+PacMan is a dense reward setting, where the agent receives a reward of +10 for each pellet collected. The agent also receives a reward of +20 for collecting a power pellet. The game ends when the agent has collected all 316 pellets on the map or touches a ghost.
+
+Eating a ghost while scatter mode is enabled also awards +200 points, but points are only awarded the first time each unique ghost is eaten.
+
+
+## Registered Versions 📖
+- `PacMan-v0`, PacMan in a 31x28 map with simple grid observations.
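The reward rules described in the new `pac_man.md` can be sketched in plain Python as below. This is an illustrative sketch, not the environment's JAX implementation: the function name `step_reward`, its arguments, and the use of Python sets are all assumptions made for the example, and it does not model how a power-up starts scatter mode.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Reward values taken from the documentation text above.
PELLET_REWARD = 10
POWER_UP_REWARD = 20
GHOST_REWARD = 200


def step_reward(
    player: Cell,
    pellets: Set[Cell],
    power_ups: Set[Cell],
    ghosts: List[Cell],
    eaten_ghosts: Set[int],
    frightened_steps_left: int,
) -> Tuple[int, bool]:
    """Return (reward, done) for the player's new cell (hypothetical helper).

    The sets `pellets`, `power_ups` and `eaten_ghosts` are updated in place.
    """
    reward = 0
    done = False

    # Pellets and power-ups are removed once collected.
    if player in pellets:
        pellets.remove(player)
        reward += PELLET_REWARD
    if player in power_ups:
        power_ups.remove(player)
        reward += POWER_UP_REWARD

    for index, ghost in enumerate(ghosts):
        if ghost != player:
            continue
        if frightened_steps_left > 0:
            # Each unique ghost only pays out the first time it is eaten.
            if index not in eaten_ghosts:
                eaten_ghosts.add(index)
                reward += GHOST_REWARD
        else:
            # Touching a ghost outside scatter mode ends the episode.
            done = True

    # Collecting every pellet also ends the episode.
    if not pellets:
        done = True
    return reward, done
```

For instance, calling `step_reward` with the player standing on a pellet and no ghost nearby yields a reward of 10 and leaves the episode running as long as other pellets remain.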
diff --git a/examples/training.ipynb b/examples/training.ipynb
@@ -2,26 +2,24 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "<a target=\"_blank\" href=\"https://colab.research.google.com/github/instadeepai/jumanji/blob/main/examples/training.ipynb\">\n",
     "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
     "</a>"
-   ],
-   "metadata": {
-    "collapsed": false
-   }
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 8,
    "metadata": {
-    "collapsed": true,
-    "jupyter": {
-     "outputs_hidden": true
-    },
     "ExecuteTime": {
      "end_time": "2023-06-14T10:11:33.230999708Z",
      "start_time": "2023-06-14T10:11:13.526881698Z"
+    },
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
     }
    },
    "outputs": [],
@@ -33,6 +31,12 @@
   {
    "cell_type": "code",
    "execution_count": 9,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-06-14T10:11:33.245117659Z",
+     "start_time": "2023-06-14T10:11:33.237735383Z"
+    }
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -61,26 +65,18 @@
     "        print(\"A TPU is connected.\")\n",
     "    else:\n",
     "        print(\"Only CPU accelerator is connected.\")"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-06-14T10:11:33.245117659Z",
-     "start_time": "2023-06-14T10:11:33.237735383Z"
-    }
-   }
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 10,
    "metadata": {
-    "collapsed": false,
-    "jupyter": {
-     "outputs_hidden": false
-    },
     "ExecuteTime": {
      "end_time": "2023-06-14T10:11:33.268137075Z",
      "start_time": "2023-06-14T10:11:33.246267189Z"
+    },
+    "jupyter": {
+     "outputs_hidden": false
     }
    },
    "outputs": [],
@@ -96,13 +92,12 @@
    "cell_type": "code",
    "execution_count": 11,
    "metadata": {
-    "collapsed": false,
-    "jupyter": {
-     "outputs_hidden": false
-    },
     "ExecuteTime": {
      "end_time": "2023-06-14T10:11:33.279561988Z",
      "start_time": "2023-06-14T10:11:33.268947238Z"
+    },
+    "jupyter": {
+     "outputs_hidden": false
     }
    },
    "outputs": [],
@@ -114,6 +109,12 @@
   {
    "cell_type": "code",
    "execution_count": 12,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-06-14T10:11:33.662474073Z",
+     "start_time": "2023-06-14T10:11:33.281569701Z"
+    }
+   },
    "outputs": [],
    "source": [
     "#@title Download Jumanji Configs (run me) { display-mode: \"form\" }\n",
@@ -139,26 +140,18 @@
     "env_url = f\"https://raw.githubusercontent.com/instadeepai/jumanji/main/jumanji/training/configs/env/{env}.yaml\"\n",
     "os.makedirs(\"configs/env\", exist_ok=True)\n",
     "download_file(env_url, f\"configs/env/{env}.yaml\")"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-06-14T10:11:33.662474073Z",
-     "start_time": "2023-06-14T10:11:33.281569701Z"
-    }
-   }
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 13,
    "metadata": {
-    "collapsed": false,
-    "jupyter": {
-     "outputs_hidden": false
-    },
     "ExecuteTime": {
      "end_time": "2023-06-14T10:12:46.061682766Z",
      "start_time": "2023-06-14T10:11:33.664132133Z"
+    },
+    "jupyter": {
+     "outputs_hidden": false
     }
    },
    "outputs": [
@@ -436,7 +429,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.9"
+   "version": "3.8.10"
   }
  },
  "nbformat": 4,

diff --git a/jumanji/__init__.py b/jumanji/__init__.py
@@ -127,3 +127,6 @@
 
 # TSP with 20 randomly generated cities and a dense reward function.
 register(id="TSP-v1", entry_point="jumanji.environments:TSP")
+
+# Pacman - minimal version of Atari Pacman game
+register(id="PacMan-v0", entry_point="jumanji.environments:PacMan")
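The `entry_point` string registered above follows the common `"module:attribute"` convention. As a rough illustration of how such a string can be resolved to a class, here is a minimal standard-library sketch. The helper `load_entry_point` is hypothetical and is not Jumanji's registry code; the example resolves a standard-library class so that it runs without Jumanji installed.

```python
import importlib
from typing import Any


def load_entry_point(entry_point: str) -> Any:
    """Resolve a "module:attribute" string to the object it names (hypothetical helper)."""
    module_name, _, attr_name = entry_point.partition(":")
    if not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {entry_point!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in for an entry point such as "jumanji.environments:PacMan".
ordered_dict_cls = load_entry_point("collections:OrderedDict")
```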
diff --git a/jumanji/environments/__init__.py b/jumanji/environments/__init__.py
@@ -32,6 +32,7 @@
     maze,
     mmst,
     multi_cvrp,
+    pac_man,
     robot_warehouse,
     snake,
     tsp,
@@ -42,6 +43,7 @@
 from jumanji.environments.routing.maze.env import Maze
 from jumanji.environments.routing.mmst.env import MMST
 from jumanji.environments.routing.multi_cvrp import MultiCVRP
+from jumanji.environments.routing.pac_man.env import PacMan
 from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
 from jumanji.environments.routing.snake.env import Snake
 from jumanji.environments.routing.tsp.env import TSP

diff --git a/jumanji/environments/routing/pac_man/__init__.py b/jumanji/environments/routing/pac_man/__init__.py
@@ -0,0 +1,16 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from jumanji.environments.routing.pac_man.env import PacMan
+from jumanji.environments.routing.pac_man.types import Observation, State
diff --git a/jumanji/environments/routing/pac_man/constants.py b/jumanji/environments/routing/pac_man/constants.py
@@ -0,0 +1,55 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import jax.numpy as jnp
+
+MOVES = jnp.array(
+    [[0, -1], [-1, 0], [0, 1], [1, 0], [0, 0]]
+)  # Up, Right, Down, Left, No-op
+
+
+# Default Maze design
+DEFAULT_MAZE = [
+    "XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
+    "X  S         XX         S  X",
+    "X XXXX XXXXX XX XXXXX XXXX X",
+    "X XXXXOXXXXX XX XXXXXOXXXX X",
+    "X XXXX XXXXX XX XXXXX XXXX X",
+    "X                          X",
+    "X XXXX XX XXXXXXXX XX XXXX X",
+    "X XXXX XX XXXXXXXX XX XXXX X",
+    "X      XX   TXXT   XX      X",
+    "XXXXXX XXXXX XX XXXXX XXXXXX",
+    "XXXXXX XXXXX XX XXXXX XXXXXX",
+    "XXXXXX XXT        TXX XXXXXX",
+    "XXXXXX XX XXX XXXX XX XXXXXX",
+    "XXXXXX XX X  G   X XX XXXXXX",
+    "           GXXXXG           ",
+    "XXXXXX XX X  G   X XX XXXXXX",
+    "XXXXXX XX XXX XXXX XX XXXXXX",
+    "XXXXXX XX          XX XXXXXX",
+    "XXXXXX XX XXXXXXXX XX XXXXXX",
+    "XXXXXX XX XXXXXXXX XX XXXXXX",
+    "X            XX            X",
+    "X XXXX XXXXX XX XXXXX XXXX X",
+    "X XXXX XXXXX XX XXXXX XXXX X",
+    "X   XX S     P     S  XX   X",
+    "XXX XX XX XXXXXXXX XX XX XXX",
+    "XXX XX XX XXXXXXXX XX XX XXX",
+    "X      XX    XX    XX      X",
+    "X XXXXXXXXXX XX XXXXXXXXXX X",
+    "X XXXXXXXXXX XX XXXXXXXXXX X",
+    "X       O             O    X",
+    "XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
+]
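To illustrate how an ASCII maze like `DEFAULT_MAZE` can be turned into the 0/1 wall `grid` described in the docs, here is a small pure-Python sketch. It assumes only that `X` marks a wall and a space marks free floor; the meaning of the other letters (`P`, `G`, `O`, `S`, `T`) is not stated in this diff, so they are simply collected by character. `TOY_MAZE` and `parse_maze` are hypothetical names, and the real generator works on JAX arrays.

```python
from typing import Dict, List, Tuple

# Hypothetical toy maze using the same characters as DEFAULT_MAZE.
TOY_MAZE = [
    "XXXXXXX",
    "XP   OX",
    "X XXX X",
    "XG    X",
    "XXXXXXX",
]


def parse_maze(
    rows: List[str],
) -> Tuple[List[List[int]], Dict[str, List[Tuple[int, int]]]]:
    """Return (grid, markers): grid is 1 for walls and 0 otherwise;
    markers maps each non-wall, non-space character to its (row, col) cells."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all maze rows must have the same width")

    grid = [[1 if ch == "X" else 0 for ch in row] for row in rows]

    markers: Dict[str, List[Tuple[int, int]]] = {}
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch not in ("X", " "):
                markers.setdefault(ch, []).append((r, c))
    return grid, markers


grid, markers = parse_maze(TOY_MAZE)
```

Running the same function on `DEFAULT_MAZE` should give a grid of 31 rows, in line with the 31x28 map mentioned for `PacMan-v0`.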