feat: pacman environment (#186)
Co-authored-by: Clément Bonnet <[email protected]>
Co-authored-by: Sasha <[email protected]>
3 people authored Jan 29, 2024
1 parent 8168c5c commit 0ba80dc
Showing 26 changed files with 2,359 additions and 38 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -45,6 +45,9 @@
<img src="docs/env_anim/tetris.gif" alt="Tetris" width="16%">
<img src="docs/env_anim/tsp.gif" alt="TSP" width="16%">
</div>
<div class="row" align="center">
<img src="docs/env_anim/pac_man.gif" alt="PacMan" width="16%">
</div>
</div>


@@ -108,6 +111,7 @@ problems.
| 🐍 Snake | Routing | `Snake-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/snake/) | [doc](https://instadeepai.github.io/jumanji/environments/snake/) |
| 📬 TSP (Travelling Salesman Problem) | Routing | `TSP-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/tsp/) | [doc](https://instadeepai.github.io/jumanji/environments/tsp/) |
| Multi Minimum Spanning Tree Problem | Routing | `MMST-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/mmst) | [doc](https://instadeepai.github.io/jumanji/environments/mmst/) |
| ᗧ•••ᗣ•• PacMan | Routing | `PacMan-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/pac_man/) | [doc](https://instadeepai.github.io/jumanji/environments/pac_man/) |

<h2 name="install" id="install">Installation 🎬</h2>

9 changes: 9 additions & 0 deletions docs/api/environments/pac_man.md
@@ -0,0 +1,9 @@
::: jumanji.environments.routing.pac_man.env.PacMan
    selection:
        members:
            - __init__
            - observation_spec
            - action_spec
            - reset
            - step
            - render
Binary file added docs/env_anim/pac_man.gif
Binary file added docs/env_img/pac_man.png
65 changes: 65 additions & 0 deletions docs/environments/pac_man.md
@@ -0,0 +1,65 @@
# PacMan Environment

<p align="center">
<img src="../env_anim/pac_man.gif" width="600"/>
</p>

We provide here a minimal JAX JIT-able implementation of the game [PAC-MAN](https://pacman.com/en/history/). The game is played on a 2D grid in which each cell is a free space (black), a wall (dark blue), PacMan (yellow) or a ghost.


The goal is for the agent (yellow) to collect all of the pellets (small pink blocks) on the map without touching any of the ghosts. The agent receives a reward of +10 when collecting a pellet for the first time and pellets are removed from the map after being collected.

The power-ups (large pink blocks) trigger a 'scatter mode', which changes the colour of the ghosts to dark blue for 30 in-game steps. While the ghosts are in this state, the player can touch them, which causes them to return to the center of the map. This gives a reward of +200 for each unique ghost.

The agent selects an action at each timestep (up, right, down, left, no-op), which determines the direction it will travel for that step. If the selected direction is invalid (for example, blocked by a wall), the action is still taken as input and the player remains stationary. If the no-op action is used, the player does not stop but instead repeats the last action that was selected.

The game takes place on a fixed map, and the same map is generated on each reset. The generator can be used to create new maps from an ASCII representation of the desired map. This ASCII generator is deterministic and will always initialise to the same state as long as the same ASCII diagram is in use.
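
As a rough illustration of how an ASCII diagram could be turned into a grid, the sketch below parses a small hand-written map in plain Python. It is not the generator shipped with this environment: the helper name `parse_ascii_maze` and the tiny example map are hypothetical, and the symbol meanings (`X` for a wall, `P` for the player, `G` for a ghost, `O` for a power-up) are assumptions read off the default maze.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def parse_ascii_maze(rows: List[str]) -> Tuple[List[List[int]], Dict[str, List[Position]]]:
    """Hypothetical helper: build a wall grid and record where each marker sits."""
    width = max(len(row) for row in rows)
    grid: List[List[int]] = []
    markers: Dict[str, List[Position]] = {}
    for r, row in enumerate(rows):
        padded = row.ljust(width)  # pad short rows so the grid is rectangular
        # Assumption: 'X' is a wall (1), everything else is walkable (0).
        grid.append([1 if ch == "X" else 0 for ch in padded])
        for c, ch in enumerate(padded):
            if ch not in ("X", " "):
                markers.setdefault(ch, []).append((r, c))
    return grid, markers


# A made-up 5x5 map, much smaller than the default maze.
tiny_maze = [
    "XXXXX",
    "XP OX",
    "X X X",
    "XG  X",
    "XXXXX",
]
grid, markers = parse_ascii_maze(tiny_maze)
```

Because the parser has no random state, calling it twice on the same diagram yields the same grid and marker positions, which is the property the paragraph above describes.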

## Observation
As an observation, the agent has access to the current maze configuration in the array named
`grid`. It also has access to its current position `player_locations`, the ghosts' locations
`ghost_locations`, the power-pellet locations `power_up_locations`, the time left in the scatter
state `frightened_state_time`, the pellet locations `pellet_locations` and the action
mask `action_mask`.

- `player_locations`: Position(row, col) (int32) each of shape `()`, agent position in the maze.

- `ghost_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each ghost

- `power_up_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each power-pellet

- `pellet_locations`: jax array (int32) of shape `(4,2)`, with the (y,x) coordinates of each pellet

- `frightened_state_time`: jax array (int32) of shape `()`, number of steps left of the scatter state.

- `action_mask`: jax array (bool) of shape `(5,)`, binary values denoting whether each action is
possible.

- `score`: (int32) tracking the total points accumulated since the last reset.

An example 5x5 observation `grid` array is shown below. 1 represents a wall, and 0 represents free
space.

```
[0, 1, 0, 0, 0],
[0, 1, 0, 1, 1],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0]
```
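
To make the relationship between `grid` and `action_mask` more concrete, here is a minimal sketch that derives a five-element mask for the example grid above. It rests on several assumptions that are not confirmed by the environment code: the action order follows the Action section (up, right, down, left, no-op), the `(row, col)` offsets are chosen for illustration, cells outside the grid count as blocked, and the no-op is always allowed. The name `action_mask_for` is hypothetical.

```python
from typing import List, Tuple

GRID = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

# Assumed (row, col) offsets for up, right, down, left, no-op.
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1), (0, 0)]


def action_mask_for(grid: List[List[int]], position: Tuple[int, int]) -> List[bool]:
    """Hypothetical helper: True where the target cell is inside the grid and free."""
    n_rows, n_cols = len(grid), len(grid[0])
    mask = []
    for d_row, d_col in OFFSETS:
        r, c = position[0] + d_row, position[1] + d_col
        in_bounds = 0 <= r < n_rows and 0 <= c < n_cols
        mask.append(in_bounds and grid[r][c] == 0)
    return mask


mask = action_mask_for(GRID, (3, 1))  # up is a wall, the other moves are free
corner_mask = action_mask_for(GRID, (0, 0))  # only down and no-op remain
```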


## Action
The action space is a `DiscreteArray` of integer values in the range of [0, 4]. I.e. the agent can
take one of five actions: up (`0`), right (`1`), down (`2`), left (`3`) or no-op (`4`). If an invalid action is
taken, or an action is blocked by a wall, the agent's position remains unchanged. Additionally, if
a no-op is selected, the agent repeats the last directional action that was used.
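
The two rules above (blocked moves leave the agent in place, and a no-op repeats the previous action) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the environment's step function: the `(row, col)` offsets, the treatment of out-of-bounds cells as blocked, and the helper name `move` are all hypothetical.

```python
from typing import List, Tuple

NO_OP = 4
# Assumed (row, col) offsets for up, right, down, left, no-op.
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1), (0, 0)]


def move(
    grid: List[List[int]],
    position: Tuple[int, int],
    action: int,
    last_action: int,
) -> Tuple[Tuple[int, int], int]:
    """Hypothetical helper: return the new position and the action to remember."""
    # A no-op repeats the last selected action instead of stopping.
    effective = last_action if action == NO_OP else action
    d_row, d_col = OFFSETS[effective]
    r, c = position[0] + d_row, position[1] + d_col
    out_of_bounds = not (0 <= r < len(grid) and 0 <= c < len(grid[0]))
    blocked = out_of_bounds or grid[r][c] == 1
    # A blocked move is still recorded as the last action, but the agent stays put.
    return (position if blocked else (r, c)), effective


CORRIDOR = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
pos, last = move(CORRIDOR, (1, 1), 1, NO_OP)  # move right
pos, last = move(CORRIDOR, pos, NO_OP, last)  # no-op keeps moving right
pos, last = move(CORRIDOR, pos, NO_OP, last)  # wall ahead, position unchanged
```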


## Reward
PacMan is a dense reward setting, where the agent receives a reward of +10 for each pellet collected. The agent also receives a reward of +20 for collecting a power pellet. The game ends when the agent has collected all 316 pellets on the map or touches a ghost that is not in scatter mode.

Eating a ghost while scatter mode is active awards +200 points, but points are only awarded the first time each unique ghost is eaten.
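
The reward rules can be summarised with a short sketch. The values 10, 20 and 200 come from the text above; everything else is an assumption made for illustration, including the helper name `step_reward`, the use of integer ghost ids, and leaving it to the caller to decide when the set of already-rewarded ghosts is reset.

```python
from typing import Set, Tuple

PELLET_REWARD = 10
POWER_UP_REWARD = 20
GHOST_REWARD = 200


def step_reward(
    ate_pellet: bool,
    ate_power_up: bool,
    ghosts_eaten: Set[int],
    ghosts_rewarded: Set[int],
) -> Tuple[int, Set[int]]:
    """Hypothetical helper: reward for one step plus the updated set of rewarded ghosts."""
    reward = 0
    if ate_pellet:
        reward += PELLET_REWARD
    if ate_power_up:
        reward += POWER_UP_REWARD
    # Only ghosts that have not been rewarded before contribute points.
    new_ghosts = ghosts_eaten - ghosts_rewarded
    reward += GHOST_REWARD * len(new_ghosts)
    return reward, ghosts_rewarded | new_ghosts


reward, rewarded = step_reward(True, False, {0}, set())    # pellet plus a first-time ghost
repeat_reward, rewarded = step_reward(False, False, {0}, rewarded)  # same ghost again
```

Under these rules, collecting every pellet on the default map contributes 316 x 10 = 3160 points before any power pellets or ghosts are counted.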


## Registered Versions 📖
- `PacMan-v0`, PacMan in a 31x28 map with simple grid observations.
67 changes: 30 additions & 37 deletions examples/training.ipynb
@@ -2,26 +2,24 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/instadeepai/jumanji/blob/main/examples/training.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.230999708Z",
"start_time": "2023-06-14T10:11:13.526881698Z"
},
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
@@ -33,6 +31,12 @@
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.245117659Z",
"start_time": "2023-06-14T10:11:33.237735383Z"
}
},
"outputs": [
{
"name": "stdout",
@@ -61,26 +65,18 @@
" print(\"A TPU is connected.\")\n",
" else:\n",
" print(\"Only CPU accelerator is connected.\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.245117659Z",
"start_time": "2023-06-14T10:11:33.237735383Z"
}
}
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.268137075Z",
"start_time": "2023-06-14T10:11:33.246267189Z"
},
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
@@ -96,13 +92,12 @@
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.279561988Z",
"start_time": "2023-06-14T10:11:33.268947238Z"
},
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
@@ -114,6 +109,12 @@
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.662474073Z",
"start_time": "2023-06-14T10:11:33.281569701Z"
}
},
"outputs": [],
"source": [
"#@title Download Jumanji Configs (run me) { display-mode: \"form\" }\n",
@@ -139,26 +140,18 @@
"env_url = f\"https://raw.githubusercontent.com/instadeepai/jumanji/main/jumanji/training/configs/env/{env}.yaml\"\n",
"os.makedirs(\"configs/env\", exist_ok=True)\n",
"download_file(env_url, f\"configs/env/{env}.yaml\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-06-14T10:11:33.662474073Z",
"start_time": "2023-06-14T10:11:33.281569701Z"
}
}
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"ExecuteTime": {
"end_time": "2023-06-14T10:12:46.061682766Z",
"start_time": "2023-06-14T10:11:33.664132133Z"
},
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
@@ -436,7 +429,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.8.10"
}
},
"nbformat": 4,
3 changes: 3 additions & 0 deletions jumanji/__init__.py
@@ -127,3 +127,6 @@

# TSP with 20 randomly generated cities and a dense reward function.
register(id="TSP-v1", entry_point="jumanji.environments:TSP")

# Pacman - minimal version of Atari Pacman game
register(id="PacMan-v0", entry_point="jumanji.environments:PacMan")
2 changes: 2 additions & 0 deletions jumanji/environments/__init__.py
@@ -32,6 +32,7 @@
maze,
mmst,
multi_cvrp,
pac_man,
robot_warehouse,
snake,
tsp,
@@ -42,6 +43,7 @@
from jumanji.environments.routing.maze.env import Maze
from jumanji.environments.routing.mmst.env import MMST
from jumanji.environments.routing.multi_cvrp import MultiCVRP
from jumanji.environments.routing.pac_man.env import PacMan
from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
from jumanji.environments.routing.snake.env import Snake
from jumanji.environments.routing.tsp.env import TSP
16 changes: 16 additions & 0 deletions jumanji/environments/routing/pac_man/__init__.py
@@ -0,0 +1,16 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from jumanji.environments.routing.pac_man.env import PacMan
from jumanji.environments.routing.pac_man.types import Observation, State
55 changes: 55 additions & 0 deletions jumanji/environments/routing/pac_man/constants.py
@@ -0,0 +1,55 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import jax.numpy as jnp

MOVES = jnp.array(
    [[0, -1], [-1, 0], [0, 1], [1, 0], [0, 0]]
)  # Up, Right, Down, Left, No-op


# Default Maze design
DEFAULT_MAZE = [
"XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"X S XX S X",
"X XXXX XXXXX XX XXXXX XXXX X",
"X XXXXOXXXXX XX XXXXXOXXXX X",
"X XXXX XXXXX XX XXXXX XXXX X",
"X X",
"X XXXX XX XXXXXXXX XX XXXX X",
"X XXXX XX XXXXXXXX XX XXXX X",
"X XX TXXT XX X",
"XXXXXX XXXXX XX XXXXX XXXXXX",
"XXXXXX XXXXX XX XXXXX XXXXXX",
"XXXXXX XXT TXX XXXXXX",
"XXXXXX XX XXX XXXX XX XXXXXX",
"XXXXXX XX X G X XX XXXXXX",
" GXXXXG ",
"XXXXXX XX X G X XX XXXXXX",
"XXXXXX XX XXX XXXX XX XXXXXX",
"XXXXXX XX XX XXXXXX",
"XXXXXX XX XXXXXXXX XX XXXXXX",
"XXXXXX XX XXXXXXXX XX XXXXXX",
"X XX X",
"X XXXX XXXXX XX XXXXX XXXX X",
"X XXXX XXXXX XX XXXXX XXXX X",
"X XX S P S XX X",
"XXX XX XX XXXXXXXX XX XX XXX",
"XXX XX XX XXXXXXXX XX XX XXX",
"X XX XX XX X",
"X XXXXXXXXXX XX XXXXXXXXXX X",
"X XXXXXXXXXX XX XXXXXXXXXX X",
"X O O X",
"XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
]