Bug fix and tests

bpiwowar committed Aug 27, 2024
1 parent 60a8c81 commit 2daeebe
Showing 5 changed files with 141 additions and 33 deletions.
133 changes: 106 additions & 27 deletions README.md
@@ -1,82 +1,161 @@
# PySuperTuxKart gymnasium wrapper

[![PyPI version](https://badge.fury.io/py/pystk2-gymnasium.svg)](https://badge.fury.io/py/pystk2-gymnasium)

Read the [Changelog](./CHANGELOG.md)

## Install

The PySuperTuxKart 2 gymnasium wrapper is a Python package, so installing it is fairly easy:

`pip install pystk2-gymnasium`

Note that during the first run, SuperTuxKart assets are downloaded to the cache directory.

## AgentSpec

Each controlled kart is parametrized by `pystk2_gymnasium.AgentSpec`:

- `name` defines the name of the player (displayed on top of the kart)
- `rank_start` defines the starting position (`None` for random, which is the default)
- `use_ai` is a flag (`False` by default) to ignore actions passed to `step` and use a SuperTuxKart bot instead
- `camera_mode` can be set to `AUTO` (camera on for non-STK bots), `ON` (camera on) or `OFF` (no camera)
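
For instance, an agent specification for a kart driven by your own policy could look like this (a minimal sketch; the `CameraMode` import is assumed to be available from `pystk2_gymnasium`, as in the multi-agent example further below):

```py
from pystk2_gymnasium import AgentSpec, CameraMode  # CameraMode import path is an assumption

# A kart we control ourselves, starting at a random position, with the camera on
agent = AgentSpec(name="my-kart", rank_start=None, use_ai=False, camera_mode=CameraMode.ON)
```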



## Environments

Limitations:

- only one SuperTuxKart environment can be created for now
- no graphics information (i.e. no pixmap) is available

After importing `pystk2_gymnasium`, the following environments are available:

- `supertuxkart/full-v0` is the main environment containing complete observations. The observation and action spaces are both dictionaries with continuous or discrete variables (see below). The exact structure can be found using `env.observation_space` and `env.action_space`. The following options can be used to modify the environment:
  - `agent` is an `AgentSpec` (see above)
  - `render_mode` can be `None` or `human`
  - `track` defines the SuperTuxKart track to use (`None` for random). The full list can be found in `STKRaceEnv.TRACKS` after `initialize.initialize(with_graphics: bool)` has been called.
  - `num_kart` defines the number of karts on the track (3 by default)
  - `max_paths` is the maximum number of the (nearest) paths (a track is made of paths) to consider in the observation state
  - `laps` is the number of laps (1 by default)
  - `difficulty` is the difficulty of the AI bots (from 0, the lowest, to 2, the highest; 2 by default)

Some environments are created using wrappers:

- `supertuxkart/simple-v0` is a simplified environment with a fixed number of observations for paths (controlled by `state_paths`, default 5), items (`state_items`, default 5) and karts (`state_karts`, default 5)
- `supertuxkart/flattened-v0` has observation and action spaces simplified as much as possible (only `discrete` and `continuous` keys)
- `supertuxkart/flattened_continuous_actions-v0` removes discrete actions (which are set to their default of 0), so actions are steering/acceleration only, in the continuous domain
- `supertuxkart/flattened_multidiscrete-v0` is like the previous one, but with fully multi-discrete actions; `acceleration_steps` and `steer_steps` (default 5) control the number of discrete values for acceleration and steering respectively
- `supertuxkart/flattened_discrete-v0` is like the previous one, but with fully discretized actions
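
For instance, the main environment could be created with explicit options (a sketch that uses only the options listed above; the agent specification is assumed to be left at its default):

```py
import gymnasium as gym
import pystk2_gymnasium  # noqa: F401 -- importing registers the supertuxkart/* environments

env = gym.make(
    "supertuxkart/full-v0",
    render_mode="human",
    track=None,    # random track
    num_kart=3,    # three karts on the track
    laps=1,        # a single lap
    difficulty=2,  # strongest AI bots
)
```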

The reward $r_t$ at time $t$ is given by

$$ r_{t} = \frac{1}{10}(d_{t} - d_{t-1}) + \left(1 - \frac{\mathrm{pos}_t}{K}\right) \times (3 + 7 f_t) - 0.1 + 10 f_t $$

where $d_t$ is the overall track distance at time $t$, $\mathrm{pos}_t$ the
position among the $K$ karts at time $t$, and $f_t$ is $1$ when the kart
finishes the race.
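
As an illustration, the per-step reward could be computed as follows (a sketch only; the variable names are assumptions, not part of the environment's API):

```py
def step_reward(d_t, d_prev, pos_t, num_karts, finished):
    """Illustrative computation of the reward formula above."""
    f_t = 1.0 if finished else 0.0
    progress = (d_t - d_prev) / 10.0  # reward for the distance covered since the last step
    ranking = (1.0 - pos_t / num_karts) * (3.0 + 7.0 * f_t)  # better positions earn more, amplified at the finish
    return progress + ranking - 0.1 + 10.0 * f_t  # constant per-step penalty, plus a bonus for finishing
```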

## Wrappers

Wrappers can be used to modify the environment.
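
For a single-kart environment, the wrappers described below can simply be composed around the base environment. A minimal sketch (the wrapper import paths are assumptions, since this README uses the names without showing their imports):

```py
import gymnasium as gym
import pystk2_gymnasium  # noqa: F401 -- registers the supertuxkart/* environments
from pystk2_gymnasium import ConstantSizedObservations, PolarObservations  # assumed import location

# Fix the number of observed items, then express positions in polar coordinates
env = PolarObservations(ConstantSizedObservations(gym.make("supertuxkart/full-v0")))
```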

### ConstantSizedObservations

Ensures that the number of observed items is constant (e.g. for other karts,
tracks).

### PolarObservations

Changes Cartesian coordinates to Polar ones.

### FlattenerWrapper

Flattens actions and observations

### FlattenMultiDiscreteActions


## Multi-agent environment

`supertuxkart/multi-full-v0` can be used to control multiple karts. It takes an `agents` parameter that is a list of `AgentSpec`. Observations and actions are dictionaries of single-kart ones, whose **string** keys range from `0` to `n-1`, with `n` the number of karts.

To use different (single-agent) gymnasium wrappers for each kart, one can use a `MonoAgentWrapperAdapter`. For example:

```py
from functools import partial

from pystk2_gymnasium import AgentSpec
# CameraMode, MonoAgentWrapperAdapter, ConstantSizedObservations, PolarObservations
# and make_env are assumed to be importable from pystk2_gymnasium (their imports
# are not shown in this example).

# Three AI-controlled karts, all with the camera on
agents = [
    AgentSpec(use_ai=True, name="Yin Team", camera_mode=CameraMode.ON),
    AgentSpec(use_ai=True, name="Yang Team", camera_mode=CameraMode.ON),
    AgentSpec(use_ai=True, name="Zen Team", camera_mode=CameraMode.ON),
]

# One wrapper stack per controlled kart (keys are kart indices, as strings)
wrappers = [
    partial(
        MonoAgentWrapperAdapter,
        wrapper_factories={
            "0": lambda env: ConstantSizedObservations(env),
            "1": lambda env: PolarObservations(ConstantSizedObservations(env)),
            "2": lambda env: PolarObservations(ConstantSizedObservations(env)),
        },
    ),
]

make_stkenv = partial(
    make_env,
    "supertuxkart/multi-full-v0",
    render_mode="human",
    num_kart=5,
    agents=agents,
    wrappers=wrappers,
)
```
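
The resulting factory can then be used like a regular environment constructor (a sketch, assuming `make_env` returns a standard gymnasium environment):

```py
env = make_stkenv()
obs, infos = env.reset()
# Per-kart observations (and actions) are keyed by the kart index, as strings
first_kart_obs = obs["0"]
```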

## Action and observation space

All the 3D vectors are within the kart referential (`z` front, `x` left, `y` up):

- `distance_down_track`: The distance from the start
- `energy`: remaining collected energy
- `front`: front of the kart (3D vector)
- `items_position`: position of the items (3D vectors)
- `attachment`: the item attached to the kart (bonus box, banana, nitro/big, nitro/small, bubble gum, easter egg)
- `attachment_time_left`: how much time the attachment will be kept
- `items_type`: type of the item
- `jumping`: is the kart jumping
- `karts_position`: position of other karts, beginning with the ones in front
- `max_steer_angle`: the max angle of the steering (given the current speed)
- `distance_center_path`: distance to the center of the path
- `paths_distance`: the distance of the paths
- `paths_start`, `paths_end`, `paths_width`: 3D vectors to the paths start and end, and a vector of their widths (scalar). The paths are sorted so that the first element of the array is the current one.
- `powerup`: collected power-up
- `shield_time`
- `skeed_factor`
- `velocity`: velocity vector
@@ -91,7 +170,7 @@ from pystk2_gymnasium import AgentSpec
# In both cases, this corresponds to a dictionary with two keys:
# - `continuous` is a vector corresponding to the continuous observations
# - `discrete` is a vector (of integers) corresponding to discrete observations
env = gym.make("supertuxkart/flattened-v0", render_mode="human", agents=[AgentSpec(use_ai=False)])

ix = 0
done = False
9 changes: 6 additions & 3 deletions src/pystk2_gymnasium/envs.py
@@ -293,6 +293,7 @@ def list_permute(list, sort_ix):
list[:] = (list[ix] for ix in sort_ix)

def sort_closest(positions, *lists):
# z axis is front
distances = [np.linalg.norm(p) * np.sign(p[2]) for p in positions]

# Change distances: if d < 0, d <- -d+max_d+1
@@ -328,18 +329,20 @@ def sort_closest(positions, *lists):
)

# Add action if using AI bot
# (this corresponds to the action before the observation)
obs = {}
if use_ai:
# Adds actions
action = self.race.get_kart_action(kart_ix)
obs = {
"action": {
"acceleration": np.array([action.acceleration], dtype=np.float32),
"brake": action.brake,
"nitro": action.nitro,
"drift": action.drift,
"rescue": action.rescue,
"fire": action.fire,
"nitro": action.nitro,
"rescue": action.rescue,
"steer": np.array([action.steer], dtype=np.float32),
"acceleration": np.array([action.acceleration], dtype=np.float32),
}
}

3 changes: 2 additions & 1 deletion src/pystk2_gymnasium/stk_wrappers.py
@@ -124,7 +124,7 @@ class STKDiscreteAction(STKAction):

class DiscreteActionsWrapper(ActionObservationWrapper):
# Wraps the actions
def __init__(self, env: gym.Env, *, acceleration_steps=5, steer_steps=10, **kwargs):
super().__init__(env, **kwargs)

self._action_space = copy.deepcopy(env.action_space)
@@ -138,6 +138,7 @@ def __init__(self, env: gym.Env, *, acceleration_steps=5, steer_steps=5, **kwarg
self._action_space["steer"] = self.d_steer.space

if "action" in self.observation_space:
# When using AI, "action" is part of the observation space
self._observation_space = copy.deepcopy(self.observation_space)
self._observation_space["action"]["steer"] = self.d_steer.space
self._observation_space["action"][
11 changes: 10 additions & 1 deletion src/pystk2_gymnasium/wrappers.py
@@ -19,6 +19,13 @@


class SpaceFlattener:
"""Flattens an observation or action space
If the space has discrete and continuous values, returns
a dictionary with "continuous" and "discrete" keys – each associated with
a flattened observation or action. Otherwise, returns the flattened space itself.
"""

def __init__(self, space: gym.Space):
# Flatten the observation space
self.continuous_keys = []
@@ -31,7 +38,7 @@ def __init__(self, space: gym.Space):
highs = []
counts = []

# Combine keys (sort them beforehand so we always have the same order)
for key, value in sorted(space.items(), key=lambda x: x[0]):
# Ignore the AI action
if key == "action":
@@ -75,6 +82,8 @@


class FlattenerWrapper(ActionObservationWrapper):
"""Flattens actions and observations."""

def __init__(self, env: gym.Env):
super().__init__(env)

18 changes: 17 additions & 1 deletion tests/test_consistency.py
@@ -1,6 +1,8 @@
import gymnasium
import numpy as np
import numpy.testing
import pystk2
from pystk2_gymnasium.utils import Discretizer, rotate
from pystk2_gymnasium.envs import STKRaceEnv


@@ -22,3 +24,17 @@ def test_rotation():
    if race is not None:
        race.stop()
        del race


def test_discretizer():
    k = 5

    discretizer = Discretizer(gymnasium.spaces.Box(-1, 1, shape=(1,)), k)
    step = 2.0 / (k - 1)

    # Converting an index to its continuous value and back should recover the index
    for j in range(k):
        assert discretizer.discretize(discretizer.continuous(j)) == j, f"For index {j}"

    # Round-tripping a continuous value should stay within one discretization step
    for x in np.arange(-1, 1, step):
        xhat = discretizer.continuous(discretizer.discretize(x))
        assert np.abs(xhat - x) < step, f"For value {x} vs {xhat}"
