Merge pull request #114 from Bam4d/rllib_level_generator
Rllib level generator
Bam4d authored Jun 4, 2021
2 parents 2c99a7b + faeb208 commit 0e0c311
Showing 20 changed files with 321 additions and 104 deletions.
4 changes: 4 additions & 0 deletions .gitmodules
@@ -18,3 +18,7 @@
path = python/examples/experiments/rts-self-play
url = https://github.com/Bam4d/rts-self-play
ignore = dirty
[submodule "python/examples/experiments/autoregressive-cats"]
path = python/examples/experiments/autoregressive-cats
url = https://github.com/Bam4d/autoregressive-cats
ignore = dirty
1 change: 0 additions & 1 deletion bindings/wrapper/GriddlyLoaderWrapper.cpp
@@ -6,7 +6,6 @@
#include "../../src/Griddly/Core/GDY/Objects/ObjectGenerator.hpp"
#include "../../src/Griddly/Core/GDY/TerminationGenerator.hpp"
#include "../../src/Griddly/Core/Grid.hpp"
#include "../../src/Griddly/Core/Observers/Vulkan/VulkanObserver.hpp"
#include "GDYWrapper.cpp"

namespace griddly {
1 change: 1 addition & 0 deletions docs/_static/video/.gitignore
@@ -0,0 +1 @@
!*.mp4
Binary file added docs/_static/video/griddly_rts.mp4
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/about/halloffame.rst
@@ -2,7 +2,7 @@
Hall of Fame
############

If you create a project that uses Griddly, please let us know and we will link it here. This includes if you use Griddly in any papers, use the griddly engine in another game project and want to share your work.
If you create a project that uses Griddly, please let us know and we will link it here. This includes if you use Griddly in any papers, use the Griddly engine in another game project and want to share your work.

.. note:: You can be the first!

2 changes: 1 addition & 1 deletion docs/about/introduction.rst
@@ -4,7 +4,7 @@
Introduction
############

One of the most important things about AI research is data. In many Game Environments the rate of data (rendered frames per second, or state representations per second) is relatively slow meaning very long training times. Researchers can compensate for this problem by parallelising the number of games being played, sometimes on expensive hardward and sometimes on several servers requiring network infrastructure to pass states to the actual learning algorithms. For many researchers and hobbyists who want to learn. This approach is unobtainable and only the research teams with lots of funding and engineers supporting the hardware and infrastrcuture required.
One of the most important things about AI research is data. In many game environments the rate of data (rendered frames per second, or state representations per second) is relatively slow, meaning very long training times. Researchers can compensate for this problem by parallelizing the number of games being played, sometimes on expensive hardware and sometimes across several servers, requiring network infrastructure to pass states to the actual learning algorithms. For many researchers and hobbyists who want to learn, this approach is unattainable; it is only available to research teams with enough funding and engineers to support the required hardware and infrastructure.

Griddly provides a solution to this issue.

240 changes: 240 additions & 0 deletions docs/getting-started/procedural content generation/index.rst
@@ -0,0 +1,240 @@
.. _doc_tutorials_pcg:

#############################
Procedural Content Generation
#############################

Reinforcement learning can be prone to overfitting in environments where the initial conditions are limited and the environment dynamics are deterministic.
Procedural content generation is an important tool in reinforcement learning, as it allows level maps to be created on the fly. This gives the agent a much more varied challenge and stops it from overfitting to a small dataset of levels.


**********
Level Maps
**********

Levels in Griddly environments are defined by strings of characters. The ``MapCharacter`` for each object is defined in the game's GDY file, and the characters for each game are also listed in its documentation.

Basic Map
=========

.. code-block:: python

    W W W W W W
    W A . . . W
    W . . . . W
    W . . . . W
    W . . . g W
    W W W W W W

.. figure:: img/Doggo-level-Sprite2D-0.png
    :align: center

    How the above Doggo level is rendered.


In the map example above, the ``A`` character defines the Dog and the ``g`` character defines the goal. ``W`` defines the walls and ``.`` is reserved for empty space.

This is a basic example and generating levels for this environment might not be too interesting...


************************
Clusters Level Generator
************************

A much more complicated example is to use the :ref:`Clusters <doc_clusters>` game and generate new levels for it. The aim of the Clusters game is for the agent to push coloured blocks together to form "clusters", whilst avoiding spikes.
The game is fully deterministic and only 5 levels are supplied in the original GDY file. This makes it a perfect candidate for generating new levels and testing whether Reinforcement Learning can still solve them!


Level Generator Class
=====================

Here's an example of a level generator for the Clusters game.

The ``LevelGenerator`` class can be used as a base class. Only the ``generate`` function needs to be implemented.

.. code-block:: python

    class ClustersLevelGenerator(LevelGenerator):
        BLUE_BLOCK = 'a'
        BLUE_BOX = '1'
        RED_BLOCK = 'b'
        RED_BOX = '2'
        GREEN_BLOCK = 'c'
        GREEN_BOX = '3'
        AGENT = 'A'
        WALL = 'w'
        SPIKES = 'h'

        def __init__(self, config):
            super().__init__(config)

            self._width = config.get('width', 10)
            self._height = config.get('height', 10)
            self._p_red = config.get('p_red', 1.0)
            self._p_green = config.get('p_green', 1.0)
            self._p_blue = config.get('p_blue', 1.0)
            self._m_red = config.get('m_red', 5)
            self._m_blue = config.get('m_blue', 5)
            self._m_green = config.get('m_green', 5)
            self._m_spike = config.get('m_spike', 5)

        def _place_walls(self, map):
            # top/bottom wall
            wall_y = np.array([0, self._height - 1])
            map[:, wall_y] = ClustersLevelGenerator.WALL

            # left/right wall
            wall_x = np.array([0, self._width - 1])
            map[wall_x, :] = ClustersLevelGenerator.WALL

            return map

        def _place_blocks_and_boxes(self, map, possible_locations, p, block_char, box_char, max_boxes):
            if np.random.random() < p:
                block_location_idx = np.random.choice(len(possible_locations))
                block_location = possible_locations[block_location_idx]
                del possible_locations[block_location_idx]
                map[block_location[0], block_location[1]] = block_char

                num_boxes = 1 + np.random.choice(max_boxes - 1)
                for k in range(num_boxes):
                    box_location_idx = np.random.choice(len(possible_locations))
                    box_location = possible_locations[box_location_idx]
                    del possible_locations[box_location_idx]
                    map[box_location[0], box_location[1]] = box_char

            return map, possible_locations

        def generate(self):
            map = np.chararray((self._width, self._height), itemsize=2)
            map[:] = '.'

            # Generate walls
            map = self._place_walls(map)

            # all possible locations
            possible_locations = []
            for w in range(1, self._width - 1):
                for h in range(1, self._height - 1):
                    possible_locations.append([w, h])

            # Place Red
            map, possible_locations = self._place_blocks_and_boxes(
                map,
                possible_locations,
                self._p_red,
                ClustersLevelGenerator.RED_BLOCK,
                ClustersLevelGenerator.RED_BOX,
                self._m_red
            )

            # Place Blue
            map, possible_locations = self._place_blocks_and_boxes(
                map,
                possible_locations,
                self._p_blue,
                ClustersLevelGenerator.BLUE_BLOCK,
                ClustersLevelGenerator.BLUE_BOX,
                self._m_blue
            )

            # Place Green
            map, possible_locations = self._place_blocks_and_boxes(
                map,
                possible_locations,
                self._p_green,
                ClustersLevelGenerator.GREEN_BLOCK,
                ClustersLevelGenerator.GREEN_BOX,
                self._m_green
            )

            # Place Spikes
            num_spikes = np.random.choice(self._m_spike)
            for k in range(num_spikes):
                spike_location_idx = np.random.choice(len(possible_locations))
                spike_location = possible_locations[spike_location_idx]
                del possible_locations[spike_location_idx]
                map[spike_location[0], spike_location[1]] = ClustersLevelGenerator.SPIKES

            # Place Agent
            agent_location_idx = np.random.choice(len(possible_locations))
            agent_location = possible_locations[agent_location_idx]
            map[agent_location[0], agent_location[1]] = ClustersLevelGenerator.AGENT

            level_string = ''
            for h in range(0, self._height):
                for w in range(0, self._width):
                    level_string += map[w, h].decode().ljust(4)
                level_string += '\n'

            return level_string

This generates levels like the following:

.. figure:: img/generated_clusters.png
    :align: center

    A 10x10 map generated by the above code.



Using ``LevelGenerator``
========================

In the simplest case, the level generator can be used just before the level resets, and the generated string can be passed to ``env.reset(level_string=...)``:

.. code-block:: python

    if __name__ == '__main__':
        config = {
            'width': 10,
            'height': 10
        }

        renderer = RenderToFile()

        level_generator = ClustersLevelGenerator(config)

        env = gym.make('GDY-Clusters-v0')
        env.reset(level_string=level_generator.generate())

        ...
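
Because ``generate`` produces a fresh level string each time it is called, a new map can be drawn for every episode. The loop below is a minimal sketch of that pattern (the random policy is only a placeholder, not part of the example above):

.. code-block:: python

    # A new level is generated on every reset, so the agent rarely
    # sees the same map twice.
    for episode in range(100):
        obs = env.reset(level_string=level_generator.generate())
        done = False
        while not done:
            action = env.action_space.sample()  # placeholder policy
            obs, reward, done, info = env.step(action)
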
Using a ``LevelGenerator`` with RLlib
=====================================

The ``LevelGenerator`` base class is compatible with RLlib and can be configured through the standard RLlib configuration.

For example, the level generator and its parameters can be set up in the ``env_config`` in the following way:

.. code-block:: python

    'config': {
        ...
        'env_config': {
            'generate_valid_action_trees': True,
            'level_generator': {
                'class': ClustersLevelGenerator,
                'config': {
                    'width': 6,
                    'height': 6,
                    'p_red': 0.7,
                    'p_green': 0.7,
                    'p_blue': 0.7,
                    'm_red': 4,
                    'm_blue': 4,
                    'm_green': 4,
                    'm_spike': 4
                }
            },
            ...
        }
    }
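
To make the ``level_generator`` entry concrete, the sketch below shows how such a config could be consumed: the configured class is instantiated with its ``config`` dictionary and asked for a new level string on reset. This is an illustration of the intent, not the wrapper's actual implementation:

.. code-block:: python

    # Illustration only: resolve the configured generator and use it on reset.
    lg_config = env_config['level_generator']
    level_generator = lg_config['class'](lg_config['config'])
    env.reset(level_string=level_generator.generate())
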
1 change: 1 addition & 0 deletions docs/index.rst
@@ -32,6 +32,7 @@ Griddly documentation.
getting-started/action spaces/index
getting-started/observation spaces/index
getting-started/visualization/index
getting-started/procedural content generation/index

.. toctree::
:maxdepth: 2
1 change: 1 addition & 0 deletions python/examples/experiments/autoregressive-cats
Submodule autoregressive-cats added at d2143f
2 changes: 1 addition & 1 deletion python/examples/experiments/rts-self-play
Submodule rts-self-play updated 2 files
+3 −0 .gitignore
+64 −20 rts_self_play.py
2 changes: 1 addition & 1 deletion python/griddly/GymWrapper.py
@@ -110,7 +110,7 @@ def step(self, action):
    elif len(action) == self.player_count:

        if np.ndim(action) == 1 or np.ndim(action) == 3:
            if isinstance(action[0], list) or isinstance(action[0], np.ndarray):
            if isinstance(action[0], list) or isinstance(action[0], np.ndarray) or isinstance(action[0], tuple):
                # Multiple agents that can perform multiple actions in parallel
                # Used in RTS games
                reward = []
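
A hedged illustration of what this change enables (the environment and per-action layout below are assumptions, not part of the commit): each player's entry in the action list may now be a tuple of several actions, which suits RTS-style games where multiple units act in the same step.

.. code-block:: python

    # Hypothetical two-player step; the exact action layout depends on the
    # game's GDY and its action space, so treat these values as placeholders.
    player_1_actions = ([3, 4, 1, 2], [7, 2, 0, 1])  # two unit actions as a tuple
    player_2_actions = ([5, 5, 1, 3],)               # a single action, still a tuple
    obs, reward, done, info = env.step([player_1_actions, player_2_actions])
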