This repository contains the client (agent) side of a simple framework for a reinforcement learning agent using Numenta's HTM algorithms. (DEVELOPMENT STATUS)
The agent architecture is based on Ali Kaan Sungur's Master's thesis (2017). It is slightly modified and reimplemented in NUPIC, Numenta's Platform for Intelligent Computing. The implementation and experimentation were part of a Bachelor's thesis project, where further information can be found.
The framework can either be built locally from source, or the provided Docker images can be used for easy integration into a cloud infrastructure. However, optimization for parallel training of agents is not yet implemented.
The framework makes use of OpenAI's Universe World of Bits environment. Although that repository is now deprecated, it is still compatible to date.
The client and the remote environment both run in Docker containers, and the agent connects to the environment via VNC. The environment runs in real time and sends an observation and a reward to the client, which in turn processes the data and sends back an action. More information can be found in OpenAI's original blog post.
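This observation–action cycle can be sketched in plain Python (a toy illustration with a stubbed environment; the real client receives pixel observations over VNC, and all names here are hypothetical):

```python
class StubEnv:
    """Stand-in for the real VNC-backed Universe environment."""

    def step(self, action):
        # The real environment runs in real time and returns a pixel
        # observation plus a scalar reward for the last action.
        observation = [[0] * 4 for _ in range(4)]
        reward = 1.0 if action == "click" else 0.0
        return observation, reward


def agent_step(env, action):
    """One cycle: send an action, receive an observation and a reward."""
    observation, reward = env.step(action)
    # ...here the client would encode the observation into an SDR
    # and let the network choose the next action...
    return "click" if reward > 0 else "move"


env = StubEnv()
action = "click"
for _ in range(3):
    action = agent_step(env, action)
```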
Each observation and reward is processed into NUPIC's base unit, an SDR. For this purpose a `UniverseEncoder` was implemented, which applies the correct filtering to the pixels; it is integrated into `PluggableUniverseSensor`, a slightly modified version of Numenta's `PluggableEncoderSensor`.
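The encoding step can be illustrated with a toy sketch (this is not the actual `UniverseEncoder`; the simple intensity threshold used here is an assumption):

```python
def encode_to_sdr(pixels, threshold=128):
    """Toy encoder: flatten a grayscale frame into a binary SDR.

    A pixel becomes an active bit when its intensity exceeds the
    threshold -- the real UniverseEncoder applies more elaborate
    filtering before producing the SDR.
    """
    return [1 if p > threshold else 0 for row in pixels for p in row]


frame = [
    [0, 200, 50],
    [255, 10, 130],
]
sdr = encode_to_sdr(frame)  # flat binary vector, one bit per pixel
```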
The framework can either be set up from source or by using the existing Docker images. This makes it easy to run the agent on a local machine or to migrate it into the cloud and run it remotely.
The agent's architecture is explained in detail in the papers mentioned above; here only a quick overview is given. With NUPIC's Network API, however, all layers can easily be interchanged or modified for experimentation.
The layers are all defined in `network.py`, where the network is created. Each layer consists of a pooling layer and a (customized) temporal memory implementation.
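Schematically, the layer/region pairing described below can be summarized as plain data (an illustrative summary of this README's layer list, not the actual contents of `network.py`):

```python
# Schematic of the agent network: each layer maps to its
# (pooling region, temporal memory region) pair.
NETWORK_LAYOUT = {
    "L4":    ("MySPRegion", "SensoryIntegrationRegion"),
    "L2/L3": ("MyTemporalPoolerRegion", "MyTMRegion"),
    "L5":    ("MySPRegion", "AgentStateRegion"),
    "D1/D2": ("MySPRegion", "ReinforcementRegion"),
    "Motor": (None, "MotorRegion"),  # no separate pooler is listed for Motor
}


def regions_of(layer):
    """Return the region names that make up one layer."""
    pooler, tm = NETWORK_LAYOUT[layer]
    return [r for r in (pooler, tm) if r is not None]
```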
A short overview of the layer implementations:
- L4: Integrate sensations
  - `MySPRegion`: based on NUPIC's `SpatialPooler`.
  - `SensoryIntegrationRegion`: based on HTM-Research's `ExtendedTemporalMemory`, with the possibility to weight apical and basal connections differently. The underlying algorithm is in `regions/algorithms/apical_weighted_temporal_memory`.
- L2/L3: High-level representations
  - `MyTemporalPoolerRegion`: based on HTM-Research's `UnionTemporalPooler`, with linear decay. The underlying algorithm is in `regions/algorithms/union_temporal_pooler`.
  - `MyTMRegion`: based on HTM-Research's `ExtendedTemporalMemory`, with basal and proximal connections.
- L5: Agent state
  - `MySPRegion`: based on NUPIC's `SpatialPooler`.
  - `AgentStateRegion`: based on HTM-Research's `ExtendedTemporalMemory`, with basal, apical, and proximal connections.
- D1/D2: Reinforcement learning
  - `MySPRegion`: based on NUPIC's `SpatialPooler`.
  - `ReinforcementRegion`: based on HTM-Research's `ExtendedTemporalMemory`, with TD-error computation and other customizations.
- Motor: Learn motor mapping and produce behavior
  - `MotorRegion`: based on HTM-Research's `Apical_Distal_Temporal_Memory`, which was almost completely rewritten in `regions/algorithms/apical_distal_motor_memory`. It contains the logic to calculate layer 5 voluntary active cells, excite/inhibit the corresponding motor cells, and map motor cells to the state activation they produced.
Many regions are almost identical to the original regions they are based on, so the documentation of those regions is largely retained. The `ReinforcementRegion` and `MotorRegion` are the most heavily customized. The Motor layer in particular contains crucial functionality, as it calculates (1) the voluntary active cells from layer 5 and (2) the actual motor cells that are excited/inhibited and mapped to the state they produced.
- Requirements:
- NUPIC (1.0.5dev0, commit: fcaea0f0cf5fc74b930a45f138279c654f870a80)
- NUPIC-Core (1.0.5dev0, commit: 94fde1f7b45fb143c5e5ffbb1e89812868328e12)
- HTM-Research (commit: de8539e643c898666f2dbd37ec7f79abfab4506b)
- HTM-Research-Core (commit: 8c0b19866533035662a247ec707ac66ce242b5be)
- The Universe environment (`universe` & `gym`) is installed and can be found by Python.
- The environment Docker image (will be pulled from universe-environment if not installed) - source code here
- Clone the repository & run `my_neural_net/src/myExample.py`
- Pull the Docker image and follow the instructions in `DOCKER_README.md` for more help.
The environment is based on OpenAI's Mini World of Bits, an open-domain platform for web agents as described in their paper. It enables the agent, or an experiment observer, to connect via remote desktop control (VNC) and control the environment.
Experiments are written in plain JavaScript/HTML/CSS and are thus easily modified or created by any curious researcher who wants to test the architecture on a new task.
Example experiments can be found in `environments/app/universe-envs/world-of-bits/static/miniwob` of the environment repository, which also contains more information on how to create a customized experiment. An example experiment task from the paper:
- Clone the repository and make sure Python can find the correct `universe` and `gym` packages to import in your files.
- Pull the Docker image and follow the instructions in `DOCKER_README.md` for more help.
Deploy the Docker images on a cloud instance (an Ubuntu image was tested) and run them as described in the `DOCKER_README.md` install instructions.
Example of observing the experiments remotely via VNC from a phone:
- Parameterize the verbosity level of the debug print-out (e.g. indices)
- Refactor code and documentation (simplify some components that are based on NUPIC components)
- Support/optimize parallel training of multiple agents in the cloud
- Finish the serialization implementation (`SparseMatrixConnections` from NUPIC Core is missing)
- Add support for player-guided exploration
- Advance visualization and debug tools
Note: the implementations may vary slightly from the current official NUPIC versions (they are based on NUPIC 1.0.5dev0) and use previous versions of the HTM-Research/Core repositories.