TensorForce: A TensorFlow library for applied reinforcement learning

Introduction

TensorForce is an open source reinforcement learning library focused on providing clear APIs, readability and modularisation to deploy reinforcement learning solutions both in research and practice. TensorForce is built on top of TensorFlow and is compatible with Python 2.7 and >3.5. It supports multiple state inputs and multi-dimensional actions, so it can be used with any type of simulation or application environment.

TensorForce also aims to move all reinforcement learning logic into the TensorFlow graph, including control flow. This both reduces dependencies on the host language (Python), thus enabling portable computation graphs that can be used in other languages and contexts, and improves performance.

More information on architecture can also be found on our blog. Please also read the TensorForce FAQ if you encounter problems or have questions.

Finally, read the latest update notes (UPDATE_NOTES.md) for an idea of how the project is evolving, especially concerning major API-breaking updates. We recently (20th February) merged a major branch which moves memories and all remaining structures into TensorFlow variables. This causes a number of breaking API changes (see updated configurations, examples, and tests) but improves performance. It further enables more expressive update semantics, e.g. episode-based instead of fixed time-step-based updates.

The main difference from existing libraries is a strict separation of environments, agents and update logic, which facilitates usage in non-simulation environments. Further, research code often relies on fixed network architectures that have been used to tackle particular benchmarks. TensorForce is built with the idea that (almost) everything should be optionally configurable, and in particular it uses value function template configurations so that new models can be tried out quickly. The goal of TensorForce is to provide a practitioner's reinforcement learning framework that integrates into modern software service architectures.

TensorForce is actively maintained and developed, both to continuously improve the existing code and to reflect new developments as they arise. The aim is not to include every new trick but to adopt methods once they have proven to be stable.

Features

TensorForce currently integrates with the OpenAI Gym API, OpenAI Universe, DeepMind Lab, ALE and Maze explorer. The following algorithms are available (all policy methods support both continuous and discrete actions, and can use a Beta distribution for bounded actions):

A3C using distributed TensorFlow or a multithreaded runner, now part of our generic Model and usable with different agents - paper
Trust Region Policy Optimization (TRPO) - trpo_agent - paper
Normalised Advantage Functions (NAF) - naf_agent - paper
DQN - dqn_agent - paper
Double-DQN - ddqn_agent - paper
N-step DQN - dqn_nstep_agent
Vanilla Policy Gradients (VPG / REINFORCE) - vpg_agent - paper
Actor-critic models - via baseline for any policy gradient model (see next list) - paper
Deep Q-learning from Demonstration (DQFD) - paper
Proximal Policy Optimisation (PPO) - ppo_agent - paper
Random and constant agents for sanity checking: random_agent, constant_agent
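To make the DQN / Double-DQN distinction concrete, here is a minimal pure-Python sketch of the two bootstrap targets for a single transition. It is not TensorForce's implementation (which runs inside the TensorFlow graph); the function names and the plain-list Q-values are illustrative assumptions. DQN takes the maximum of the target network's Q-values for the next state, while Double-DQN lets the online network choose the next action and the target network score it, which reduces overestimation bias.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, terminal: bool,
               next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if terminal:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, terminal: bool,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double-DQN: the online network selects, the target network evaluates."""
    if terminal:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]
```

For example, with gamma=0.9, reward 1.0, online Q-values [1.0, 2.0, 0.5] and target Q-values [1.5, 0.8, 3.0], the DQN target is 1.0 + 0.9 * 3.0 = 3.7, whereas Double-DQN picks action 1 and gives 1.0 + 0.9 * 0.8 = 1.72.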

Other heuristics and their respective config keys, which can be turned on where sensible:

Generalized advantage estimation - gae_lambda - paper
Prioritized experience replay - memory type prioritized_replay - paper
Bounded continuous actions are mapped to Beta distributions instead of Gaussians - paper
Baseline / actor-critic modes: Based on raw states (states) or on network output (network). MLP (mlp), CNN (cnn) or custom network (custom). Special case for mode states: baseline per state + linear combination layer (via baseline=dict(state1=..., state2=..., etc)).
Generic pure TensorFlow optimizers; most models can be used with natural gradient and evolutionary optimizers
Preprocessing modes: normalize, standardize, grayscale, sequence, clip, divide, image_resize
Exploration modes: constant, linear_decay, epsilon_anneal, epsilon_decay, ornstein_uhlenbeck
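Generalized advantage estimation, enabled via the gae_lambda key above, computes each advantage as an exponentially weighted sum of one-step TD residuals, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), where V is the baseline. The sketch below is a standalone pure-Python illustration, not TensorForce's code. It assumes plain lists, a values list with one extra bootstrap entry for the state after the last step, and that a terminal flag stops bootstrapping and resets the running sum.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   terminals: Sequence[bool], gamma: float = 0.99,
                   gae_lambda: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one trajectory.

    values must have len(rewards) + 1 entries; the last entry is the
    baseline value of the state reached after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the sum accumulated after it.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if terminals[t] else 1.0
        # One-step TD residual; no bootstrap from the next state at a terminal.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted sum of residuals, reset at episode boundaries.
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
    return advantages
```

Setting gae_lambda=1 recovers the discounted return minus the baseline, and gae_lambda=0 recovers the one-step TD residual; values in between trade bias against variance.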

Installation

We uploaded the latest stable version of TensorForce to PyPI. To install, just execute:

pip install tensorforce

If you want to use the latest version from GitHub, use:

git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .

TensorForce is built on Google's TensorFlow. The installation commands above assume that you already have tensorflow or tensorflow-gpu installed. TensorForce requires TensorFlow version 1.5 or later.

Alternatively, you can use the following commands to install the TensorFlow dependency together with TensorForce.

To install TensorForce with tensorflow (cpu), use:

# PyPI install
pip install tensorforce[tf]

# Local install
pip install -e .[tf]

To install TensorForce with tensorflow-gpu (gpu), use:

# PyPI install
pip install tensorforce[tf_gpu]

# Local install
pip install -e .[tf_gpu]

To update TensorForce, use pip install --upgrade tensorforce for the PyPI version, or run git pull in the tensorforce directory if you cloned the GitHub repository. Please note that we did not include OpenAI Gym/Universe/DeepMind Lab in the default install script because not everyone will want to use these. Please install them as required, usually via pip.
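Since TensorForce needs TensorFlow 1.5 or later, it can help to check an installed version string against that minimum. The helper below is a hypothetical convenience function, not part of TensorForce. It compares only the leading numeric components, so suffixes such as rc1 are ignored and 1.10 correctly counts as newer than 1.5.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '1.5.0rc1' -> (1, 5, 0)."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break  # stop at the first component that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...] = (1, 5)) -> bool:
    """True if version is at least minimum (numeric comparison, so 1.10 > 1.5)."""
    return parse_version(version) >= minimum
```

With TensorFlow installed, you could call meets_minimum(tf.__version__) after import tensorflow as tf. This only checks the lower bound stated above.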

Examples and documentation

For a quick start, you can run one of our example scripts using the provided configurations, e.g. to run the PPO agent on CartPole, execute from the main directory:

python examples/openai_gym.py CartPole-v0 -a examples/configs/ppo.json -n examples/configs/mlp2_network.json
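The -n argument points at a JSON network configuration. Assuming that file holds the same list of layer dictionaries that the network argument accepts in the PPOAgent example further down (an assumption; check the files in examples/configs for the exact format), a custom two-layer configuration could be written from Python as sketched here. The file name my_mlp_network.json is hypothetical.

```python
import json

# Two dense hidden layers of 64 units, mirroring the network list passed to
# PPOAgent in the agent example (layer format assumed to match the JSON files).
network_spec = [
    dict(type='dense', size=64),
    dict(type='dense', size=64),
]


def write_network_config(spec, path):
    """Serialize a list of layer dicts to a JSON file."""
    with open(path, 'w') as f:
        json.dump(spec, f, indent=4)


def read_network_config(path):
    """Load a layer list back, e.g. to inspect an existing config file."""
    with open(path) as f:
        return json.load(f)


write_network_config(network_spec, 'my_mlp_network.json')  # hypothetical file name
```

You could then pass the file to the example script with -n my_mlp_network.json in place of examples/configs/mlp2_network.json.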

Documentation is available at ReadTheDocs. We also have tests validating models on minimal environments, which can be run from the main directory by executing pytest.

Create and use agents

To use TensorForce as a library without using the pre-defined simulation runners, simply install and import the library, then create an agent and use it as seen below (see documentation for all optional parameters):

from tensorforce.agents import PPOAgent

# Create a Proximal Policy Optimization agent
agent = PPOAgent(
    states=dict(type='float', shape=(10,)),
    actions=dict(type='int', num_actions=10),
    network=[
        dict(type='dense', size=64),
        dict(type='dense', size=64)
    ],
    batching_capacity=1000,
    step_optimizer=dict(
        type='adam',
        learning_rate=1e-4
    )
)

# Get new data from somewhere, e.g. a client to a web app
# (MyClient is a placeholder for your own client class, not part of TensorForce)
client = MyClient('http://127.0.0.1', 8080)

# Poll new state from client
state = client.get_state()

# Get prediction from agent, execute
action = agent.act(state)
reward = client.execute(action)

# Add experience, agent automatically updates model according to batch size
agent.observe(reward=reward, terminal=False)
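The snippet above performs a single act/observe step. A full episode repeats that pair until the environment signals the end, passing terminal=True on the final observe call. The sketch below illustrates this control flow with two hypothetical stand-ins, StubAgent (random actions, records what it observes) and ToyClient (ends the episode after a fixed number of steps), so it runs without TensorFlow. Unlike MyClient above, ToyClient.execute returns both a reward and a terminal flag. With TensorForce you would use a real agent such as the PPOAgent instead, assuming it follows the same act()/observe() pairing shown above.

```python
import random
from typing import List, Tuple


class StubAgent:
    """Hypothetical stand-in exposing act()/observe() like the example agent."""

    def __init__(self, num_actions: int, seed: int = 0):
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.observed: List[Tuple[float, bool]] = []  # (reward, terminal) in order

    def act(self, state) -> int:
        return self.rng.randrange(self.num_actions)

    def observe(self, reward: float, terminal: bool) -> None:
        self.observed.append((reward, terminal))


class ToyClient:
    """Hypothetical client: the episode ends after max_steps actions."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> List[float]:
        self.steps = 0
        return [0.0] * 10

    def get_state(self) -> List[float]:
        return [float(self.steps)] * 10

    def execute(self, action: int) -> Tuple[float, bool]:
        self.steps += 1
        reward = 1.0 if action == 0 else 0.0
        terminal = self.steps >= self.max_steps
        return reward, terminal


def run_episode(agent, client) -> float:
    """Pair every act() with exactly one observe(); stop after a terminal step."""
    state = client.reset()
    total = 0.0
    while True:
        action = agent.act(state)
        reward, terminal = client.execute(action)
        agent.observe(reward=reward, terminal=terminal)
        total += reward
        if terminal:
            return total
        state = client.get_state()
```

Marking the last step as terminal matters for episode-based update semantics, since it tells the agent where one episode's returns stop.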

Benchmarks

We provide a separate repository for benchmarking our algorithm implementations at reinforceio/tensorforce-benchmark.

Docker containers for benchmarking (CPU and GPU) are available.

This is a sample output for CartPole-v0, comparing VPG, TRPO and PPO:

Please refer to the tensorforce-benchmark repository for more information.

Community and contributions

TensorForce is developed by reinforce.io, a new project focused on providing reinforcement learning software infrastructure. For any questions, get in touch at [email protected].

Please file bug reports and feature discussions as GitHub issues in the first instance.

There is also a developer chat you are welcome to join. Before joining, we ask you to provide some basic details about how you are using TensorForce, so we can learn more about applications and our community. Please fill in this short form, which will take you to the chat afterwards.

Cite

If you use TensorForce in your academic research, we would be grateful if you could cite it as follows:

@misc{schaarschmidt2017tensorforce,
    author = {Schaarschmidt, Michael and Kuhnle, Alexander and Fricke, Kai},
    title = {TensorForce: A TensorFlow library for applied reinforcement learning},
    howpublished={Web page},
    url = {https://github.com/reinforceio/tensorforce},
    year = {2017}
}

We are also very grateful for our open source contributors (listed according to GitHub): Islandman93, wassname, Mazecreator, lefnire, sven1977, trickmeyer, mryellow, ImpulseAdventure, vwxyzjn, beflix, tms1337, BorisSchaeling, ngoodger, ekerazha, Davidnet, nikoliazekter, AdamStelmaszczyk, 10nagachika, petrbel, Kismuz.
