This file tracks all major updates and new features. As TensorForce is still in alpha, we continuously implement small updates and bug fixes, which are not tracked here in detail but through GitHub issues.
21st April
- Introduced an `atomic_observe` operation and a `buffered` flag in `act()` to avoid race conditions caused by multi-threaded inserts into memory; a sketch follows below.
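A minimal sketch of the unbuffered flow (the exact keyword arguments, and the empty `internals` list, are assumptions; check the signatures in your installed version):

```python
# Hedged sketch: unbuffered act() plus atomic_observe() inserts the whole
# experience in one step, avoiding races between threads.
action = agent.act(states=state, buffered=False)
next_state, terminal, reward = environment.execute(actions=action)
agent.atomic_observe(
    states=state, actions=action, internals=[],  # internals assumed empty here
    reward=reward, terminal=terminal
)
state = next_state
```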
31st March
- Fixed some shape/slicing errors in prioritized replay; the test now passes.
24th March
- Fixed buffer overflows in the prioritized replay buffer, which now simply resets the buffer insertion index. Note that for effective use of prioritized replay, the batch size and update frequency should be chosen so that the buffer is emptied in a timely manner. All experiences are first written to the buffer and, once their priority has been determined, written to the main memory. Sampling first takes new elements from the buffer and then, if necessary, the remaining elements from the replay memory according to their priorities (see the sketch after this list). This means that if updates are rare and batch sizes are small compared to the buffer size, updates will always draw from recent memories.
- There is a remaining off-by-one sampling error in prioritized replay, which makes it unsafe to use for now.
- Removed the old non-TensorFlow prioritized replay class.
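A minimal sketch of the sampling order described above, using plain Python stand-ins (none of these names are TensorForce's):

```python
import random

# Illustrative sketch only, not TensorForce's implementation: new
# experiences come from the insertion buffer first; any remainder is
# sampled from main memory in proportion to priority.
def sample_batch(buffer, memory, priorities, batch_size):
    batch = buffer[:batch_size]  # newest experiences, priority not yet assigned
    remaining = batch_size - len(batch)
    if remaining > 0:
        # priority-proportional sampling (with replacement) from main memory
        batch = batch + random.choices(memory, weights=priorities, k=remaining)
    return batch
```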
17th March
- Merged a PR that enables component loading. Pretrained networks can now be loaded without loading a matching graph, which enables supervised pretraining. See the FAQ for examples, and the hypothetical sketch below.
- Renamed the `distributed` argument to `execution` in agents, which will handle all execution settings going forward.
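A hypothetical loading sketch; the method and argument names below are assumptions, and the FAQ has the authoritative example:

```python
# Hypothetical sketch: load a pretrained network component into a fresh
# agent without restoring a full matching graph.
agent = ...  # freshly constructed agent
agent.model.restore_component(
    component_name='network',            # assumed component identifier
    save_path='./pretrained/network'     # assumed checkpoint location
)
```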
10th March
- Added a prototype of prioritized replay in pure TensorFlow; still under construction and not yet usable.
- Recently fixed a bug which occasionally returned empty batches from replay memory due to masking out terminal entries.
20th February
- Merged the memory branch, with major library-internal changes: the core.memories module, memory_model, adapted handling of batching/losses/optimization, etc.
- Updated and standardized agent parameters, see documentation of agent classes.
14th January
- Reverted a deprecated API call to remain compatible with TensorFlow 1.4.1 in version 0.3.6.1
12th January
- Implemented some hot-fixes following changes in TensorFlow regarding variable registration. These changes (first observed in 1.4) caused our custom getters for `tf.make_template` to register variables differently, thus sometimes causing double registration in our variable lists. The latest pip version 0.3.5 combined with TensorFlow 1.5.0rc0 addresses these issues.
6th January
- In December, a number of bugs regarding exploration and a numerical issue in generalised advantage estimation were fixed, which seems to increase performance, so an update is recommended.
- Agent structure saw major refactoring to remove redundant code; introduced a `LearningAgent` to hold common fields and to distinguish learning agents from non-learning agents (e.g. `RandomAgent`).
- We are preparing to move memories into the TensorFlow graph, which will fix sequences and allow subsampling in the optimizers. Further, new episode/batch semantics will be enabled (e.g. episode-based instead of timestep-based batching).
9th December 2017
- Renamed LSTM to InternalLSTM and created a new LSTM layer which implements more standard sequence functionality. The `internal_lstm` layer is used for internal agent state, while `lstm` may be used for seq2seq problems; a sketch follows below.
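A sketch of where each layer type might appear in a network specification (sizes and keys are illustrative assumptions):

```python
# Sketch: the two LSTM layer types in a network spec.
network_with_internal_state = [
    dict(type='dense', size=32),
    dict(type='internal_lstm', size=32),  # state carried as internal agent state
]

network_over_sequences = [
    dict(type='dense', size=32),
    dict(type='lstm', size=32),  # standard sequence functionality, e.g. seq2seq
]
```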
2nd December 2017
- Sequence preprocessor temporarily broken; use version 0.3.2 if required. This is because sequence sampling in TensorFlow is only sensibly possible once replay memories/batches have also been moved into TensorFlow.
- Moved pre-processing and exploration from the agent (in Python logic) to TensorFlow control flow in the model.
11th November 2017
- BREAKING: We removed the Configuration object. Most users find named arguments far more comfortable to handle. Agents are now created by specifying all non-default parameters explicitly; see the quickstart examples and the sketch below.
- Agents are now specified as part of the configuration via a `type`, e.g. `"type": "dqn_agent"`.
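For instance, constructing a DQN agent now looks roughly like this (parameter values are illustrative, and the exact argument names should be checked against the quickstart):

```python
from tensorforce.agents import DQNAgent

# Illustrative construction with explicit named arguments
# (values are examples, not recommendations).
agent = DQNAgent(
    states_spec=dict(type='float', shape=(4,)),
    actions_spec=dict(type='int', num_actions=2),
    network_spec=[
        dict(type='dense', size=64),
        dict(type='dense', size=64),
    ],
    batch_size=32,
)
```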
8th November 2017
- Layers/networks/etc. now take an additional argument `update` in `tf_apply`, a boolean tensor indicating whether the call happens during an update; a sketch follows below.
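A custom layer could then branch on the flag roughly as follows (a sketch; the `Layer` import path and `tf_apply` signature are assumptions based on the description above):

```python
import tensorflow as tf
from tensorforce.core.networks import Layer  # assumed import path

class NoisyDuringUpdates(Layer):
    """Sketch: perturbs activations only when called during an update."""

    def tf_apply(self, x, update):
        noisy = x + tf.random_normal(shape=tf.shape(x), stddev=0.1)
        # `update` is a boolean tensor, so branch with tf.cond, not Python `if`.
        return tf.cond(update, lambda: noisy, lambda: x)
```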
7th November 2017
- New saver/summary/distributed config interface via the entries `saver_spec`, `summary_spec` and `distributed_spec`.
- The first two require at least a `directory` value.
- The model/summaries are saved periodically and automatically when `seconds` is set in the respective `_spec`; see the sketch below.
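For example (a sketch; only `directory` and `seconds` are taken from the description above, any other keys are not shown):

```python
# Sketch of the new spec entries: `directory` is required for the first two,
# `seconds` enables periodic saving.
saver_spec = dict(directory='./checkpoints', seconds=600)   # save model every 10 minutes
summary_spec = dict(directory='./summaries', seconds=120)   # write summaries every 2 minutes
distributed_spec = None                                     # single-machine execution
```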
22nd October 2017
- BREAKING: We released a complete redesign including our new optimization module. Optimizers which previously were only available in Python (natural gradients) are now available in pure TensorFlow. A blogpost on this will appear soon.
- Agent configurations are now decomposed into `actions_spec`, `states_spec`, `network_spec`, and config. This facilitates a clearer separation between the hyperparameters of the model and the description of the problem.
- Models now make heavy use of templated graph construction.
- Policy gradient models have been decomposed into models using likelihood ratios (`pg_prob_ratio_model`) and log likelihood (`pg_log_prob_model`).
- Q-models are now implemented as distributional models, which enables the use of natural gradients in Q-models. A blogpost on the practical implications is also on the way.
- Baselines: It is now possible to share parameters between main networks and baselines via the baseline option (`NetworkBaseline`).
- Actions now support boolean types.
2nd September 2017
- Added multi-LSTM support
- Fixed various bugs around reporting and logging
- Introduced CNN baseline
- Added baseline support for multiple states (experimental); every state gets its own baseline, and predictions are averaged.
13th August 2017
- Fixed PPO performance issues; we now recommend PPO as the default.
- Implemented Beta distribution for bounded actions
- Added n-step DQN and multithreaded runner
- Fixed a wrong internal calculation of `prob_ratio` and `kl_divergence` in TRPO/PPO
- Added `next_internals` functionality to memories and QModel
- Changed the config value names related to advantage estimation to `gae_rewards` and `normalize_rewards`
3rd August 2017
- Added `ls_accept_ratio=0.01` and adapted the names of other TRPO config parameters related to line search
- Fixed various bugs in Categorical DQN and the Q-model target network scope, thanks to @Islandman93
- Refactored distributions; categorical now uses Gumbel-softmax
29th July 2017
- Added `QModel` as a base class for DQN (hence DQFD) and NAF
- Added a `next_state` placeholder to `QModel`, and a boolean flag to `Memory.get_batch` to include next states; a sketch follows below
- `Configuration` now keeps track of which values were accessed, and `Agent` reports a warning if not all were accessed
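Usage might look like this (the flag name is an assumption based on the description above):

```python
# Sketch: request next states alongside the sampled batch.
batch = memory.get_batch(batch_size=64, next_states=True)
```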
28th July 2017
- Moved external environments to tensorforce/contrib. Going forward, the environment module contains only the base environment class and our test environment.
- Merged the ALE and Maze Explorer environments, thanks to @Islandman93 and @mryellow
25th July 2017
- New optional argument `shape` for action specifications, if an array of actions sharing the same specification is required; a sketch follows below
- Complete and correct mapping of OpenAI Gym state/action spaces to corresponding TensorForce state/action specifications
- `MinimalTest` environment extension for multiple actions, plus an additional multi-state/action test for each agent
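For example, an array of identical discrete actions might be specified like this (the spec keys are the usual TensorForce ones, but treat them as assumptions):

```python
# Sketch: five discrete actions sharing one specification via `shape`.
actions_spec = dict(type='int', num_actions=4, shape=(5,))
```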
23rd July 2017
- Implemented prototype of Proximal Policy Optimisation (PPO)
- Configuration argument network can now take module paths, not just functions
- Fixed prioritized experience replay sampling bug
- Enabled default values for distributions, see tensorforce#34
8th July 2017
- BREAKING CHANGE: We modified the act and observe API once more because we think there was a lack of clarity with regard to which state is observed (current vs next). The agent now internally manages states and actions in the correct sequence, so observe only needs reward and terminal; see the loop sketch below.
- We further introduced a method `import_observations` so memory-based agents can preload data into memory (e.g. if historic data is available). We also added a method `last_observation` on the generic agent which gives the current state, action, reward, terminal and internal state.
- Fixed distributed agent mode, which should run as intended now
- Fixed target network usage in NAF. Tests now run smoothly
- DQFDAgent now inherits from MemoryAgent
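The resulting interaction loop looks roughly like this (a sketch; the `execute` return order and the `reset` calls are assumptions based on the examples of that time):

```python
# Sketch of the simplified act/observe loop: the agent tracks states and
# actions internally, so observe() only needs reward and terminal.
state = environment.reset()
agent.reset()
while True:
    action = agent.act(states=state)
    state, terminal, reward = environment.execute(actions=action)
    agent.observe(reward=reward, terminal=terminal)
    if terminal:
        break
```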
2nd July 2017
- Fixed lab integration: updated bazel BUILD file with command line options
- Adjusted environment integration to correctly select state and action interfaces
- Changed default agent to VPG since lab mixes continuous and discrete actions
25th June 2017
- Added prioritised experience replay
- Added RandomAgent for discrete/continuous random baselines
- Moved pre-processing from the runner to the agent, analogous to exploration
11th June 2017
- Fixed a bug in the DQFD test where the demo data did not always contain the correct action. Also fixed a small bug in the DQFD loss (mean over the supervised loss)
- A network entry was added to the configuration so no separate network builder has to be passed to the agent constructor (see example)
- The async mode using distributed TensorFlow has been merged into the main model class; see the openai_gym_async.py example. In particular, this means multiple agents are now available in async mode. N.b. we are still working on making async/distributed execution more convenient to use.
- Fixed bug in NAF where target value (V) was connected to training output. Also added gradient clipping to NAF because we observed occasional numerical instability in testing.
- For the same reason, we have altered the tests to always run multiple times and allow for an occasional failure on travis so our builds don't get broken by a random initialisation leading to an under/overflow.
- Updated the OpenAI Universe integration to work with our state/action interface; see the example in examples/openai_universe.py
- Added a convenience method to create a Network directly from JSON without needing to create a network builder; see the examples for usage and the sketch below
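A network defined in JSON might look like the following sketch (the layer format mirrors the layered-network configuration used in the examples; the loader name itself is shown there, not here):

```python
import json

# Sketch: a network described as a JSON list of layers, parsed into the
# spec the convenience method would consume (format is an assumption).
network_json = '''
[
    {"type": "dense", "size": 64, "activation": "relu"},
    {"type": "dense", "size": 64, "activation": "relu"}
]
'''
network_spec = json.loads(network_json)
```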