This project contains random MDP experiments comparing true online TD(lambda) (TOTD; van Seijen and Sutton, 2014) with TD(lambda) with accumulating traces (TD) and TD(lambda) with replacing traces (TDR). The experiments are part of forthcoming work by van Seijen, Sutton, Mahmood, Pilarski and Machado (2015).
The project can be imported as an Eclipse PyDev project.
Read or execute runtotd-rndmdp-experiments.sh for an example of how to run the experiments and plot the resulting figures in Python.
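For orientation, the three update rules being compared can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shipped in this project: the function name evaluate_episode, its arguments, and the three-state toy episode are hypothetical. The true online update is written in a commonly used dutch-trace form, which may differ in notation from the papers cited below, and the replacing-trace rule assumes binary features.

```python
import numpy as np


def evaluate_episode(xs, rewards, alpha, gamma, lam, method):
    """Run one episode of linear TD(lambda) policy evaluation.

    Hypothetical helper for illustration only.
    xs      : array of shape (T + 1, n); xs[-1] is the terminal state (all zeros)
    rewards : sequence of T rewards, rewards[t] received on leaving xs[t]
    method  : "accumulating", "replacing" or "true_online"
    Returns the weight vector after the episode, starting from zeros.
    """
    xs = np.asarray(xs, dtype=float)
    w = np.zeros(xs.shape[1])   # value-function weights
    z = np.zeros_like(w)        # eligibility trace
    v_old = 0.0                 # only used by the true online update
    for t, r in enumerate(rewards):
        x, x_next = xs[t], xs[t + 1]
        v, v_next = w @ x, w @ x_next
        delta = r + gamma * v_next - v  # TD error
        if method == "accumulating":
            z = gamma * lam * z + x
            w = w + alpha * delta * z
        elif method == "replacing":
            # Assumes binary features: active features are reset to 1.
            z = np.maximum(gamma * lam * z, x)
            w = w + alpha * delta * z
        elif method == "true_online":
            # Dutch trace plus a correction term in the weight update.
            z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x
            w = w + alpha * (delta + v - v_old) * z - alpha * (v - v_old) * x
            v_old = v_next
        else:
            raise ValueError("unknown method: %s" % method)
    return w


# Toy episode (made up for this sketch): three tabular states, reward 1 at the end.
xs = np.vstack([np.eye(3), np.zeros(3)])
rewards = [0.0, 0.0, 1.0]
methods = ("accumulating", "replacing", "true_online")
results = {m: evaluate_episode(xs, rewards, alpha=0.5, gamma=1.0, lam=0.0, method=m)
           for m in methods}
```

With lam=0 all three rules reduce to one-step TD(0), so the entries of results should agree up to floating-point error; the methods differ only for lam > 0.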
van Seijen, H., Sutton, R.S. (2014). True online TD(lambda). In Proceedings of the 31st International Conference on Machine Learning. JMLR W&CP 32(1):692-700.
van Seijen, H., Sutton, R.S., Mahmood, A.R., Pilarski, P.M., Machado, M.C. (2015). An empirical evaluation of true-online TD(lambda). (forthcoming)