
Approximate policy iteration using regularised Bellman residuals minimisation.

Authors :
Esposito, G.
Martin, M.
Source :
Journal of Experimental & Theoretical Artificial Intelligence. Feb-Apr 2016, Vol. 28 Issue 1/2, p351-367. 17p.
Publication Year :
2016

Abstract

In this paper we present an approximate policy iteration (API) method, API-BRMϵ, that uses an efficient implementation of incremental support vector regression (SVR) to approximate the value function and generalise in continuous (or large) state-space reinforcement learning (RL) problems. RL is a methodology for solving complex, uncertain decision problems, usually modelled as Markov decision processes. API-BRMϵ is formalised as a non-parametric regularisation problem derived from Bellman residual minimisation (BRM), which reduces the variance of the estimation. The method is incremental and can be applied in the on-line agent-interaction setting of RL. Because it builds on non-parametric SVR, API-BRMϵ finds the global solution of the regularised regression problem and comes with convergence guarantees towards the optimal solution. To obtain the optimal policy, a value function must be defined that specifies the total reward an agent can expect from its current state when taking a given action; the agent then uses this value function to choose its actions. Experimental evidence and performance results on well-known RL benchmarks are presented. [ABSTRACT FROM AUTHOR]
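
For orientation, a generic regularised Bellman residual minimisation objective of the kind API-BRMϵ builds on can be written as below. This is a standard textbook formulation rather than the paper's exact one, and the symbols are assumptions for illustration: γ is the discount factor, λ the regularisation weight, π the policy being evaluated, and H the reproducing-kernel space induced by the SVR kernel.

\[
\hat{Q} \;=\; \arg\min_{Q \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^{n}\Bigl( Q(s_i, a_i) - r_i - \gamma\, Q\bigl(s_i', \pi(s_i')\bigr) \Bigr)^2 \;+\; \lambda\,\lVert Q \rVert_{\mathcal{H}}^{2}
\]

In an API loop, the policy evaluation step would solve an (incremental, ϵ-insensitive SVR) variant of this regression, and the policy improvement step would then act greedily with respect to the fitted Q̂.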

Details

Language :
English
ISSN :
0952813X
Volume :
28
Issue :
1/2
Database :
Academic Search Index
Journal :
Journal of Experimental & Theoretical Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
113739746
Full Text :
https://doi.org/10.1080/0952813X.2015.1024494