
Approximate policy iteration using regularised Bellman residuals minimisation.

Authors :
Esposito, G.
Martin, M.
Source :
Journal of Experimental & Theoretical Artificial Intelligence. Feb-Apr 2016, Vol. 28 Issue 1/2, p351-367. 17p.
Publication Year :
2016

Abstract

In this paper we present an approximate policy iteration (API) method, API-BRMϵ, that uses an efficient implementation of incremental support vector regression (SVR) to approximate the value function and generalise in continuous (or large) state-space reinforcement learning (RL) problems. RL is a methodology for solving complex, uncertain decision problems, usually modelled as Markov decision processes. API-BRMϵ is formalised as a non-parametric regularisation problem derived from Bellman residual minimisation (BRM), which reduces the variance of the estimation. The method is incremental and can be applied in the on-line agent-interaction setting of RL. Because it builds on non-parametric SVR, API-BRMϵ finds the global solution of the regularised regression problem and comes with convergence guarantees towards the optimal solution. To obtain the optimal policy, a value function must be defined that specifies the total reward an agent can expect from its current state when taking a given action; the agent then uses this value function to choose its actions. Experimental evidence and performance results on well-known RL benchmarks are presented. [ABSTRACT FROM AUTHOR]
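
For orientation, a generic regularised Bellman residual minimisation objective of the kind API-BRMϵ builds on can be written as below. This is a standard textbook formulation rather than the paper's exact one, and the symbols are assumptions for illustration: γ is the discount factor, λ the regularisation weight, π the policy being evaluated, and H the reproducing-kernel space induced by the SVR kernel.

\[
\hat{Q} \;=\; \arg\min_{Q \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^{n}\Bigl( Q(s_i, a_i) - r_i - \gamma\, Q\bigl(s_i', \pi(s_i')\bigr) \Bigr)^2 \;+\; \lambda\,\lVert Q \rVert_{\mathcal{H}}^{2}
\]

In an API loop, the policy evaluation step would solve an (incremental, ϵ-insensitive SVR) variant of this regression, and the policy improvement step would then act greedily with respect to the fitted Q̂.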

Details

Language :
English
ISSN :
0952813X
Volume :
28
Issue :
1/2
Database :
Academic Search Index
Journal :
Journal of Experimental & Theoretical Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
113739746
Full Text :
https://doi.org/10.1080/0952813X.2015.1024494