1. Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies.
- Author
- Leng, Jinsong; Fyfe, Colin; and Jain, Lakhmi C.
- Subjects
- ALGORITHMS, MATHEMATICAL optimization, LINEAR programming, SYSTEMS engineering, MATHEMATICAL programming, MACHINE learning
- Abstract
- Temporal difference learning and eligibility traces are two mechanisms for solving reinforcement learning problems. The temporal difference technique bootstraps the state value or state-action value at every step, as in dynamic programming, and learns by sampling episodes from experience, as in the Monte Carlo approach. Eligibility traces offer a means of recording the degree to which each state is eligible for learning. This paper investigates the underlying mechanism of eligibility trace strategies using on-policy and off-policy learning algorithms. Performance metrics are obtained by defining the learning problem in a simulation environment and running it with different learning algorithms. However, measuring learning performance and analysing sensitivity are very expensive, because such metrics can only be obtained by running an experiment with different parameter values. This paper presents a comparative study of the eligibility trace mechanism, with the objective of comparing and investigating how the different approaches influence performance. [ABSTRACT FROM AUTHOR]
- Published
- 2009
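Note on the abstract: the record does not say which eligibility trace strategies the paper compares. As a minimal sketch, assuming the two commonly used variants (accumulating and replacing traces) and a tabular on-policy Sarsa(λ) update, the mechanism described in the abstract might look like the following. The function name `sarsa_lambda_step`, the parameter defaults, and the state and action labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def sarsa_lambda_step(q, e, s, a, r, s2, a2, done,
                      alpha=0.1, gamma=0.9, lam=0.8,
                      strategy="accumulating"):
    """One tabular Sarsa(lambda) update (illustrative sketch only).

    q and e are defaultdict(float) tables keyed by (state, action).
    """
    # TD error: bootstrap from the next state-action value unless terminal.
    target = r if done else r + gamma * q[(s2, a2)]
    delta = target - q[(s, a)]

    # Mark the visited pair as eligible for learning.
    if strategy == "accumulating":
        e[(s, a)] += 1.0   # repeated visits stack up
    elif strategy == "replacing":
        e[(s, a)] = 1.0    # repeated visits reset the trace to 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Credit every eligible pair in proportion to its trace, then decay it.
    for key in list(e):
        q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam
    return delta


# Two-step episode: the reward on the second step also updates the first pair.
q, e = defaultdict(float), defaultdict(float)
sarsa_lambda_step(q, e, "s0", "right", 0.0, "s1", "right", False)
sarsa_lambda_step(q, e, "s1", "right", 1.0, "s2", "right", True)
```

Under these assumptions, the only difference between the two strategies is how a revisited state-action pair is marked. The off-policy Q(λ) algorithms named in the title would additionally change how the bootstrap target is formed and when traces are cut, which this sketch does not cover.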