
Recursive Least-Squares Temporal Difference With Gradient Correction.

Authors :
Song, Tianheng
Li, Dazi
Yang, Weimin
Hirasawa, Kotaro
Source :
IEEE Transactions on Cybernetics; Aug2021, Vol. 51 Issue 8, p4251-4264, 14p
Publication Year :
2021

Abstract

Since the late 1980s, temporal difference (TD) learning has dominated the research area of policy evaluation algorithms. However, the demand for avoiding TD defects, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms comprise the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with $O(n^2)$ per-time-step runtime complexity. The least-squares-based framework is adopted, and the gradient correction is used to improve convergence performance. This paper begins with the revision of a previous $O(n^3)$ batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an $O(n^2)$ counterpart of LS-TDC called RC is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension is also proposed based on importance sampling. In addition, the convergence analysis for RC as well as LS-TDC is given. The empirical results in both on-policy and off-policy benchmarks show that RC has a higher estimation accuracy than that of RLSTD and a significantly lower runtime complexity than that of LS-TDC. [ABSTRACT FROM AUTHOR]
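For context, the sketch below shows the classical recursive least-squares TD(0) update (the RLSTD baseline named in the abstract), which achieves the $O(n^2)$ per-step cost via a Sherman-Morrison rank-1 update of the inverse matrix. This is not the paper's RC algorithm: RC additionally incorporates a gradient correction term, which is omitted here. The feature dimension, step names, and hyperparameters are illustrative assumptions.

```python
# Minimal RLSTD(0) sketch (Bradtke & Barto style), the O(n^2)-per-step
# baseline the abstract compares against.  The paper's RC algorithm adds
# a gradient correction on top of a recursive least-squares scheme like
# this; that correction is intentionally NOT implemented here.
import numpy as np

class RLSTD:
    def __init__(self, n_features, gamma=0.99, epsilon=1e-3):
        self.gamma = gamma
        self.theta = np.zeros(n_features)       # linear value-function weights
        self.P = np.eye(n_features) / epsilon   # running inverse of the A matrix

    def update(self, phi, reward, phi_next):
        """One O(n^2) recursive update from a transition (phi, r, phi')."""
        v = phi - self.gamma * phi_next          # feature difference used in A
        Pu = self.P @ phi
        denom = 1.0 + v @ Pu
        gain = Pu / denom                        # Sherman-Morrison gain vector
        td_error = reward + self.gamma * (self.theta @ phi_next) - self.theta @ phi
        self.theta += gain * td_error            # weight update toward the LS solution
        self.P -= np.outer(gain, v @ self.P)     # rank-1 update keeps P = A^{-1}

    def value(self, phi):
        return self.theta @ phi
```

A typical usage loop would call `update(phi_t, r_t, phi_t1)` once per observed transition under the evaluation policy; the eligibility-trace and importance-sampling extensions described in the abstract would modify how `phi` and the update are accumulated, which this sketch does not attempt to reproduce.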

Details

Language :
English
ISSN :
2168-2267
Volume :
51
Issue :
8
Database :
Complementary Index
Journal :
IEEE Transactions on Cybernetics
Publication Type :
Academic Journal
Accession number :
153127825
Full Text :
https://doi.org/10.1109/TCYB.2019.2902342