Recursive Least-Squares Temporal Difference With Gradient Correction.
- Source:
- IEEE Transactions on Cybernetics; Aug 2021, Vol. 51, Issue 8, p4251-4264, 14p
- Publication Year:
- 2021
Abstract
- Since the late 1980s, temporal difference (TD) learning has dominated research on policy evaluation algorithms. However, the demand to avoid TD's defects, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms make up the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with $O(n^2)$ per-time-step runtime complexity. The least-squares-based framework is adopted, and a gradient correction is used to improve convergence performance. The paper begins by revising a previous $O(n^3)$ batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an $O(n^2)$ counterpart of LS-TDC, called RC, is proposed. To increase data efficiency, RC is generalized with eligibility traces, and an off-policy extension based on importance sampling is also proposed. In addition, convergence analyses are given for both RC and LS-TDC. Empirical results on both on-policy and off-policy benchmarks show that RC achieves higher estimation accuracy than RLSTD and significantly lower runtime complexity than LS-TDC. [ABSTRACT FROM AUTHOR]
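- Since this record carries only the abstract, the Python sketch below illustrates the general mechanism the abstract describes rather than the paper's exact RC update: a recursive least-squares TD solver whose inverse matrix is maintained with the Sherman-Morrison identity (hence $O(n^2)$ per step instead of an $O(n^3)$ batch solve), combined with a TDC-style gradient-correction term. The class name, the `epsilon` prior, the step size `beta`, and the exact placement of the correction term are all illustrative assumptions.

```python
import numpy as np

class RecursiveTDC:
    """Hypothetical sketch: recursive least-squares TD with a TDC-style
    gradient correction. Not the paper's precise RC algorithm."""

    def __init__(self, n, gamma=0.99, beta=0.05, epsilon=1.0):
        self.gamma = gamma                # discount factor
        self.beta = beta                  # step size for the auxiliary weights
        self.theta = np.zeros(n)          # value-function parameters
        self.w = np.zeros(n)              # TDC auxiliary (correction) weights
        self.P = np.eye(n) / epsilon      # running inverse of the RLSTD A matrix

    def update(self, phi, reward, phi_next):
        v = phi - self.gamma * phi_next   # RLSTD "difference" feature
        delta = reward - v @ self.theta   # TD error

        # Sherman-Morrison rank-1 update of P ~= A^{-1}: O(n^2) per step,
        # avoiding the O(n^3) solve a batch method such as LS-TDC would redo.
        Pphi = self.P @ phi
        k = Pphi / (1.0 + v @ Pphi)       # gain vector
        self.P -= np.outer(k, v @ self.P)

        # Gradient-corrected parameter update: the second term is the
        # TDC-style correction that damps divergence under off-policy sampling.
        self.theta += delta * k - self.gamma * (phi @ self.w) * (self.P @ phi_next)
        # Auxiliary weights track the expected TD error given the features.
        self.w += self.beta * (delta - phi @ self.w) * phi
```

- The design point the abstract emphasizes is visible in the update: replacing the batch matrix inversion with a rank-1 Sherman-Morrison update is what brings the per-time-step cost down from $O(n^3)$ to $O(n^2)$, while the auxiliary weight vector supplies the gradient correction that plain RLSTD lacks.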
Details
- Language:
- English
- ISSN:
- 2168-2267
- Volume:
- 51
- Issue:
- 8
- Database:
- Complementary Index
- Journal:
- IEEE Transactions on Cybernetics
- Publication Type:
- Academic Journal
- Accession Number:
- 153127825
- Full Text:
- https://doi.org/10.1109/TCYB.2019.2902342