Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
- Source :
- Information Sciences. 580:311-330
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
Abstract
- The problem of off-policy evaluation (OPE) has long been regarded as one of the foremost challenges in reinforcement learning. Gradient-based and emphasis-based temporal-difference (TD) learning algorithms make up the major part of off-policy TD learning methods. In this work, we investigate the derivation of efficient OPE algorithms from a novel perspective that combines the advantages of these two categories. The gradient-based framework is adopted, and the emphatic approach is used to improve convergence performance. We begin by proposing a new analogue of the on-policy objective, called the distribution-correction-based mean square projected Bellman error (DC-MSPBE). The key to the construction of DC-MSPBE is the use of emphatic weightings on the representable subspace of the original MSPBE. Based on this objective function, the emphatic TD with lower-variance gradient correction (ETD-LVC) algorithm is proposed. Under standard off-policy and stochastic approximation conditions, we provide a convergence analysis of the ETD-LVC algorithm in the case of linear function approximation. Further, we generalize the algorithm to nonlinear smooth function approximation. Finally, we empirically demonstrate the improved performance of the ETD-LVC algorithm on off-policy benchmarks. Taken together, we hope that our work can guide the future discovery of better alternatives in the off-policy TD learning algorithm family.
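For readers unfamiliar with emphatic weightings, the following is a minimal sketch of one step of the standard emphatic TD(0) update with linear function approximation, as it is commonly stated in the off-policy TD literature. It is not the ETD-LVC algorithm of this article, whose update rules the abstract does not give. The function and parameter names (`etd0_step`, `followon_prev`, `interest`, and so on) are illustrative assumptions, and `rho` denotes the importance-sampling ratio between the target and behavior policies.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def etd0_step(theta, followon_prev, rho_prev, phi, phi_next, reward, rho,
              gamma=0.9, alpha=0.1, interest=1.0):
    """One emphatic TD(0) step (standard form, NOT the article's ETD-LVC).

    theta         -- current weight vector
    followon_prev -- followon trace from the previous step (0.0 at the start)
    rho_prev      -- importance-sampling ratio of the previous step
    phi, phi_next -- feature vectors of the current and next state
    rho           -- importance-sampling ratio of the current step
    """
    # Followon trace: discounted, importance-weighted accumulation of interest.
    followon = gamma * rho_prev * followon_prev + interest
    # Ordinary one-step TD error under linear value estimates.
    td_error = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # With lambda = 0 the emphasis equals the followon trace.
    step = alpha * rho * followon * td_error
    new_theta = [w + step * x for w, x in zip(theta, phi)]
    return new_theta, followon


# Hypothetical two-feature example, starting from zero weights.
theta, followon = etd0_step([0.0, 0.0], 0.0, 1.0,
                            [1.0, 0.0], [0.0, 1.0], reward=1.0, rho=2.0)
```

The followon trace is what reweights updates toward states the target policy would visit. Its product with `rho` can have high variance, which is the kind of issue that gradient-correction variants such as the one described in the abstract aim to address.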
- Subjects :
- Mathematical optimization
Information Systems and Management
Computer science
Stochastic approximation
Linear function
Computer Science Applications
Theoretical Computer Science
Nonlinear system
Artificial Intelligence
Control and Systems Engineering
Convergence (routing)
Key (cryptography)
Reinforcement learning
Temporal difference learning
Software
Subspace topology
Details
- ISSN :
- 0020-0255
- Volume :
- 580
- Database :
- OpenAIRE
- Journal :
- Information Sciences
- Accession number :
- edsair.doi...........d52d0d261b3c9ddfa1d03bc02e5b785d