
Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Authors :
Min, Yifei
Wang, Tianhao
Zhou, Dongruo
Gu, Quanquan
Publication Year :
2021

Abstract

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.

Comment: 59 pages, 4 figures. In NeurIPS 2021
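To make the variance-reweighting idea concrete, below is a minimal sketch of a variance-weighted least-squares regression step of the kind used in variance-aware Fitted Q-Iteration with linear function approximation. This is an illustrative assumption, not the authors' VA-OPE implementation: the function name, the variance-clipping constant, and the way the variance estimates are supplied are all hypothetical.

```python
import numpy as np

def weighted_lsvi_step(phi, targets, sigma2, lam=1.0, sigma_min=1e-2):
    """One variance-weighted ridge-regression step (illustrative sketch).

    phi       : (n, d) array of feature vectors phi(s_i, a_i)
    targets   : (n,) array of regression targets r_i + V_hat(s'_i)
    sigma2    : (n,) array of estimated conditional variances of the targets
    lam       : ridge regularization parameter
    sigma_min : lower clip on the variance estimate (keeps the weights bounded)
    """
    # Weight each transition by the inverse of its (clipped) estimated variance,
    # so low-variance transitions contribute more to the fit.
    w = 1.0 / np.maximum(sigma2, sigma_min ** 2)                     # (n,)

    # Weighted ridge regression: theta = (Phi^T W Phi + lam I)^{-1} Phi^T W y
    A = phi.T @ (w[:, None] * phi) + lam * np.eye(phi.shape[1])
    b = phi.T @ (w * targets)
    theta = np.linalg.solve(A, b)

    # Linear Q-function estimate: Q_hat(s, a) ~= phi(s, a) @ theta
    return theta
```

In unweighted Fitted Q-Iteration, `w` would be all ones; the reweighting above is the variance-aware modification the abstract describes, applied here to a generic weighted ridge regression rather than to the paper's exact estimator.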

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2106.11960
Document Type :
Working Paper