1. Sample Complexity of Variance-reduced Distributionally Robust Q-learning
- Author
-
Wang, Shengbo, Si, Nian, Blanchet, Jose, and Zhou, Zhengyuan
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Optimization and Control (math.OC) ,Statistics - Machine Learning ,FOS: Mathematics ,Machine Learning (stat.ML) ,Mathematics - Optimization and Control ,Machine Learning (cs.LG) - Abstract
Dynamic decision making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment on which the data is collected can differ from that of the environment on which the model is deployed. This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $\gamma$-discounted robust Markov decision process with Kullback-Leibler uncertainty set to an entry-wise $\epsilon$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines the synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minmax sample complexity upper bound of $\tilde O(|S||A|(1-\gamma)^{-4}\epsilon^{-2})$, where $S$ and $A$ denote the state and action spaces. This is the first complexity result that is independent of the uncertainty size $\delta$, thereby providing new complexity theoretic insights. Additionally, a series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
- Published
- 2023