Thompson Sampling for Stochastic Control: The Finite Parameter Case.

Authors :: Kim, Michael Jong
Source :: IEEE Transactions on Automatic Control. Dec2017, Vol. 62 Issue 12, p6415-6422. 8p.
Publication Year :: 2017
Abstract: In this paper, we apply Thompson sampling to a class of average-reward stochastic control problems with parameter uncertainty. Specifically, we study an average-reward stochastic control problem over an infinite horizon in which both the reward and state transition distributions are parameterized by an unknown parameter taking values in a finite space. The main result of this paper is a proof showing that Thompson sampling achieves a worst-case average per-period regret of O(1/T), which is asymptotically optimal. [ABSTRACT FROM PUBLISHER]
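Illustrative sketch (not from the paper): the abstract does not spell out the algorithm, so the following is a minimal sketch of Thompson (posterior) sampling over a finite parameter set, not the author's exact procedure. It assumes a small tabular MDP in which each candidate parameter specifies a transition tensor and Bernoulli reward means, all with strictly positive probabilities. It also assumes that the posterior is resampled every period and that each candidate's average-reward-optimal policy is computed by relative value iteration. The names `optimal_policy` and `thompson_sampling` are hypothetical.

```python
import numpy as np


def optimal_policy(P, r, iters=1000, tol=1e-10):
    """Average-reward-optimal stationary policy via relative value iteration.

    P has shape (S, A, S) and r has shape (S, A).
    Assumes a unichain, aperiodic MDP so that the iteration converges.
    """
    S = r.shape[0]
    h = np.zeros(S)
    for _ in range(iters):
        q = r + P @ h            # q[s, a] = r[s, a] + sum_s' P[s, a, s'] * h[s']
        h_new = q.max(axis=1)
        h_new -= h_new[0]        # normalise relative to reference state 0
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return (r + P @ h).argmax(axis=1)


def thompson_sampling(Ps, rs, true_k, T, s0=0, seed=0):
    """Thompson sampling over a finite set of candidate parameters.

    Ps[k], rs[k]: transition tensor and Bernoulli reward means for candidate k.
    true_k is used only to simulate the environment; the agent never sees it.
    Assumes every probability is strictly positive, so the logs below are finite.
    """
    rng = np.random.default_rng(seed)
    K = len(Ps)
    log_post = np.full(K, -np.log(K))                 # uniform prior
    policies = [optimal_policy(Ps[k], rs[k]) for k in range(K)]
    s, total = s0, 0.0
    for _ in range(T):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        k = rng.choice(K, p=post)                     # sample a parameter from the posterior
        a = policies[k][s]                            # act optimally for the sampled parameter
        # Simulate one period under the true parameter.
        s_next = rng.choice(Ps[true_k].shape[2], p=Ps[true_k][s, a])
        reward = float(rng.random() < rs[true_k][s, a])
        total += reward
        # Bayes update: add the log-likelihood of (reward, s_next) under each candidate.
        for j in range(K):
            p_r = rs[j][s, a] if reward else 1.0 - rs[j][s, a]
            log_post[j] += np.log(Ps[j][s, a, s_next]) + np.log(p_r)
        s = s_next
    post = np.exp(log_post - log_post.max())
    return post / post.sum(), total / T
```

For example, with two candidates whose transition laws differ in every state-action pair, the posterior concentrates on the true candidate after a few hundred periods:

```python
P0 = np.tile([0.9, 0.1], (2, 2, 1))
P1 = np.tile([0.1, 0.9], (2, 2, 1))
r0 = np.array([[0.8, 0.2], [0.5, 0.5]])
r1 = np.array([[0.2, 0.8], [0.5, 0.5]])
post, avg_reward = thompson_sampling([P0, P1], [r0, r1], true_k=0, T=500)
```

Working in log space keeps the posterior update numerically stable. The tie-breaking and resampling schedule here are illustrative choices, not claims about the paper.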