
Meta-Thompson Sampling

Authors :
Kveton, Branislav
Konobeev, Mikhail
Zaheer, Manzil
Hsu, Chih-wei
Mladenov, Martin
Boutilier, Craig
Szepesvari, Csaba
Publication Year :
2021

Abstract

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by an empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.

Comment: Proceedings of the 38th International Conference on Machine Learning
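The abstract describes Thompson sampling in Gaussian bandits with a prior that is itself learned across bandit instances. The sketch below is not the paper's MetaTS algorithm; it is a minimal illustration of the general idea, assuming a known observation variance and a crude, purely illustrative meta-update of the prior mean. All function names, parameter values, and the meta-update rule are assumptions made for illustration.

```python
import numpy as np

def thompson_sampling_gaussian(true_means, prior_mean, prior_var,
                               obs_var=1.0, horizon=200, rng=None):
    """Thompson sampling on one Gaussian bandit instance with a Gaussian
    prior on each arm's mean and known observation variance."""
    rng = rng or np.random.default_rng()
    k = len(true_means)
    post_mean = np.full(k, prior_mean, dtype=float)
    post_var = np.full(k, prior_var, dtype=float)
    for _ in range(horizon):
        # Sample a mean for each arm from its posterior, pull the argmax.
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(true_means[arm], np.sqrt(obs_var))
        # Conjugate Gaussian posterior update for the pulled arm.
        precision = 1.0 / post_var[arm] + 1.0 / obs_var
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / obs_var) / precision
        post_var[arm] = 1.0 / precision
    return post_mean, post_var

# Meta-level loop: bandit instances are drawn from an unknown task prior
# N(mu_star, tau2). The agent keeps a belief about mu_star and uses it as
# the prior for each new instance, so exploration improves across tasks.
rng = np.random.default_rng(0)
mu_star, tau2, k = 0.5, 0.25, 5      # unknown task prior (assumed values)
hyper_mean, hyper_var = 0.0, 1.0     # agent's belief about mu_star
for task in range(20):
    true_means = rng.normal(mu_star, np.sqrt(tau2), size=k)
    post_mean, _ = thompson_sampling_gaussian(true_means, hyper_mean, tau2, rng=rng)
    # Crude illustrative meta-update: treat the estimated arm means as
    # noisy draws of mu_star with variance tau2 (not the paper's update).
    precision = 1.0 / hyper_var + k / tau2
    hyper_mean = (hyper_mean / hyper_var + post_mean.sum() / tau2) / precision
    hyper_var = 1.0 / precision
    print(f"task {task:2d}: prior mean estimate = {hyper_mean:.3f}")
```

In this toy setup the estimated prior mean drifts toward the true task-prior mean over successive bandit instances, which is the effect the abstract refers to as the algorithm quickly adapting to the unknown prior.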

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2102.06129
Document Type :
Working Paper