
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs.

Authors :
Lin, Yifan
Wang, Yuhao
Zhou, Enlu
Source :
Journal of Systems Science & Systems Engineering; Jun 2023, Vol. 32 Issue 3, p267-288, 22p
Publication Year :
2023

Abstract

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left((1+\rho+\frac{1}{\rho})\, d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$ and $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem. [ABSTRACT FROM AUTHOR]
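The abstract names Thompson sampling for the disjoint linear model under a mean-variance criterion but gives no algorithmic details. The following is a minimal illustrative sketch under stated assumptions, not the authors' algorithm or the variant they analyze. Assumed here: each arm keeps its own (disjoint) ridge-regression statistics; the arm score is `rho * mean - variance` (a sign convention chosen for illustration, where larger is better); the variance is modeled by a second per-arm linear regression on squared residuals; and posterior samples are Gaussian with a hypothetical scale `v`. The class name `MeanVarianceLinTS` and all parameter names are hypothetical.

```python
import numpy as np


class MeanVarianceLinTS:
    """Hypothetical sketch: disjoint-model linear Thompson sampling that scores
    each arm by rho * sampled_mean - sampled_variance (assumed convention)."""

    def __init__(self, n_arms, dim, rho=1.0, v=1.0, lam=1.0, seed=0):
        self.rho = rho  # risk tolerance: larger rho weights the mean more heavily
        self.v = v  # posterior scale (assumed hyperparameter)
        self.rng = np.random.default_rng(seed)
        # Disjoint model: every arm has its own design matrix and targets.
        # Both regressions of an arm share the same contexts, so they share A.
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]
        self.b_mean = [np.zeros(dim) for _ in range(n_arms)]
        self.b_var = [np.zeros(dim) for _ in range(n_arms)]

    def _sample(self, A, b):
        # Draw a parameter from the Gaussian posterior N(A^{-1} b, v^2 A^{-1}).
        A_inv = np.linalg.inv(A)
        return self.rng.multivariate_normal(A_inv @ b, self.v ** 2 * A_inv)

    def select(self, contexts):
        # contexts[a] is the d-dimensional feature vector revealed for arm a.
        scores = []
        for a, x in enumerate(contexts):
            mean = x @ self._sample(self.A[a], self.b_mean[a])
            # Clip at zero because a sampled variance can come out negative.
            var = max(x @ self._sample(self.A[a], self.b_var[a]), 0.0)
            scores.append(self.rho * mean - var)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Current mean estimate, computed before this observation is added.
        mean_hat = x @ (np.linalg.inv(self.A[arm]) @ self.b_mean[arm])
        self.A[arm] += np.outer(x, x)
        self.b_mean[arm] += reward * x
        # The squared residual serves as a noisy target for the variance (assumption).
        self.b_var[arm] += (reward - mean_hat) ** 2 * x


# Usage: one round with two arms and 2-dimensional contexts.
agent = MeanVarianceLinTS(n_arms=2, dim=2, rho=1.0)
ctx = [np.array([1.0, 0.5]), np.array([0.2, 1.0])]
arm = agent.select(ctx)
agent.update(arm, ctx[arm], reward=0.7)
```

Sharing one design matrix per arm between the mean and variance regressions is only a convenience of this sketch, since both regressions see the same contexts. The bound quoted above applies to the authors' analyzed variant, not to this code.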

Details

Language :
English
ISSN :
10043756
Volume :
32
Issue :
3
Database :
Complementary Index
Journal :
Journal of Systems Science & Systems Engineering
Publication Type :
Academic Journal
Accession number :
163885701
Full Text :
https://doi.org/10.1007/s11518-022-5541-9