Back to Search Start Over

Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

Authors :
Liu, Junyan
Li, Yunfan
Yang, Lin
Publication Year :
2024

Abstract

Existing performance measures for bandit algorithms such as regret, PAC bounds, or uniform-PAC (Dann et al., 2017), typically evaluate the cumulative performance, while allowing the play of an arbitrarily bad arm at any finite time t. Such a behavior can be highly detrimental in high-stakes applications. This paper introduces a stronger performance measure, the uniform last-iterate (ULI) guarantee, capturing both cumulative and instantaneous performance of bandit algorithms. Specifically, ULI characterizes the instantaneous performance since it ensures that the per-round regret of the played arm is bounded by a function, monotonically decreasing w.r.t. (large) round t, preventing revisits to bad arms when sufficient samples are available. We demonstrate that a near-optimal ULI guarantee directly implies near-optimal cumulative performance across aforementioned performance measures. To examine the achievability of ULI in the finite arm setting, we first provide two positive results that some elimination-based algorithms and high-probability adversarial algorithms with stronger analysis or additional designs, can attain near-optimal ULI guarantees. Then, we also provide a negative result, indicating that optimistic algorithms cannot achieve a near-optimal ULI guarantee. Finally, we propose an efficient algorithm for linear bandits with infinitely many arms, which achieves the ULI guarantee, given access to an optimization oracle.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2402.12711
Document Type :
Working Paper