Start Over

Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

Authors :: Liu, Junyan
Li, Yunfan
Yang, Lin
Publication Year :: 2024
Abstract: Existing performance measures for bandit algorithms such as regret, PAC bounds, or uniform-PAC (Dann et al., 2017), typically evaluate the cumulative performance, while allowing the play of an arbitrarily bad arm at any finite time t. Such a behavior can be highly detrimental in high-stakes applications. This paper introduces a stronger performance measure, the uniform last-iterate (ULI) guarantee, capturing both cumulative and instantaneous performance of bandit algorithms. Specifically, ULI characterizes the instantaneous performance since it ensures that the per-round regret of the played arm is bounded by a function, monotonically decreasing w.r.t. (large) round t, preventing revisits to bad arms when sufficient samples are available. We demonstrate that a near-optimal ULI guarantee directly implies near-optimal cumulative performance across aforementioned performance measures. To examine the achievability of ULI in the finite arm setting, we first provide two positive results that some elimination-based algorithms and high-probability adversarial algorithms with stronger analysis or additional designs, can attain near-optimal ULI guarantees. Then, we also provide a negative result, indicating that optimistic algorithms cannot achieve a near-optimal ULI guarantee. Finally, we propose an efficient algorithm for linear bandits with infinitely many arms, which achieves the ULI guarantee, given access to an optimization oracle.

Subjects :: Computer Science - Machine Learning
Statistics - Machine Learning

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2402.12711
Document Type :: Working Paper

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources