1. Opportunistic Approachability and Generalized No-Regret Problems
- Author
-
Andrey Bernstein, Nahum Shimkin, and Shie Mannor
- Subjects
Convex hull ,Computer Science::Computer Science and Game Theory ,Mathematical optimization ,General Mathematics ,Stochastic game ,ComputingMilieux_PERSONALCOMPUTING ,Regret ,Management Science and Operations Research ,Decision problem ,Approachability ,Computer Science Applications ,Fictitious play ,Special case ,Set (psychology) ,Mathematical economics ,Mathematics - Abstract
Blackwell’s theory of approachability, introduced in 1956, has since proved a useful tool in the study of a range of repeated multiagent decision problems. Given a repeated matrix game with vector payoffs, a target set S is approachable by a certain player if he can ensure that the average payoff vector converges to that set, for any strategy of the opponent. In this paper we consider the case where a set need not be approachable in general, but may be approached if the opponent played favorably in some sense. In particular, we consider nonconvex sets that satisfy Blackwell’s dual condition, namely, can be approached when the opponent plays a stationary strategy. Whereas the convex hull of such a set is approachable, this is not generally the case for the original nonconvex set itself. We start by defining a sense of restricted play of the opponent (with stationary strategies being a special case), and then formulate appropriate goals for an opportunistic approachability algorithm that can take advantage of such restricted play as it unfolds during the game. We then consider a calibration-based approachability strategy that is opportunistic in that sense. A major motivation for this study comes from no-regret problems that lack a convex structure such as the problem of online learning with sample-path constraints, as formulated in Mannor et al. [Mannor S, Tsitsiklis JN, Yu JY (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10:569–590]. Here the best-response-in-hindsight is not generally attainable, but only a convex relaxation thereof. Our proposed algorithm, while ensuring that relaxed goal, also comes closer to the nonrelaxed one when the opponent’s play is restricted in a well-defined sense.
- Published
- 2014
- Full Text
- View/download PDF