Delay and Cooperation in Nonstochastic Bandits.

Authors :: Cesa-Bianchi, Nicolò
Gentile, Claudio
Mansour, Yishay
Source :: Journal of Machine Learning Research. 2019, Vol. 20 Issue 2-29, p1-38. 38p.
Publication Year :: 2019
Abstract: We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than d hops to arrive, where d is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with K actions and N agents the average per-agent regret after T rounds is at most of order √ (d + 1 + K/N α≤ (T lnK), where α≤d is the independence number of the d-th power of the communication graph G. We then show that for any connected graph, for d = √ K the regret bound is K1=4 √ T, strictly better than the minimax regret √ KT for noncooperating agents. More informed choices of d lead to bounds which are arbitrarily close to the full information minimax regret √T lnK when G is dense. When G has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in G, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay. [ABSTRACT FROM AUTHOR]