Back to Search
Start Over
Delay and Cooperation in Nonstochastic Bandits.
- Source :
-
Journal of Machine Learning Research . 2019, Vol. 20 Issue 2-29, p1-38. 38p. - Publication Year :
- 2019
-
Abstract
- We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than d hops to arrive, where d is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with K actions and N agents the average per-agent regret after T rounds is at most of order √ (d + 1 + K/N α≤ (T lnK), where α≤d is the independence number of the d-th power of the communication graph G. We then show that for any connected graph, for d = √ K the regret bound is K1=4 √ T, strictly better than the minimax regret √ KT for noncooperating agents. More informed choices of d lead to bounds which are arbitrarily close to the full information minimax regret √T lnK when G is dense. When G has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in G, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 15324435
- Volume :
- 20
- Issue :
- 2-29
- Database :
- Academic Search Index
- Journal :
- Journal of Machine Learning Research
- Publication Type :
- Academic Journal
- Accession number :
- 134764454