Multi-agent bandit with agent-dependent expected rewards.
- Source :
- Swarm Intelligence; Sep 2023, Vol. 17 Issue 3, p219-251, 33p
- Publication Year :
- 2023
Abstract
- Many studies of exploration policies for stochastic multi-agent bandit (MAB) problems demonstrate that integrating the experience of other group members accelerates the learning of optimal actions. However, the basic assumption of the classical MAB problem that expected rewards are agent-independent is invalid in many real-world problems: group members have different expected rewards for the possible actions, perhaps because of different initial states or local environments. To solve the MAB problem with agent-dependent expected rewards, we develop a decentralized exploration policy in which agents apply confidence weighting to integrate the experience of other group members and to estimate the expected rewards. Theoretical analysis demonstrates that the acceleration of learning still holds in the agent-dependent case, and numerical simulation results verify that the proposed exploration policy outperforms the state-of-the-art method. [ABSTRACT FROM AUTHOR]
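To make the idea of confidence-weighted experience sharing concrete, below is a minimal Python sketch of a UCB-style bandit with agent-dependent arm means, in which each agent blends peers' reward estimates using pull counts as a crude confidence proxy. The weighting scheme, reward model, and all parameter values are illustrative assumptions, not the policy or analysis proposed in the paper.

```python
import numpy as np

# Illustrative sketch only: agent-dependent multi-agent bandit where each agent
# pools peers' estimates with confidence (pull-count) weights before a UCB choice.
rng = np.random.default_rng(0)

n_agents, n_arms, horizon = 4, 5, 2000
# Agent-dependent true means: each agent sees a perturbed version of a shared profile.
base_means = rng.uniform(0.2, 0.8, n_arms)
true_means = np.clip(base_means + rng.normal(0.0, 0.1, (n_agents, n_arms)), 0.0, 1.0)

counts = np.ones((n_agents, n_arms))                 # pull counts (start at 1 to avoid /0)
estimates = rng.uniform(0.0, 1.0, (n_agents, n_arms))  # empirical mean rewards

def blended_estimate(agent):
    """Ad-hoc confidence weighting (assumption, not the paper's rule):
    weight each agent's estimate by its pull count, then shrink toward the
    agent's own estimate, since expected rewards differ across agents."""
    w = counts / counts.sum(axis=0, keepdims=True)
    pooled = (w * estimates).sum(axis=0)
    lam = counts[agent] / (counts[agent] + n_agents)  # trust own data more as it grows
    return lam * estimates[agent] + (1.0 - lam) * pooled

for t in range(1, horizon + 1):
    for agent in range(n_agents):
        # UCB exploration bonus on the agent's own pull counts.
        ucb = blended_estimate(agent) + np.sqrt(2.0 * np.log(t + 1) / counts[agent])
        arm = int(np.argmax(ucb))
        reward = rng.binomial(1, true_means[agent, arm])  # Bernoulli reward
        counts[agent, arm] += 1
        estimates[agent, arm] += (reward - estimates[agent, arm]) / counts[agent, arm]

print("agent 0 estimated means:", np.round(estimates[0], 2))
print("agent 0 true means:     ", np.round(true_means[0], 2))
```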
Details
- Language :
- English
- ISSN :
- 1935-3812
- Volume :
- 17
- Issue :
- 3
- Database :
- Complementary Index
- Journal :
- Swarm Intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 166736827
- Full Text :
- https://doi.org/10.1007/s11721-023-00224-5