1. Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
- Author
-
Koulouriotis, D.E. and Xanthopoulos, A.
- Subjects
- *
COMBINATORIAL optimization , *GAUSSIAN distribution , *ALGORITHMS , *FINITE element method - Abstract
Abstract: Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. A most challenging variant of the MABP is the non-stationary bandit problem where the agent is faced with the increased complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that are suitable for non-stationary bandit tasks. These learning algorithms that offer intuition-based solutions to the exploitation–exploration trade-off have the advantage of not relying on strong theoretical assumptions while in the same time can be fine-tuned in order to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the face of evolutionary algorithms. We present an evolutionary algorithm that was implemented to solve the non-stationary bandit problem along with ad hoc solution algorithms, namely action-value methods with e-greedy and softmax action selection rules, the probability matching method and finally the adaptive pursuit method. A number of simulation-based experiments was conducted and based on the numerical results that we obtained we discuss the methods’ performances. [Copyright &y& Elsevier]
- Published
- 2008
- Full Text
- View/download PDF