Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function.

Authors :: Tamatsukuri A
Takahashi T
Source :: Bio Systems [Biosystems] 2019 Jun; Vol. 180, pp. 46-53. Date of Electronic Publication: 2019 Feb 27.
Publication Year :: 2019
Abstract: As reinforcement learning algorithms are applied to increasingly complicated and realistic tasks, it is becoming harder to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of RS with that of other representative algorithms for K-armed bandit problems. (Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.)
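
The satisficing idea in the abstract can be illustrated with a minimal sketch on a Bernoulli K-armed bandit. The abstract does not state the RS value function, so the form used here, (n_i / N) * (E_i - aspiration), is an assumption, as are the function name `rs_bandit`, its parameters, and the random tie-breaking. Under this assumed form, arms whose empirical mean falls below the aspiration level get a negative value that worsens the more they are tried (pushing the greedy choice toward less-tried arms), while arms above it get a positive value that grows with their share of trials.

```python
import random

def rs_bandit(probs, aspiration, steps, seed=0):
    """Greedy play on an ASSUMED RS value: (n_i / N) * (E_i - aspiration).

    probs      -- true Bernoulli reward probability of each arm (hypothetical setup)
    aspiration -- aspiration level that a satisfactory arm must exceed
    Returns (counts, means, regret), where regret is the cumulative expected loss.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # n_i: number of pulls of each arm
    means = [0.0] * k   # E_i: empirical mean reward of each arm
    best = max(probs)
    regret = 0.0
    for _ in range(steps):
        total = sum(counts)  # N: total number of pulls so far
        if total == 0:
            arm = rng.randrange(k)  # no data yet: pick any arm
        else:
            values = [(counts[i] / total) * (means[i] - aspiration)
                      for i in range(k)]
            top = max(values)
            # Greedy policy on the RS value, ties broken at random.
            arm = rng.choice([i for i, v in enumerate(values) if v == top])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        regret += best - probs[arm]
    return counts, means, regret
```

For example, `rs_bandit([0.2, 0.8], aspiration=0.5, steps=1000)` returns the pull counts, empirical means, and cumulative regret for a two-armed problem in which only the second arm exceeds the aspiration level.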

Subjects :: Choice Behavior physiology
Humans
Machine Learning
Models, Psychological
Reinforcement, Psychology
Algorithms
Cognition physiology
Decision Making physiology
Models, Theoretical
