Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function.

Authors :
Tamatsukuri A
Takahashi T
Source :
Bio Systems [Biosystems] 2019 Jun; Vol. 180, pp. 46-53. Date of Electronic Publication: 2019 Feb 27.
Publication Year :
2019

Abstract

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of RS with that of other representative algorithms for the K-armed bandit problems. (Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.)
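
The satisficing idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the RS value function, so the form used here, RS_i = (n_i / N) * (E_i - aspiration), where n_i is the pull count of arm i, N the total pull count, and E_i the empirical mean reward, is an assumption, as are the Bernoulli reward setting and every name in the code (`rs_values`, `run_rs`, `probs`, `aspiration`). It is meant only to show how a greedy policy over such a value can keep exploring while no arm looks satisfactory and settle once one does.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No data yet: all arms are treated as equally (un)attractive.
        return [0.0] * len(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs(probs, aspiration, steps, seed=0):
    """Greedy selection on the assumed RS value for Bernoulli arms.

    probs: hypothetical success probability of each arm.
    Returns the pull counts and empirical mean rewards per arm.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(steps):
        values = rs_values(counts, means, aspiration)
        best = max(values)
        # Break ties uniformly at random among the greedy arms.
        arm = rng.choice([i for i, v in enumerate(values) if v == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    counts, means = run_rs(probs=[0.2, 0.4, 0.8], aspiration=0.6, steps=2000)
    print(counts, means)
```

Under this assumed form, an arm whose empirical mean is below the aspiration level has a negative value that grows more negative the more often it is pulled, which pushes the greedy choice toward less-tried arms (risk-prone behavior). An arm whose mean is above the aspiration level has a positive value that grows with its pull share, so the greedy choice stays with it (risk-averse behavior).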

Details

Language :
English
ISSN :
1872-8324
Volume :
180
Database :
MEDLINE
Journal :
Bio Systems
Publication Type :
Academic Journal
Accession number :
30822443
Full Text :
https://doi.org/10.1016/j.biosystems.2019.02.009