Partially observable Markov decision processes and performance sensitivity analysis

Authors :: Li, Yanjie; Yin, Baoqun; Xi, Hongsheng
Source :: IEEE Transactions on Systems, Man, and Cybernetics--Part B: Cybernetics. Dec, 2008, Vol. 38 Issue 6, p1645, 7 p.
Publication Year :: 2008
Abstract :: The sensitivity-based optimization of Markov systems has become an increasingly important area. From the perspective of performance sensitivity analysis, policy-iteration algorithms and gradient estimation methods can be directly obtained for Markov decision processes (MDPs). In this correspondence, the sensitivity-based optimization is extended to average reward partially observable MDPs (POMDPs). We derive the performance-difference and performance-derivative formulas of POMDPs. On the basis of the performance-derivative formula, we present a new method to estimate the performance gradients. From the performance-difference formula, we obtain a sufficient optimality condition without the discounted reward formulation. We also propose a policy-iteration algorithm to obtain a nearly optimal finite-state-controller policy.
Index Terms :: Finite-state controller (FSC), gradient estimation, partially observable Markov decision processes (POMDPs), policy iteration, sensitivity analysis.

Language :: English
ISSN :: 1083-4419
Volume :: 38
Issue :: 6
Database :: Gale General OneFile
Journal :: IEEE Transactions on Systems, Man, and Cybernetics--Part B: Cybernetics
Publication Type :: Academic Journal
Accession number :: edsgcl.190149441
