Partially observable Markov decision processes with reward information: basic ideas and models

Authors :
Cao, Xi-Ren
Guo, Xianping
Source :
IEEE Transactions on Automatic Control. April, 2007, Vol. 52 Issue 4, p677, 5 p.
Publication Year :
2007

Abstract

In a partially observable Markov decision process (POMDP), if the reward can be observed at each step, then the observed reward history contains information on the unknown state. This information, in addition to the information contained in the observation history, can be used to update the state probability distribution. The policy thus obtained is called a reward-information policy (RI-policy); an optimal RI-policy performs no worse than any normal optimal policy depending only on the observation history. The above observation leads to four different problem-formulations for POMDPs depending on whether the reward function is known and whether the reward at each step is observable. This exploratory work may attract attention to these interesting problems.

Index Terms: Partially observable Markov decision process (POMDP), reward-information policy.

Digital Object Identifier: 10.1109/TAC.2007.894520
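The idea in the abstract, that an observed reward carries information about the hidden state, can be illustrated with a small belief-update sketch. This is not the authors' formulation: it assumes a finite state space, a known and deterministic reward function R(s, a) that depends on the pre-transition state, and hypothetical table layouts `T[a][s][s2]`, `O[a][s2][o]`, and `R[s][a]`. All names here are illustrative.

```python
def belief_update(belief, action, obs, T, O, reward=None, R=None, tol=1e-9):
    """Bayesian belief update for a finite POMDP.

    Assumed (hypothetical) table layouts:
      T[a][s][s2] : probability of moving from s to s2 under action a
      O[a][s2][o] : probability of observing o after landing in s2
      R[s][a]     : deterministic reward for taking action a in state s
    If `reward` and `R` are given, the prior is first conditioned on the
    observed reward; otherwise only the observation is used.
    """
    n = len(belief)
    weights = list(belief)
    if reward is not None and R is not None:
        # Keep only the states whose known reward matches the observed one.
        weights = [
            b if abs(R[s][action] - reward) <= tol else 0.0
            for s, b in enumerate(belief)
        ]
    unnormalized = []
    for s2 in range(n):
        predicted = sum(weights[s] * T[action][s][s2] for s in range(n))
        unnormalized.append(O[action][s2][obs] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed data has zero probability under the model")
    return [x / total for x in unnormalized]


# Toy model: two states that never change, one action, and an
# observation that is equally likely in both states (uninformative).
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.5, 0.5], [0.5, 0.5]]]
R = [[0.0], [1.0]]  # state 0 pays 0, state 1 pays 1

prior = [0.5, 0.5]
obs_only = belief_update(prior, 0, 0, T, O)
with_reward = belief_update(prior, 0, 0, T, O, reward=1.0, R=R)
```

In this toy model the observation alone leaves the belief at the prior, while conditioning on an observed reward of 1.0 puts all probability on state 1. A stochastic reward would need a likelihood P(r | s, a) in place of the exact-match filter used here.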

Subjects

Subjects :
Markov processes -- Analysis

Details

Language :
English
ISSN :
0018-9286
Volume :
52
Issue :
4
Database :
Gale General OneFile
Journal :
IEEE Transactions on Automatic Control
Publication Type :
Academic Journal
Accession number :
edsgcl.163064003