Sciweavers

ECML
2005
Springer

Using Rewards for Belief State Updates in Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, a probability distribution over the states the agent could be in, and the agent chooses its actions based on this belief state. The belief state is computed from a model of the environment and the history of actions and observations seen by the agent. However, reward information is not taken into account when updating the belief state. In this paper, we argue that rewards can carry useful information that helps disambiguate the hidden state. We present a method for updating the belief state that takes rewards into account. We present experiments with exact and approximate planning methods on several standard POMDP domains, using this belief update met...
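The belief update described in the abstract can be illustrated with a small sketch. This is not necessarily the authors' formulation: it assumes a discrete POMDP, a deterministic reward table R[s][a], and that the received reward is treated as an extra observation that rules out predecessor states whose reward does not match. The function name belief_update and the table layouts T, O, and R are hypothetical choices made for this example.

```python
def belief_update(belief, action, obs, T, O, R=None, reward=None):
    """Return the normalized belief after one action/observation step.

    T[s][a][s2]: probability of moving from s to s2 under action a.
    O[a][s2][o]: probability of observing o after reaching s2 via a.
    R[s][a]:     assumed deterministic reward for taking a in s (optional).
    """
    n = len(belief)
    new_belief = []
    for s2 in range(n):
        total = 0.0
        for s in range(n):
            weight = belief[s] * T[s][action][s2]
            # Assumed reward model: discard predecessor states whose
            # reward differs from the reward actually received.
            if R is not None and R[s][action] != reward:
                weight = 0.0
            total += weight
        new_belief.append(O[action][s2][obs] * total)
    norm = sum(new_belief)
    if norm == 0.0:
        raise ValueError("history has zero probability under the model")
    return [p / norm for p in new_belief]


# Toy model: two states, one action, one uninformative observation.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]  # each state transitions to itself
O = [[[1.0], [1.0]]]              # the observation carries no information
R = [[0.0], [1.0]]                # only state 1 yields reward 1.0

prior = [0.5, 0.5]
without_reward = belief_update(prior, 0, 0, T, O)
with_reward = belief_update(prior, 0, 0, T, O, R=R, reward=1.0)
```

In this toy model the observation alone leaves the belief unchanged, while the reward identifies the hidden state. With stochastic rewards, the equality check would be replaced by a reward likelihood term.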
Masoumeh T. Izadi, Doina Precup
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ECML
Authors Masoumeh T. Izadi, Doina Precup