Learning Without State-Estimation in Partially Observable Markovian Decision Processes

Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new framework...
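The setting the abstract describes is Q-learning applied directly to observations, with no state estimation. Below is a minimal sketch of what such memoryless Q-learning looks like on a toy POMDP; the corridor environment, observation names, and all hyperparameters are illustrative assumptions, not taken from the paper, which analyzes (and critiques) this kind of setup rather than prescribing it.

```python
import random
from collections import defaultdict

# Toy POMDP: a 4-state corridor (states 0..3). States 1 and 2 emit the
# same observation, so the agent cannot distinguish them -- the process
# is non-Markovian from the agent's point of view (perceptual aliasing).
# This environment is a hypothetical example, not from the paper.
GOAL = 3
ACTIONS = [0, 1]  # 0 = left, 1 = right

def observe(state):
    return {0: "start", 1: "corridor", 2: "corridor", 3: "goal"}[state]

def step(state, action):
    next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def memoryless_q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.2):
    # Q is indexed by (observation, action), not by underlying state:
    # no state estimation of any kind is performed.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length; aliasing can cause loops
            obs = observe(state)
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(obs, a)])
            state, reward, done = step(state, action)
            next_obs = observe(state)
            target = reward + (0.0 if done else
                               gamma * max(Q[(next_obs, a)] for a in ACTIONS))
            Q[(obs, action)] += alpha * (target - Q[(obs, action)])
            if done:
                break
    return Q

if __name__ == "__main__":
    Q = memoryless_q_learning()
    for obs in ("start", "corridor"):
        print(obs, {a: round(Q[(obs, a)], 2) for a in ACTIONS})
```

Note that the learned value for the "corridor" observation is necessarily an average over the two underlying states that emit it, which illustrates why the paper argues the conventional discounted framework breaks down under partial observability.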
Type: Conference
Year: 1994
Where: ICML
Authors: Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan