Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling
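For reference, the abstract does not state the POMDP formalism itself; the following is the conventional definition and belief-state update (the notation is a standard choice, not taken from this paper). A POMDP is a tuple $\langle S, A, T, R, \Omega, O \rangle$, where $T(s, a, s') = \Pr(s' \mid s, a)$ is the transition function, $R(s, a)$ is the reward function, and $O(s', a, o) = \Pr(o \mid s', a)$ gives the probability of observing $o \in \Omega$ after taking action $a$ and landing in state $s'$. Because the agent cannot observe the state directly, it maintains a belief $b$, a probability distribution over $S$, updated after each action $a$ and observation $o$:
\[
b'(s') \;=\; \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)},
\qquad
\Pr(o \mid a, b) \;=\; \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s).
\]
The belief state is a sufficient statistic for the observation history, so acting optimally in a POMDP reduces to acting optimally in a continuous-state MDP over beliefs; the difficulty of solving that continuous problem underlies the scaling issues described above.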