Learning Policies for Partially Observable Environments: Scaling Up

Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
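As context for the abstract (this is not code from the paper), the sketch below shows the standard POMDP belief-state update that such solution methods build on: after taking action a and observing o, the agent's belief over hidden states is re-weighted by the transition and observation probabilities and renormalized. The array layouts T[a][s, s'] and O[a][s', o], and the function name belief_update, are assumptions chosen for illustration.

```python
import numpy as np

def belief_update(b, T, O, a, o):
    # Predict: P(s' | b, a) = sum_s b(s) * T[a][s, s']
    predicted = b @ T[a]
    # Correct: weight each successor state by P(o | s', a)
    unnormalized = predicted * O[a][:, o]
    # Normalize so the posterior belief sums to 1
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action, 2-observation example (made-up numbers)
b = np.array([0.5, 0.5])
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # T[a][s, s']
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])   # O[a][s', o]
print(belief_update(b, T, O, a=0, o=1))    # -> roughly [0.259, 0.741]
```

Maintaining this belief vector in place of the unobservable true state is what distinguishes pomdp solution methods from ordinary MDP techniques, and its continuous nature is one reason they scale poorly.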
Type: Conference
Year: 1995
Where: ICML
Authors: Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling