ECML 2005, Springer

Active Learning in Partially Observable Markov Decision Processes

This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is unknown or only poorly specified. We propose two formulations of the problem. The first incorporates a model of the uncertainty directly into the POMDP planning problem; this has some interesting theoretical properties, but is impractical when many of the parameters are uncertain. Our second approach, called MEDUSA, is an instance of active learning: we incrementally improve the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even on large problems: the most useful parameters of the model are learned quickly, and the agent still accumulates high reward throughout the process.
Robin Jaulmes, Joelle Pineau, Doina Precup
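As a rough illustration of the active-learning idea the abstract describes (not the paper's exact MEDUSA algorithm), the sketch below maintains a Dirichlet distribution over one unknown transition distribution, samples several candidate models from it, and issues an oracle query only when the sampled models disagree. The toy distribution, the number of sampled models, and the disagreement threshold are all illustrative assumptions.

```python
import random

# Toy setting (assumption): a single uncertain transition distribution over
# three next states. The agent keeps Dirichlet counts over its parameters.
TRUE_DIST = [0.7, 0.2, 0.1]   # ground truth, unknown to the agent
alpha = [1.0, 1.0, 1.0]       # Dirichlet counts (uniform prior)

def sample_model(alpha, rng):
    """Sample one candidate distribution from Dirichlet(alpha)
    via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def disagreement(models):
    """Largest spread of any single parameter across the sampled models."""
    return max(
        max(m[i] for m in models) - min(m[i] for m in models)
        for i in range(len(models[0]))
    )

rng = random.Random(0)
queries = 0
for step in range(200):
    models = [sample_model(alpha, rng) for _ in range(5)]
    if disagreement(models) > 0.15:
        # Models disagree: query the oracle for one true transition sample
        # and fold it into the Dirichlet counts.
        outcome = rng.choices(range(3), weights=TRUE_DIST)[0]
        alpha[outcome] += 1.0
        queries += 1
    # else: act using the mean model (the acting/reward loop is omitted here)

mean = [a / sum(alpha) for a in alpha]
print(f"queries issued: {queries}, learned model: "
      f"{[round(p, 2) for p in mean]}")
```

The point of the sketch is the query-selection criterion: queries are expensive, so they are only issued while the posterior over models is still wide; as counts accumulate, sampled models converge and querying stops on its own.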