Abstract— We introduce the Oracular Partially Observable Markov Decision Process (OPOMDP), a type of POMDP in which the world produces no observations; instead, an "oracle," available in any state, tells the agent its exact state for a fixed cost. The oracle may be a human or a highly accurate sensor. At each timestep the agent must choose whether to take a domain-level action or consult the oracle. This formulation induces a factorization between information-gathering actions and domain-level actions, allowing us to characterize the value of information and to examine planning under uncertainty from a novel perspective. We propose an algorithm that capitalizes on this factorization and the special structure of the OPOMDP, and we test the algorithm's performance on a new sample domain. On this domain, we solve a problem with hundreds of thousands of action-states and vastly outperform a previous state-of-the-art approximate technique.
Nicholas Armstrong-Crews, Manuela M. Veloso
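To make the act-or-query decision concrete, here is a minimal sketch of one OPOMDP timestep on a toy chain MDP. This is not the authors' algorithm: the four-state domain, the fixed `ORACLE_COST`, and the entropy-threshold query rule are all illustrative assumptions. It shows the two defining features of the model: the belief is propagated by the transition model alone (no observations), and an oracle query collapses the belief to the true state for a fixed cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-state chain MDP; actions: 0 = move right, 1 = stay.
n_states = 4
T = np.zeros((2, n_states, n_states))
for s in range(n_states):
    T[0, s, min(s + 1, n_states - 1)] = 1.0  # move right
    T[1, s, s] = 1.0                         # stay put
# Slip noise on "move" so the belief degrades over time without observations.
T[0] = 0.8 * T[0] + 0.2 * np.eye(n_states)

ORACLE_COST = 0.1        # hypothetical fixed cost of consulting the oracle
ENTROPY_THRESHOLD = 0.5  # hypothetical query trigger (nats)

def entropy(b):
    p = b[b > 0]
    return float(-(p * np.log(p)).sum())

def step(belief, true_state):
    """One OPOMDP timestep: query the oracle if the belief is too
    uncertain, otherwise act and propagate the belief open-loop."""
    if entropy(belief) > ENTROPY_THRESHOLD:
        # Oracle reveals the exact state; belief collapses to it.
        new_belief = np.zeros(n_states)
        new_belief[true_state] = 1.0
        return new_belief, true_state, -ORACLE_COST, "oracle"
    a = 0  # domain-level action: always try to move right in this toy
    next_state = rng.choice(n_states, p=T[a, true_state])
    new_belief = belief @ T[a]  # predict-only update: no observation arrives
    return new_belief, next_state, 0.0, "act"
```

Starting from a known state, repeated `step` calls alternate naturally: the move action smears the belief until its entropy crosses the threshold, at which point the agent pays the oracle cost to re-localize, illustrating the factorization between domain-level and information-gathering actions.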