Abstract— We propose a new approximate algorithm, LAJIV (Lookahead J-MDP Information Value), to solve Oracular Partially Observable Markov Decision Problems (OPOMDPs), a special type of POMDP that rather than standard observations includes an “oracle” that can be consulted for full state information at a fixed cost. We previously introduced JIV (J-MDP Information Value) to solve OPOMDPs, an heuristic algorithm that utilizes the solution of the underlying MDP and weighs the value of consulting the oracle against the value of taking a state-modifying action. While efficient, JIV will rarely find the optimal solution. In this paper, we extend JIV to include lookahead, thereby permitting arbitrarily small deviation from the optimal policy’s long-term expected reward at the cost of added computation time. The depth of the lookahead is a parameter that governs this tradeoff; by iteratively increasing this depth, we provide an anytime algorithm that yields an everimproving solution...
Nicholas Armstrong-Crews, Manuela M. Veloso