Because a robot's sensors are inevitably limited in some manner, it may find itself unable to distinguish between differing states of the world: the world is, in effect, partially observable. If reinforcement learning is used to train the robot, this confounding of states can seriously impair its ability to learn optimal and stable policies. Good results have been achieved by augmenting reinforcement learning algorithms with memory or with internal models. In our work we take a different approach and consider whether active perception could be used instead. We test this idea using omniscient oracles, which play the role of a robot's active perceptual system, in a simple grid world navigation problem. Our results indicate that simple reinforcement learning algorithms can learn when to consult these oracles and, as a result, learn optimal policies.
Paul A. Crook, Gillian Hayes
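To make the idea concrete, the following is a minimal sketch, not the setup used in the paper: a tabular Q-learner in a one-dimensional corridor whose interior cells are perceptually aliased, with an extra CONSULT action that queries an omniscient oracle for the agent's true state. The corridor layout, the reward values, and the convention that one consultation keeps the agent informed for the rest of the episode are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Illustrative world (not the paper's): a 1-D corridor of 7 cells with the
# goal in the middle (cell 3).  The end cells are recognisable by their
# adjacent walls, but every other non-goal cell yields the identical percept
# "interior", so a memoryless agent cannot tell which side of the goal it is
# on.  A CONSULT action asks the oracle for the robot's true cell; here we
# assume the oracle keeps the agent informed for the rest of the episode.
N_CELLS, GOAL = 7, 3
LEFT, RIGHT, CONSULT = 0, 1, 2
ACTIONS = (LEFT, RIGHT, CONSULT)
STEP_COST, CONSULT_COST, GOAL_REWARD = -1.0, -0.2, 10.0  # assumed rewards
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1                   # assumed settings

def observe(cell, informed):
    """The agent's percept: aliased unless the oracle has been consulted."""
    if informed:
        return ("cell", cell)              # oracle reveals the true cell
    if cell == 0:
        return "wall-left"
    if cell == N_CELLS - 1:
        return "wall-right"
    return "interior"                      # cells 1, 2, 4, 5 all look alike

def step(cell, informed, action):
    """One transition: returns (next_cell, informed, reward, done)."""
    if action == CONSULT:
        return cell, True, STEP_COST + CONSULT_COST, False
    cell = max(0, min(N_CELLS - 1, cell + (1 if action == RIGHT else -1)))
    return cell, informed, (GOAL_REWARD if cell == GOAL else STEP_COST), cell == GOAL

Q = defaultdict(float)                     # Q[(observation, action)] -> value

def greedy(obs):
    return max(ACTIONS, key=lambda a: Q[(obs, a)])

for _ in range(20000):
    cell = random.choice([c for c in range(N_CELLS) if c != GOAL])
    informed, done, steps = False, False, 0
    obs = observe(cell, informed)
    while not done and steps < 100:        # cap episodes that loop forever
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(obs)
        cell, informed, r, done = step(cell, informed, a)
        next_obs = observe(cell, informed)
        best_next = 0.0 if done else max(Q[(next_obs, b)] for b in ACTIONS)
        Q[(obs, a)] += ALPHA * (r + GAMMA * best_next - Q[(obs, a)])
        obs, steps = next_obs, steps + 1

# The aliased percept should now prefer consulting the oracle, while the
# disambiguated percepts head straight for the goal.
names = {LEFT: "LEFT", RIGHT: "RIGHT", CONSULT: "CONSULT"}
print("interior        ->", names[greedy("interior")])
for c in (1, 2, 4, 5):
    print(f"oracle: cell {c} ->", names[greedy(("cell", c))])
```

Because any fixed movement choice from the aliased percept strands the agent on the wrong side of the goal for half of its starting positions, the Q-values for LEFT and RIGHT at "interior" are dragged down, and consulting the oracle, despite its cost, typically emerges as the greedy action; once disambiguated, the problem behaves as a fully observable one.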