We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual...
Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains on...
Abstract. When a robot learns to solve a goal-directed navigation task with reinforcement learning, the acquired strategy can usually exclusively be applied to the task that has be...
Due to the unavoidable fact that a robot’s sensors will be limited in some manner, it is entirely possible that it can find itself unable to distinguish between differing state...