We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we wa...
We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on ac...
Kshitij Judah, Saikat Roy, Alan Fern, Thomas G. Di...
Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. ...
The application of reinforcement learning (RL) algorithms to learn tasks for robots is often limited by the high dimensionality of the state space, which may make prohibitive its appli...
Andrea Bonarini, Alessandro Lazaric, Marcello Rest...