In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging--they posit...
S. R. K. Branavan, Luke S. Zettlemoyer, Regina Bar...
PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that with high probability, the algorithm pe...
Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning ...
tion Learning about Temporally Abstract Actions Richard S. Sutton Department of Computer Science University of Massachusetts Amherst, MA 01003-4610 rich@cs.umass.edu Doina Precup D...
Richard S. Sutton, Doina Precup, Satinder P. Singh
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make effi...