Intra-Option Learning about Temporally Abstract Actions

Richard S. Sutton
Department of Computer Science
University of Massachusetts
Amherst, MA 01003-4610
rich@cs.umass.edu

Doina Precup
Department of Computer Science
University of Massachusetts
Amherst, MA 01003-4610
dprecup@cs.umass.edu

Satinder Singh
Department of Computer Science
University of Colorado
Boulder, CO 80309-0430
baveja@cs.colorado.edu

Several researchers have proposed modeling temporally abstract actions in reinforcement learning by the combination of a policy and a termination condition, which we refer to as an option. Value functions over options and models of options can be learned using methods designed for semi-Markov decision processes (SMDPs). However, all these methods require an option to be executed to termination. In this paper we explore methods that learn about an option from small fragments of experience consistent with that option, even if the option itself is not executed. We call these methods intra-option learning methods...
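As a rough illustration (not taken from the paper itself), an option in the sense described above can be sketched as a policy paired with a stochastic termination condition. The names `Option`, `go_right`, and `execute` below are hypothetical, and the toy environment is a deterministic 1-D line of columns:

```python
from dataclasses import dataclass
from typing import Callable, Hashable

State = Hashable
Action = Hashable

@dataclass
class Option:
    """A temporally abstract action: a policy plus a termination condition."""
    policy: Callable[[State], Action]       # pi: state -> action
    termination: Callable[[State], float]   # beta: state -> probability of stopping

# Hypothetical option: always move "right", terminating with certainty
# once the agent reaches column 3.
go_right = Option(
    policy=lambda s: "right",
    termination=lambda s: 1.0 if s[0] >= 3 else 0.0,
)

def execute(option: Option,
            step: Callable[[State, Action], State],
            s: State) -> list:
    """Run the option to termination (beta is deterministic in this sketch),
    returning the visited states."""
    trajectory = [s]
    while option.termination(s) < 1.0:
        s = step(s, option.policy(s))
        trajectory.append(s)
    return trajectory

# Toy deterministic environment: "right" increments the column index.
traj = execute(go_right, lambda s, a: (s[0] + 1,) if a == "right" else s, (0,))
# traj visits columns 0 through 3
```

An SMDP-style learner would treat the whole trajectory produced by `execute` as one atomic transition; the intra-option methods the abstract refers to instead learn from each one-step fragment of such trajectories, whether or not the option was actually the one being executed.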