Increasing user comfort and reducing operation costs have always been two primary objectives of building operations and control strategies. Current building control strategies are...
Vipul Singhvi, Andreas Krause, Carlos Guestrin, Ja...
Intelligent planning algorithms such as the Partially Observable Markov Decision Process (POMDP) have succeeded in dialog management applications [10, 11, 12] because of their rob...
Abstract. The automata-based model checking approach for randomized distributed systems relies on an operational interleaving semantics of the system by means of a Markov decision ...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
tion Learning about Temporally Abstract Actions Richard S. Sutton Department of Computer Science University of Massachusetts Amherst, MA 01003-4610 rich@cs.umass.edu Doina Precup D...
Richard S. Sutton, Doina Precup, Satinder P. Singh