In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual...
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
Abstract--In this paper, the sum capacity of the Gaussian Multiple Input Multiple Output (MIMO) Cognitive Radio Channel (MCC) is expressed as a convex problem with finite number of...