Abstract— Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bili...