This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of represen...
Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun
— In this paper, we use the Markov Decision Process (MDP) technique to find the optimal code allocation policy in High-Speed Downlink Packet Access (HSDPA) networks. A discrete ...
Hussein Al-Zubaidy, Jerome Talim, Ioannis Lambadar...
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
— This paper presents a new approximate policy iteration algorithm based on support vector regression (SVR). It provides an overview of commonly used cost approximation architect...
Abstract— In this paper, we consider a class of continuoustime, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation ...