Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...
Doina Precup, Richard S. Sutton, Satinder P. Singh
Abstract. We present first experiments using Support Vector Regression as function approximator for an on-line, sarsa-like reinforcement learner. To overcome the batch nature of S...
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of...
Abstract. Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free m...
We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it can learn optimal policies for all linear pref...