Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship l...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
—Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learni...
To avoid the curse of dimensionality, function approximators are used in reinforcement learning to learn value functions for individual states. In order to make better use of comp...
This paper introduces a new algorithm, Q2, foroptimizingthe expected output ofamultiinput noisy continuous function. Q2 is designed to need only a few experiments, it avoids stron...
Andrew W. Moore, Jeff G. Schneider, Justin A. Boya...