Sciweavers

109 search results - page 6 / 22
» Policy teaching through reward function learning
Sort
View
ICML
2000
IEEE
14 years 8 months ago
Eligibility Traces for Off-Policy Policy Evaluation
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...
Doina Precup, Richard S. Sutton, Satinder P. Singh
COLT
2007
Springer
14 years 1 months ago
Bounded Parameter Markov Decision Processes with Average Reward Criterion
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, t...
Ambuj Tewari, Peter L. Bartlett
CDC
2008
IEEE
104views Control Systems» more  CDC 2008»
14 years 2 months ago
A structured multiarmed bandit problem and the greedy policy
—We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a s...
Adam J. Mersereau, Paat Rusmevichientong, John N. ...
ICML
1997
IEEE
14 years 8 months ago
Robot Learning From Demonstration
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration th...
Christopher G. Atkeson, Stefan Schaal
UAI
2008
13 years 9 months ago
Improving Gradient Estimation by Incorporating Sensor Data
An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas m...
Gregory Lawrence, Stuart J. Russell