Search Sciweavers | Sciweavers

109 search results - page 6 / 22

» Policy teaching through reward function learning

ICML
2000
IEEE

153 views, Machine Learning

Eligibility Traces for Off-Policy Policy Evaluation

16 years 7 months ago

Download www.cs.ualberta.ca

Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...

Doina Precup, Richard S. Sutton, Satinder P. Singh

COLT
2007
Springer

143 views, Machine Learning

Bounded Parameter Markov Decision Processes with Average Reward Criterion

16 years 26 days ago

Download ttic.uchicago.edu

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, t...

Ambuj Tewari, Peter L. Bartlett

CDC
2008
IEEE

104 views, Control Systems

A structured multiarmed bandit problem and the greedy policy

16 years 1 month ago

Download web.mit.edu

We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a s...

Adam J. Mersereau, Paat Rusmevichientong, John N. ...

ICML
1997
IEEE

181 views, Machine Learning

Robot Learning From Demonstration

16 years 7 months ago

Download www-clmc.usc.edu

The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration th...

Christopher G. Atkeson, Stefan Schaal

UAI
2008

234 views, Artificial Intelligence

Improving Gradient Estimation by Incorporating Sensor Data

15 years 8 months ago

Download www.cs.berkeley.edu

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas m...

Gregory Lawrence, Stuart J. Russell
