Sciweavers

IBERAMIA
2010
Springer
13 years 6 months ago
Dynamic Reward Shaping: Training a Robot by Voice
Reinforcement learning is commonly used for learning tasks in robotics; however, traditional algorithms can require very long training times. Reward shaping has recently been used to ...
Ana C. Tenorio-Gonzalez, Eduardo F. Morales, Luis ...

Publication
12 years 6 months ago
Sparse reward processes
We introduce a class of learning problems where the agent is presented with a series of tasks. Intuitively, if there is a relation among those tasks, then the information gained duri...
Christos Dimitrakakis
JMLR
2012
11 years 9 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
ICML
2010
IEEE
13 years 8 months ago
Finite-Sample Analysis of LSTD
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...
NIPS
1998
13 years 8 months ago
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...
Michael J. Kearns, Satinder P. Singh