Sciweavers

15 search results - page 2 / 3
» On the Worst-Case Analysis of Temporal-Difference Learning A...
ICML
2008
IEEE
14 years 8 months ago
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Lihong Li
GECCO
2006
Springer
13 years 11 months ago
Comparing evolutionary and temporal difference methods in a reinforcement learning domain
Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical com...
Matthew E. Taylor, Shimon Whiteson, Peter Stone
ICML
2010
IEEE
13 years 8 months ago
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu
EOR
2007
13 years 7 months ago
Learning lexicographic orders
The purpose of this paper is to learn the order of criteria of lexicographic decision under various reasonable assumptions. We give a sample evaluation and an oracle based algorit...
József Dombi, Csanád Imreh, Ná...
NIPS
2004
13 years 8 months ago
Online Bounds for Bayesian Algorithms
We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression ...
Sham M. Kakade, Andrew Y. Ng