Sciweavers

Search results for: Value Function Approximation in Reinforcement Learning Using...

ICML 1999, IEEE
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan

Publication
Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration
Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
Christos Dimitrakakis, Michail G. Lagoudakis

JMLR 2010
A Convergent Online Single Time Scale Actor Critic Algorithm
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...
Dotan Di Castro, Ron Meir

ICML 2003, IEEE
TD(0) Converges Provably Faster than the Residual Gradient Algorithm
In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the ...
Ralf Schoknecht, Artur Merke
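The abstract compares two ways of learning a linear value estimate V(s) = θ·φ(s) from sampled transitions. The minimal pure-Python sketch below is an illustration under stated assumptions (the feature vectors, step size `alpha`, discount `gamma` and the two-state chain are hypothetical choices, not the paper's setup). The two rules differ only in the step direction: TD(0) moves along φ(s) and treats the bootstrapped target as fixed, while the residual gradient algorithm takes a full gradient step on the squared Bellman error ½δ² and so also differentiates through the target, giving the direction φ(s) − γφ(s').

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical transition format: (phi_s, reward, phi_next);
# phi_next is all zeros when the next state is terminal.
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def td_error(theta: Sequence[float], t: Transition, gamma: float) -> float:
    phi_s, r, phi_next = t
    return r + gamma * dot(theta, phi_next) - dot(theta, phi_s)


def td0_update(theta: List[float], t: Transition, alpha: float, gamma: float) -> None:
    """Semi-gradient TD(0): theta += alpha * delta * phi(s)."""
    phi_s, _, _ = t
    delta = td_error(theta, t, gamma)
    for i in range(len(theta)):
        theta[i] += alpha * delta * phi_s[i]


def residual_gradient_update(theta: List[float], t: Transition, alpha: float, gamma: float) -> None:
    """Residual gradient: descend 0.5 * delta**2, i.e. theta += alpha * delta * (phi(s) - gamma * phi(s'))."""
    phi_s, _, phi_next = t
    delta = td_error(theta, t, gamma)
    for i in range(len(theta)):
        theta[i] += alpha * delta * (phi_s[i] - gamma * phi_next[i])


def train(update: Callable[[List[float], Transition, float, float], None],
          transitions: Sequence[Transition], n_features: int,
          alpha: float = 0.1, gamma: float = 0.9, sweeps: int = 2000) -> List[float]:
    """Apply `update` to every transition, in order, for a fixed number of sweeps."""
    theta = [0.0] * n_features
    for _ in range(sweeps):
        for t in transitions:
            update(theta, t, alpha, gamma)
    return theta


if __name__ == "__main__":
    # Deterministic chain s0 -> s1 -> terminal, reward 1 on the last step, one-hot features.
    chain = [([1.0, 0.0], 0.0, [0.0, 1.0]),
             ([0.0, 1.0], 1.0, [0.0, 0.0])]
    print(train(td0_update, chain, 2))
    print(train(residual_gradient_update, chain, 2))
```

On this deterministic chain both rules approach the true values V(s0) = 0.9 and V(s1) = 1.0. With stochastic transitions, the residual gradient update as written is an unbiased gradient of the expected squared Bellman error only if two independent samples of s' are drawn.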

STOC 2012, ACM
Nearly optimal solutions for the Chow parameters problem and low-weight approximation of halfspaces
The Chow parameters of a Boolean function f : {−1, 1}^n → {−1, 1} are its n + 1 degree-0 and degree-1 Fourier coefficients. It has been known since 1961 [Cho61, Tan61] that ...
Anindya De, Ilias Diakonikolas, Vitaly Feldman, Ro...
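The definition in the abstract can be made concrete: for x drawn uniformly from {−1, 1}^n, the degree-0 coefficient is E[f(x)] and the degree-1 coefficients are E[f(x)·x_i]. The sketch below computes them by brute-force enumeration of all 2^n inputs, so it is only an illustration of the definition for small n, not the reconstruction algorithm the paper is about. The `halfspace` helper and its convention of mapping a zero margin to +1 are hypothetical choices for the example.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Sequence

BooleanFunction = Callable[[Sequence[int]], int]


def chow_parameters(f: BooleanFunction, n: int) -> List[Fraction]:
    """Return [f^(empty), f^({1}), ..., f^({n})] exactly, by enumerating {-1, 1}^n.

    f^(empty) = E[f(x)] and f^({i}) = E[f(x) * x_i] under the uniform distribution.
    """
    sums = [0] * (n + 1)
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        sums[0] += fx
        for i, xi in enumerate(x, start=1):
            sums[i] += fx * xi
    return [Fraction(s, 2 ** n) for s in sums]


def halfspace(weights: Sequence[int], threshold: int) -> BooleanFunction:
    """Hypothetical helper: f(x) = sign(w . x - threshold), with a zero margin mapped to +1."""
    def f(x: Sequence[int]) -> int:
        margin = sum(w * xi for w, xi in zip(weights, x)) - threshold
        return 1 if margin >= 0 else -1
    return f


if __name__ == "__main__":
    majority3 = halfspace([1, 1, 1], 0)  # majority of three bits
    print(chow_parameters(majority3, 3))
    # [Fraction(0, 1), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)]
```

Exact `Fraction` arithmetic is used so that the coefficients of small examples can be compared without floating-point tolerance.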