Sciweavers

226 search results - page 39 / 46
» A Convergent Reinforcement Learning Algorithm in the Continu...
Sort
View
UAI
2008
13 years 8 months ago
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...
Richard S. Sutton, Csaba Szepesvári, Alborz...
ATAL
2008
Springer
13 years 9 months ago
Artificial agents learning human fairness
Recent advances in technology allow multi-agent systems to be deployed in cooperation with or as a service for humans. Typically, those systems are designed assuming individually ...
Steven de Jong, Karl Tuyls, Katja Verbeeck
ICML
2005
IEEE
14 years 8 months ago
Combining model-based and instance-based learning for first order regression
T ORDER REGRESSION (EXTENDED ABSTRACT) Kurt Driessensa Saso Dzeroskib a Department of Computer Science, University of Waikato, Hamilton, New Zealand (kurtd@waikato.ac.nz) b Departm...
Kurt Driessens, Saso Dzeroski
ECML
2007
Springer
14 years 1 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
NIPS
2007
13 years 8 months ago
Receding Horizon Differential Dynamic Programming
The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques...
Yuval Tassa, Tom Erez, William D. Smart