Sciweavers

» Variational methods for Reinforcement Learning
ICML
2008
IEEE
14 years 8 months ago
Sample-based learning and search with permanent and transient memories
We present a reinforcement learning architecture, Dyna-2, that encompasses both sample-based learning and sample-based search, and that generalises across states during both learni...
David Silver, Martin Müller 0003, Richard S. ...
ECML
2007
Springer
14 years 2 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
IROS
2007
IEEE
14 years 2 months ago
Emulation and behavior understanding through shared values
Neurophysiology has revealed the existence of mirror neurons in the brain of macaque monkeys, and they show similar activity during execution and observation of goal-directed move...
Yasutake Takahashi, Teruyasu Kawamata, Minoru Asad...
NIPS
2001
13 years 9 months ago
Model-Free Least-Squares Policy Iteration
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...
Michail G. Lagoudakis, Ronald Parr
CVPR
2007
IEEE
14 years 9 months ago
Discriminant Additive Tangent Spaces for Object Recognition
Pattern variation is a major factor that affects the performance of recognition systems. In this paper, a novel manifold tangent modeling method called Discriminant Additive Tange...
Liang Xiong, Jianguo Li, Changshui Zhang