Sciweavers

ESANN
2006
14 years 1 month ago
Reducing policy degradation in neuro-dynamic programming
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in...
Thomas Gabel, Martin Riedmiller
ICRA
1995
IEEE
14 years 4 months ago
Vision-Based Reinforcement Learning for Purposive Behavior Acquisition
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal, and discusses several issues in applying the reinforcement...
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, ...
ICRA
2009
IEEE
14 years 7 months ago
Least absolute policy iteration for robust value function approximation
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
Masashi Sugiyama, Hirotaka Hachiya, Hisashi Kashim...