Sciweavers

115 search results - page 18 / 23
» Recurrent policy gradients
Sort
View
ICML
2001
IEEE
14 years 8 months ago
Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
AAAI
2000
13 years 9 months ago
Localizing Search in Reinforcement Learning
Reinforcement learning (RL) can be impractical for many high dimensional problems because of the computational cost of doing stochastic search in large state spaces. We propose a ...
Gregory Z. Grudic, Lyle H. Ungar
ICRA
2010
IEEE
145views Robotics» more  ICRA 2010»
13 years 6 months ago
Reinforcement learning of motor skills in high dimensions: A path integral approach
— Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far d...
Evangelos Theodorou, Jonas Buchli, Stefan Schaal
TVLSI
2008
107views more  TVLSI 2008»
13 years 7 months ago
Static and Dynamic Temperature-Aware Scheduling for Multiprocessor SoCs
Thermal hot spots and high temperature gradients degrade reliability and performance, and increase cooling costs and leakage power. In this paper, we explore the benefits of temper...
Ayse Kivilcim Coskun, T. T. Rosing, Keith Whisnant...
TON
2010
151views more  TON 2010»
13 years 2 months ago
Throughput Optimal Distributed Power Control of Stochastic Wireless Networks
The Maximum Differential Backlog (MDB) control policy of Tassiulas and Ephremides has been shown to adaptively maximize the stable throughput of multihop wireless networks with ran...
Yufang Xi, Edmund M. Yeh