Search Sciweavers | Sciweavers

115 search results - page 18 / 23

» Recurrent policy gradients


ICML
2001
IEEE

185 views, Machine Learning

Off-Policy Temporal Difference Learning with Function Approximation

16 years 6 months ago

Download www.cs.ualberta.ca

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...

Doina Precup, Richard S. Sutton, Sanjoy Dasgupta



AAAI
2000

139 views, Intelligent Agents

Localizing Search in Reinforcement Learning

15 years 6 months ago

Download www.cs.colorado.edu

Reinforcement learning (RL) can be impractical for many high dimensional problems because of the computational cost of doing stochastic search in large state spaces. We propose a ...

Gregory Z. Grudic, Lyle H. Ungar



ICRA
2010
IEEE

145 views, Robotics

Reinforcement learning of motor skills in high dimensions: A path integral approach

15 years 4 months ago

Download www-personal.acfr.usyd.edu.au

Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far d...

Evangelos Theodorou, Jonas Buchli, Stefan Schaal



TVLSI
2008

107 views

Static and Dynamic Temperature-Aware Scheduling for Multiprocessor SoCs

15 years 5 months ago

Download www.bu.edu

Thermal hot spots and high temperature gradients degrade reliability and performance, and increase cooling costs and leakage power. In this paper, we explore the benefits of temper...

Ayse Kivilcim Coskun, T. T. Rosing, Keith Whisnant...



TON
2010

151 views

Throughput Optimal Distributed Power Control of Stochastic Wireless Networks

15 years 5 days ago

Download pantheon.yale.edu

The Maximum Differential Backlog (MDB) control policy of Tassiulas and Ephremides has been shown to adaptively maximize the stable throughput of multihop wireless networks with ran...

Yufang Xi, Edmund M. Yeh

