Search Sciweavers | Sciweavers

38 search results - page 5 / 8

» On the Convergence of Optimistic Policy Iteration

click to vote

CORR
2006
Springer

113views Education» more CORR 2006»

A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

13 years 7 months ago

Download hal.inria.fr

This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(), LSTD()...

Manuel Loth, Philippe Preux

claim paper

Read More »

click to vote

CDC
2010
IEEE

136views Control Systems» more CDC 2010»

Pathologies of temporal difference methods in approximate dynamic programming

13 years 2 months ago

Download web.mit.edu

Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...

Dimitri P. Bertsekas

claim paper

Read More »

click to vote

ICONIP
2009

107views Information Technology» more ICONIP 2009»

Tracking in Reinforcement Learning

13 years 5 months ago

Download www.metz.supelec.fr

Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desired feature of a fair RL algorithm. Yet, even if the...

Matthieu Geist, Olivier Pietquin, Gabriel Fricout

claim paper

Read More »

click to vote

CORR
2011
Springer

175views Education» more CORR 2011»

Adaptive Channel Recommendation for Dynamic Spectrum Access

13 years 2 months ago

Download home.ie.cuhk.edu.hk

—We propose a dynamic spectrum access scheme where secondary users recommend “good” channels to each other and access accordingly. We formulate the problem as an average rewa...

Xu Chen, Jianwei Huang, Husheng Li

claim paper

Read More »

click to vote

NIPS
2007

164views Information Technology» more NIPS 2007»

Incremental Natural Actor-Critic Algorithms

13 years 8 months ago

Download books.nips.cc

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...

Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...

claim paper

Read More »

« Prev « First page 5 / 8 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers