Search Sciweavers | Sciweavers

115 search results - page 14 / 23

» Recurrent policy gradients

ICML
2008
IEEE

A worst-case comparison between temporal difference and residual gradient with linear function approximation

16 years 6 months ago

Download www.research.rutgers.edu

Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...

Lihong Li

AAAI
2011

Differential Eligibility Vectors for Advantage Updating and Gradient Methods

14 years 5 months ago

Download gaips.inesc-id.pt

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...

Francisco S. Melo

AGI
2011

Nonlinear-Dynamical Attention Allocation via Information Geometry

14 years 9 months ago

Download faculty.adams.edu

Inspired by a broader perspective viewing intelligent system dynamics in terms of the geometry of “cognitive spaces,” we conduct a preliminary investigation of the application ...

Matthew Iklé, Ben Goertzel

ICML
2007
IEEE

Bayesian actor-critic algorithms

16 years 6 months ago

Download www.machinelearning.org

We present a new actor-critic learning model in which a Bayesian class of non-parametric critics, using Gaussian process temporal difference learning, is used. Such critics model ...

Mohammad Ghavamzadeh, Yaakov Engel

NIPS
2003

Distributed Optimization in Adaptive Networks

15 years 7 months ago

Download books.nips.cc

We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of ...

Ciamac Cyrus Moallemi, Benjamin Van Roy
