Search Sciweavers | Sciweavers

» Gradient Descent for General Reinforcement Learning

JMLR
2010

A Generalized Path Integral Control Approach to Reinforcement Learning

Download jmlr.csail.mit.edu

With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical tec...

Evangelos Theodorou, Jonas Buchli, Stefan Schaal

ICIP
2008
IEEE

Learning distance metric for semi-supervised image segmentation

Download www.au.tsinghua.edu.cn

Semi-supervised image segmentation is an important issue in many image processing applications and has recently been a popular research area; the most popular are graph-based met...

Yangqing Jia, Changshui Zhang

ICML
2009
IEEE

Monte-Carlo simulation balancing

Download www.cs.ualberta.ca

In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...

David Silver, Gerald Tesauro

ICML
2008
IEEE

Large scale manifold transduction

Download ronan.collobert.com

We show how the regularizer of Transductive Support Vector Machines (TSVM) can be trained by stochastic gradient descent for linear models and multi-layer architectures. The resul...

Michael Karlen, Jason Weston, Ayse Erkan, Ronan Co...

ICML
2003
IEEE

TD(0) Converges Provably Faster than the Residual Gradient Algorithm

Download www.hpl.hp.com

In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges more slowly than the TD(0) algorithm. In this paper, we use the ...

Ralf Schoknecht, Artur Merke
