Search Sciweavers | Sciweavers

86 search results - page 4 / 18

» Estimation and Approximation Bounds for Gradient-Based Reinf...

232

click to vote

Publication

222views

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

16 years 4 months ago

Download arxiv.org

Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...

Christos Dimitrakakis, Michail G. Lagoudakis

posted by olethros

Read More »

162

click to vote

ICMLA
2007

92views Machine Learning» more ICMLA 2007»

Control of a re-entrant line manufacturing model with a reinforcement learning approach

15 years 8 months ago

Download www.smitlab.uc.edu

This paper presents the application of a reinforcement learning (RL) approach for the near-optimal control of a re-entrant line manufacturing (RLM) model. The RL approach utilizes...

José A. Ramírez-Hernández, Em...

claim paper

Read More »

188

click to vote

CORR
2010
Springer

105views Education» more CORR 2010»

Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

15 years 5 months ago

Download hal.archives-ouvertes.fr

We consider model-based reinforcement learning in ﬁnite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carryin...

Sarah Filippi, Olivier Cappé, Aurelien Gari...

claim paper

Read More »

180

click to vote

ICML
2000
IEEE

126views Machine Learning» more ICML 2000»

Reinforcement Learning in POMDP's via Direct Gradient Ascent

16 years 7 months ago

Download reference.kfupm.edu.sa

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...

Jonathan Baxter, Peter L. Bartlett

claim paper

Read More »

195

click to vote

ICML
2001
IEEE

185views Machine Learning» more ICML 2001»

Off-Policy Temporal Difference Learning with Function Approximation

16 years 7 months ago

Download www.cs.ualberta.ca

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...

Doina Precup, Richard S. Sutton, Sanjoy Dasgupta

claim paper

Read More »

« Prev « First page 4 / 18 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers