Search Sciweavers | Sciweavers

22 search results - page 4 / 5

» Tighter Value Function Bounds for Bayesian Reinforcement Lea...

139

click to vote

ICML
2001
IEEE

185views Machine Learning» more ICML 2001»

Off-Policy Temporal Difference Learning with Function Approximation

16 years 4 months ago

Download www.cs.ualberta.ca

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...

Doina Precup, Richard S. Sutton, Sanjoy Dasgupta

claim paper

Read More »

132

click to vote

IJCAI
2007

201views Artificial Intelligence» more IJCAI 2007»

Using Linear Programming for Bayesian Exploration in Markov Decision Processes

15 years 4 months ago

Download www.cs.mcgill.ca

A key problem in reinforcement learning is ﬁnding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much ...

Pablo Samuel Castro, Doina Precup

claim paper

Read More »

145

click to vote

Publication

222views

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

16 years 7 days ago

Download arxiv.org

Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...

Christos Dimitrakakis, Michail G. Lagoudakis

posted by olethros

Read More »

114

Voted

ICML
2002
IEEE

113views Machine Learning» more ICML 2002»

Learning from Scarce Experience

16 years 4 months ago

Download www.cs.ucr.edu

Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each ...

Leonid Peshkin, Christian R. Shelton

claim paper

Read More »

119

click to vote

CORR
2000
Springer

92views Education» more CORR 2000»

Predicting the expected behavior of agents that learn about agents: the CLRI framework

15 years 3 months ago

Download jmvidal.cse.sc.edu

We describe a framework and equations used to model and predict the behavior of multi-agent systems (MASs) with learning agents. A difference equation is used for calculating the ...

José M. Vidal, Edmund H. Durfee

claim paper

Read More »

« Prev « First page 4 / 5 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers