average reward | Sciweavers

193

ML
2002
ACM

168views Machine Learning» more ML 2002»

On Average Versus Discounted Reward Temporal-Difference Learning

15 years 6 months ago

We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...

John N. Tsitsiklis, Benjamin Van Roy

claim paper

Read More »

171

click to vote

IJCAI
2001

174views Artificial Intelligence» more IJCAI 2001»

Complexity of Probabilistic Planning under Average Rewards

15 years 8 months ago

Download www.informatik.uni-freiburg.de

A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...

Jussi Rintanen

claim paper

Read More »

186

click to vote

COLT
2003
Springer

141views Machine Learning» more COLT 2003»

On-Line Learning with Imperfect Monitoring

15 years 11 months ago

Download www.ece.mcgill.ca

We study on-line play of repeated matrix games in which the observations of past actions of the other player and the obtained reward are partial and stochastic. We deﬁne the Part...

Shie Mannor, Nahum Shimkin

claim paper

Read More »

188

click to vote

COLT
2007
Springer

143views Machine Learning» more COLT 2007»

Bounded Parameter Markov Decision Processes with Average Reward Criterion

16 years 22 days ago

Download ttic.uchicago.edu

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, t...

Ambuj Tewari, Peter L. Bartlett

claim paper

Read More »

151

click to vote

ALT
2006
Springer

109views Machine Learning» more ALT 2006»

General Discounting Versus Average Reward

16 years 3 months ago

Download www.idsia.ch

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to ...

Marcus Hutter

claim paper

Read More »

162

click to vote

ICML
2000
IEEE

126views Machine Learning» more ICML 2000»

Reinforcement Learning in POMDP's via Direct Gradient Ascent

16 years 7 months ago

Download reference.kfupm.edu.sa

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...

Jonathan Baxter, Peter L. Bartlett

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers