Sciweavers

16 search results - page 2 / 4
» On Average Versus Discounted Reward Temporal-Difference Lear...
Sort
View
COLT
2007
Springer
14 years 1 months ago
Bounded Parameter Markov Decision Processes with Average Reward Criterion
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, t...
Ambuj Tewari, Peter L. Bartlett
ALT
2007
Springer
14 years 4 months ago
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are we...
Ronald Ortner
ICRA
2002
IEEE
133views Robotics» more  ICRA 2002»
14 years 17 days ago
The Necessity of Average Rewards in Cooperative Multirobot Learning
Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular singlerobot learning algorithms based on discou...
Poj Tangamchit, John M. Dolan, Pradeep K. Khosla
AI
1998
Springer
13 years 7 months ago
Model-Based Average Reward Reinforcement Learning
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discoun...
Prasad Tadepalli, DoKyeong Ok
ICML
2002
IEEE
14 years 8 months ago
Hierarchically Optimal Average Reward Reinforcement Learning
Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by ...
Mohammad Ghavamzadeh, Sridhar Mahadevan