Sciweavers

6 search results - page 1 / 2
» Online Regret Bounds for Markov Decision Processes with Dete...
Sort
View
ALT
2008
Springer
14 years 4 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
EWRL
2008
13 years 9 months ago
Markov Decision Processes with Arbitrary Reward Processes
Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
Jia Yuan Yu, Shie Mannor, Nahum Shimkin
NIPS
2007
13 years 8 months ago
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
CORR
2010
Springer
105views Education» more  CORR 2010»
13 years 6 months ago
Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carryin...
Sarah Filippi, Olivier Cappé, Aurelien Gari...
CORR
2010
Springer
171views Education» more  CORR 2010»
13 years 2 months ago
Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activ...
Cem Tekin, Mingyan Liu