
ALT
2008
Springer

Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret, measured with respect to an optimal (or ε-optimal) policy, that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, we consider multi-armed bandits with switching costs.
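The abstract does not spell out the algorithm. The following is a minimal sketch of one way an optimistic (upper confidence) planner could exploit deterministic transitions, and it is not the paper's method. It relies on a standard fact: in a deterministic MDP, a stationary deterministic policy eventually repeats a fixed cycle of state-action pairs, so its average reward equals the mean reward along that cycle. The sketch therefore replaces each unknown mean reward by an upper confidence index and picks the cycle with the highest mean index, using Karp's mean cycle algorithm in its maximisation form. The graph representation, the bonus sqrt(2 ln t / n), rewards in [0, 1], and the names `optimistic_rewards`, `max_mean_cycle` and `plan_cycle` are all hypothetical choices for illustration.

```python
import math

# Assumed setting (hypothetical, for illustration): states are 0..num_states-1,
# edges[i] = (source, target) is one deterministic (state, action) pair, and
# every state has at least one outgoing edge. Rewards lie in [0, 1].


def optimistic_rewards(counts, sums, t):
    """Upper confidence index per edge; the bonus sqrt(2 ln t / n) is an assumed choice."""
    index = []
    for n, total in zip(counts, sums):
        if n == 0:
            index.append(1.0)  # never tried: assume the maximal reward
        else:
            bonus = math.sqrt(2.0 * math.log(t) / n)
            index.append(min(1.0, total / n + bonus))
    return index


def max_mean_cycle(num_states, edges, weights):
    """Karp's algorithm (maximisation form): returns (mean, cycle as edge indices)."""
    n = num_states
    neg = float("-inf")
    # best[k][v]: maximal weight of a walk with exactly k edges ending in v (any start).
    best = [[neg] * n for _ in range(n + 1)]
    pred = [[None] * n for _ in range(n + 1)]
    best[0] = [0.0] * n
    for k in range(1, n + 1):
        for e, (u, v) in enumerate(edges):
            cand = best[k - 1][u] + weights[e]
            if best[k - 1][u] > neg and cand > best[k][v]:
                best[k][v] = cand
                pred[k][v] = e
    # Karp's formula: max over v of min over k of (best[n][v] - best[k][v]) / (n - k).
    mean, end = neg, None
    for v in range(n):
        if best[n][v] == neg:
            continue
        worst = min((best[n][v] - best[k][v]) / (n - k)
                    for k in range(n) if best[k][v] > neg)
        if worst > mean:
            mean, end = worst, v
    if end is None:
        raise ValueError("graph has no cycle")
    # Trace the optimal n-edge walk backwards until a state repeats; the cycle
    # closed there has mean weight equal to `mean` (standard Karp argument).
    position, walk = {}, []
    v, k = end, n
    while v not in position:
        position[v] = len(walk)
        e = pred[k][v]
        walk.append(e)
        v, k = edges[e][0], k - 1
    # Reverse so the edges are listed in forward (traversal) order.
    return mean, walk[position[v]:][::-1]


def plan_cycle(num_states, edges, counts, sums, t):
    """One planning step (t >= 1): the cycle with the highest optimistic mean reward."""
    return max_mean_cycle(num_states, edges, optimistic_rewards(counts, sums, t))
```

For example, with `edges = [(0, 1), (1, 0), (1, 2), (2, 2), (2, 0)]` and weights `[0.9, 0.9, 0.1, 0.5, 0.1]`, `max_mean_cycle(3, edges, weights)` selects the two-edge cycle between states 0 and 1, with mean approximately 0.9. A complete agent would additionally need a rule for when to re-plan and a path from the current state to the chosen cycle; both are left out of this sketch.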
Ronald Ortner
Added 14 Mar 2010
Updated 14 Mar 2010
Type Conference
Year 2008
Where ALT
Authors Ronald Ortner