Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (ε-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, multi-armed bandits with switching cost are considered.
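
For readers unfamiliar with the term, the online regret mentioned in the abstract can be sketched as follows. The notation is an assumption made here for illustration and does not appear in this listing: $T$ is the number of steps taken, $r_t$ the reward received at step $t$, $\rho^*$ the average per-step reward of an optimal policy, and $R_T$ the regret. The listing does not say whether the bounds hold in expectation or with high probability, nor what the constants are, so the order statement below is only schematic.

```latex
% Assumed notation (not from the listing):
%   T       number of steps taken
%   r_t     reward received at step t
%   \rho^*  average per-step reward of an optimal policy
%   R_T     online regret after T steps
R_T \;=\; T\,\rho^* \;-\; \sum_{t=1}^{T} r_t ,
\qquad
\text{logarithmic regret:}\quad R_T = O(\log T) \ \text{ as } T \to \infty .
```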

Ronald Ortner

ALT 2008 | Asymptotic Bounds | Lower Bounds | Machine Learning | Upper Bounds |

» Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

» Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

» Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach

» PEGASUS: A policy search method for large MDPs and POMDPs

Post Info

Added	14 Mar 2010
Updated	14 Mar 2010
Type	Conference
Year	2008
Where	ALT
Authors	Ronald Ortner
