Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (ε-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, multi-armed bandits with switching cost are considered.

Ronald Ortner

ALT 2008 | Asymptotic Bounds | Lower Bounds | Machine Learning | Upper Bounds

Post Info

Added	14 Mar 2010
Updated	14 Mar 2010
Type	Conference
Year	2008
Where	ALT
Authors	Ronald Ortner
