We consider decision-making problems in Markov decision processes in which both the rewards and the transition probabilities vary in an arbitrary (e.g., nonstationary) fashion. We propose an online Q-learning style algorithm and provide a guarantee on its performance, evaluated in retrospect against alternative policies. The guarantee depends critically on the variability of the uncertainty in the transition probabilities, but, unlike in previous works, it holds regardless of arbitrary changes in the rewards and transition probabilities over time. Besides being computationally efficient, the approach requires neither prior knowledge nor estimation of the transition probabilities.
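As a point of reference only, and not the specific update rule proposed in this paper, a generic online Q-learning style iteration with time-varying rewards $r_t$ can be sketched as
\[
Q_{t+1}(s_t, a_t) \;=\; (1-\alpha_t)\, Q_t(s_t, a_t) \;+\; \alpha_t \Big( r_t(s_t, a_t) + \gamma \max_{a'} Q_t(s_{t+1}, a') \Big),
\]
where $\alpha_t$ is a step size and $\gamma$ a discount factor; the rewards $r_t$ (and, in our setting, the transition probabilities) may change arbitrarily from step to step.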