In this paper we address the reliability of policies derived by Reinforcement Learning from a limited number of observations. This can be done in a principled manner by taking into account the uncertainty of the derived Q-function, which stems from the uncertainty of the estimators used for the MDP's transition probabilities and the reward function. We apply uncertainty propagation in parallel with the Bellman iteration and obtain confidence intervals for the Q-function. In a second step we modify the Bellman operator so as to obtain a policy that guarantees the highest minimum performance with a given probability. We demonstrate the functionality of our method on artificial examples and show that, for an important class of problems, even an improvement of the expected performance can be obtained. Finally, we verify this observation in an application to gas turbine control.
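To make the two steps of the abstract concrete, the following is a minimal illustrative sketch (not necessarily the exact formulation used in the paper) of propagating estimator uncertainty alongside the Bellman iteration. Here $\hat P$ and $\hat R$ denote the estimated transition probabilities and rewards, $\sigma Q$ the resulting uncertainty of the Q-function, and $\xi$ a confidence parameter; this notation is introduced here purely for illustration.
\[
Q^{(m)}(s,a) \;=\; \sum_{s'} \hat P(s'\mid s,a)\,\bigl[\hat R(s,a,s') + \gamma\, V^{(m-1)}(s')\bigr],
\]
\[
\bigl(\sigma Q^{(m)}(s,a)\bigr)^2 \;\approx\; \sum_i \Bigl(\tfrac{\partial Q^{(m)}(s,a)}{\partial \theta_i}\Bigr)^{2}\,(\sigma \theta_i)^2,
\qquad \theta \in \bigl\{\hat P,\ \hat R,\ V^{(m-1)}\bigr\},
\]
i.e., first-order Gaussian uncertainty propagation applied to each Bellman update. A modified Bellman operator of the kind described above could then select actions according to
\[
\pi^{(m)}(s) \;=\; \arg\max_{a}\,\bigl[\,Q^{(m)}(s,a) - \xi\,\sigma Q^{(m)}(s,a)\,\bigr],
\]
where $\xi$ is chosen to match the desired probability with which the minimum performance should be guaranteed.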