Bayesian Q-Learning

A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper, we adopt a Bayesian approach to maintaining this uncertain information. We extend Watkins' Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish the convergence properties of our algorithm and show experimentally that it can exhibit substantial improve...
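The idea of selecting actions by expected Q-value plus a myopic value of information can be illustrated for a single state. The sketch below assumes, purely for illustration, that the posterior over each action's Q-value is an independent Gaussian with a known mean and standard deviation. That Gaussian form is chosen here because it gives a closed-form integral; it is not necessarily the distribution family used in the paper, and the paper's update and propagation of these distributions are not shown. The function names `expected_positive_part`, `myopic_vpi` and `select_action` are hypothetical. The gain follows the myopic reasoning: learning the true value of the current best action helps only if it turns out worse than the runner-up, and learning the true value of any other action helps only if it turns out better than the current best.

```python
import math
from typing import List, Sequence


def _norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal CDF via erfc (more accurate in the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_positive_part(mean: float, std: float) -> float:
    """E[max(0, X)] for X ~ N(mean, std^2), i.e. mean*Phi(z) + std*phi(z)."""
    if std <= 0.0:
        return max(0.0, mean)
    z = mean / std
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, mean * _norm_cdf(z) + std * _norm_pdf(z))


def myopic_vpi(means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Myopic value of perfect information for each action in one state.

    Assumption (hypothetical): Q(s, a) ~ N(means[a], stds[a]^2), independently.
    """
    if len(means) < 2 or len(means) != len(stds):
        raise ValueError("need at least two actions and matching stds")
    order = sorted(range(len(means)), key=lambda a: means[a], reverse=True)
    best, second = order[0], order[1]
    vpi = []
    for a, (m, s) in enumerate(zip(means, stds)):
        if a == best:
            # Gain if the best action's true value falls below the runner-up's mean.
            vpi.append(expected_positive_part(means[second] - m, s))
        else:
            # Gain if this action's true value exceeds the current best mean.
            vpi.append(expected_positive_part(m - means[best], s))
    return vpi


def select_action(means: Sequence[float], stds: Sequence[float]) -> int:
    """Pick the action maximising expected Q-value plus myopic VPI."""
    vpi = myopic_vpi(means, stds)
    return max(range(len(means)), key=lambda a: means[a] + vpi[a])


if __name__ == "__main__":
    # Action 1 is slightly worse on average but highly uncertain,
    # so its information value makes it the preferred (exploratory) choice.
    print(select_action([1.0, 0.9, 0.2], [0.01, 1.0, 0.01]))
    # With uniformly low uncertainty, the greedy action 0 wins.
    print(select_action([1.0, 0.9, 0.2], [0.01, 0.01, 0.01]))
```

In the first call, action 1 has a mean only 0.1 below the best but a standard deviation of 1.0, so its VPI (about 0.35) outweighs the gap; once uncertainty is small everywhere, the VPI terms vanish and selection becomes greedy.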

Richard Dearden, Nir Friedman, Stuart J. Russell

AAAI 1998 | Environments Balancing Exploration | Exploration | Intelligent Agents | Well-known Model-free Exploration

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	1998
Where	AAAI
Authors	Richard Dearden, Nir Friedman, Stuart J. Russell

Sciweavers