Sciweavers

ML
2002
ACM
133views Machine Learning» more  ML 2002»
13 years 11 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...