Sciweavers

ML
2002
ACM
Machine Learning
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...