Sciweavers

162

Voted

ML
2002
ACM

133views Machine Learning» more ML 2002»

Finite-time Analysis of the Multiarmed Bandit Problem

15 years 6 months ago

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...

Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers