
ECML 2005, Springer

Multi-armed Bandit Algorithms and Empirical Evaluation

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as solutions to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
Joannès Vermorel, Mehryar Mohri
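
For readers unfamiliar with the baseline mentioned in the abstract, below is a minimal sketch of the ε-greedy strategy on a K-armed Bernoulli bandit. The reward probabilities, epsilon value, and trial count are illustrative assumptions, not parameters or results from the paper.

```python
import random


def epsilon_greedy(true_probs, epsilon=0.1, n_trials=10_000, seed=0):
    """Play a K-armed Bernoulli bandit with the epsilon-greedy strategy.

    With probability epsilon pull a random arm (explore); otherwise pull
    the arm with the highest empirical mean reward so far (exploit).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total_reward += reward

    return total_reward


if __name__ == "__main__":
    # Hypothetical 3-armed bandit: arm 2 is best with success probability 0.8.
    print(epsilon_greedy([0.2, 0.5, 0.8]))
```

Despite its simplicity, a strategy of this form is the "hard to beat" baseline the abstract refers to; the paper's own experiments compare it against POKER and other bandit algorithms.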
Type: Conference
Year: 2005
Where: ECML
Authors: Joannès Vermorel, Mehryar Mohri