The Nonstochastic Multiarmed Bandit Problem

Abstract. In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate O(T^{-1/2}). We show by a matching lower bound that this is th...
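The abstract does not spell out the algorithm itself. As an illustration of how a player can cope with adversarially chosen payoffs, here is a minimal sketch of an exponential-weights strategy with importance-weighted reward estimates, in the style of the Exp3 algorithm associated with this paper. The names `exp3`, `reward_fn`, and `gamma` are illustrative assumptions, rewards are assumed to lie in [0, 1], and the weight renormalization is a numerical convenience added here rather than something taken from the abstract.

```python
import math
import random


def exp3(reward_fn, num_arms, num_rounds, gamma, rng=None):
    """Sketch of an Exp3-style player (hypothetical helper, not the paper's code).

    reward_fn(t, arm) returns the payoff in [0, 1] of the arm pulled at round t.
    Only the pulled arm's payoff is ever observed (bandit feedback).
    gamma in (0, 1] controls how much uniform exploration is mixed in.
    """
    rng = rng or random.Random()
    weights = [1.0] * num_arms
    total_reward = 0.0
    probs = [1.0 / num_arms] * num_arms

    for t in range(num_rounds):
        # Mix the weight-proportional distribution with uniform exploration.
        total_weight = sum(weights)
        probs = [(1.0 - gamma) * w / total_weight + gamma / num_arms
                 for w in weights]

        # Sample one arm and observe only its payoff.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: unbiased for the pulled arm's payoff,
        # implicitly zero for every arm that was not pulled.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale so the largest weight is 1; the distribution is unchanged.
        top = max(weights)
        weights = [w / top for w in weights]

    return total_reward, probs


if __name__ == "__main__":
    # Toy payoff sequence (an assumption for the demo): arm 2 always pays 1.
    T = 2000
    gained, final_probs = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0,
                               num_arms=3, num_rounds=T, gamma=0.1,
                               rng=random.Random(0))
    print("per-round gap to best arm:", (T - gained) / T)
    print("final play distribution:", final_probs)
```

Dividing the observed payoff by the probability of the pulled arm is what lets the player work from a single observed payoff per round: arms that are rarely played receive a proportionally larger credit when they do pay off, so no assumption about how the payoffs are generated is needed for the estimate to be unbiased.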

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire

Bandit Problem | Per-round Payoff | SIAMCOMP 2002 | Slot Machines

Post Info

Added	23 Dec 2010
Updated	23 Dec 2010
Type	Journal
Year	2002
Where	SIAMCOMP
Authors	Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
