AAAI
2015

The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits

Current algorithms for the standard multi-armed bandit problem have good empirical performance and optimal regret bounds. However, real-world problems often differ from the standard formulation in several ways. First, feedback may be delayed instead of arriving immediately. Second, the real world often contains structure that suggests heuristics, which we wish to incorporate while retaining strong theoretical guarantees. Third, we may wish to make use of an arbitrary prior dataset without negatively impacting performance. Fourth, we may wish to efficiently evaluate algorithms using a previously collected dataset. Surprisingly, these seemingly disparate problems can be addressed using algorithms inspired by a recently developed queueing technique. As a solution to these four problems, we present the Stochastic Delayed Bandits (SDB) algorithm, which takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. We present empiri...
Added 27 Mar 2016
Updated 27 Mar 2016
Type Journal
Year 2015
Where AAAI
Authors Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popovic