
ICML
2010
IEEE

Efficient Selection of Multiple Bandit Arms: Theory and Practice

We consider the general, widely applicable problem of selecting from n real-valued random variables a subset of size m of those with the highest means, based on as few samples as possible. This problem, which we denote Explore-m, is a core aspect in several stochastic optimization algorithms, and applications of simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of n random variables (Explore-1). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.
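To make the Explore-m setting concrete, the following is a minimal sketch of a naive uniform-sampling baseline: sample every arm the same number of times, then keep the m arms with the highest empirical means. It is not the algorithm proposed in the paper. The function names, the Bernoulli arms, and the tolerance parameters `epsilon` and `delta` are assumptions made for illustration, and the per-arm sample count is a standard Hoeffding-plus-union-bound choice for rewards in [0, 1].

```python
import math
import random


def samples_per_arm(n, epsilon, delta):
    """Hypothetical helper: number of pulls per arm so that, by Hoeffding's
    inequality and a union bound over n arms, every empirical mean is within
    epsilon / 2 of its true mean with probability at least 1 - delta.
    Assumes rewards lie in [0, 1]."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n / delta))


def select_top_m_uniform(arms, m, epsilon, delta):
    """Naive baseline: pull every arm equally often, return the indices of
    the m arms with the highest empirical means (sorted by index)."""
    n = len(arms)
    t = samples_per_arm(n, epsilon, delta)
    # Empirical mean of each arm after t pulls.
    means = [sum(arm() for _ in range(t)) / t for arm in arms]
    ranked = sorted(range(n), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:m])


def bernoulli_arm(p, rng):
    """Illustrative arm: returns 1.0 with probability p, otherwise 0.0."""
    return lambda: 1.0 if rng.random() < p else 0.0


rng = random.Random(0)
true_means = [0.9, 0.8, 0.5, 0.3, 0.1]  # made-up means for the demo
arms = [bernoulli_arm(p, rng) for p in true_means]
chosen = select_top_m_uniform(arms, m=2, epsilon=0.2, delta=0.05)
print(chosen)
```

This baseline spends the same budget on every arm regardless of how clearly it is in or out of the top m, which is the kind of inefficiency that adaptive sampling strategies aim to reduce.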
Shivaram Kalyanakrishnan, Peter Stone
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ICML
Authors Shivaram Kalyanakrishnan, Peter Stone