Efficient Selection of Multiple Bandit Arms: Theory and Practice


Download www.cs.utexas.edu

We consider the general, widely applicable problem of selecting, from n real-valued random variables, a subset of size m of those with the highest means, based on as few samples as possible. This problem, which we denote Explore-m, is a core aspect of several stochastic optimization algorithms, and of applications in simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of n random variables (Explore-1). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.
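To make the Explore-m setting concrete, the following is a minimal sketch of a naive uniform-sampling baseline. It is not the algorithm proposed in the paper. It assumes rewards bounded in [0, 1] and uses a standard Hoeffding-plus-union-bound sample count. The names `samples_per_arm`, `explore_m_uniform`, and `pull`, as well as the Bernoulli arm means, are hypothetical and chosen only for illustration.

```python
import math
import random


def samples_per_arm(n, epsilon, delta):
    """Sample count per arm for the naive baseline (an assumption of this sketch).

    With rewards in [0, 1], Hoeffding's inequality and a union bound over the
    n arms give: every empirical mean is within epsilon / 2 of its true mean
    with probability at least 1 - delta.
    """
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n / delta))


def explore_m_uniform(pull, n, m, epsilon, delta):
    """Sample every arm equally often, then keep the m highest empirical means.

    `pull(arm)` is a hypothetical callback returning one reward in [0, 1].
    Returns the selected arm indices (sorted) and the total number of samples.
    """
    t = samples_per_arm(n, epsilon, delta)
    means = [sum(pull(arm) for _ in range(t)) / t for arm in range(n)]
    ranked = sorted(range(n), key=lambda arm: means[arm], reverse=True)
    return sorted(ranked[:m]), n * t


# Illustrative Bernoulli arms; these means are made up for the example.
rng = random.Random(0)
true_means = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]


def pull(arm):
    return 1.0 if rng.random() < true_means[arm] else 0.0


selected, total = explore_m_uniform(pull, n=6, m=3, epsilon=0.2, delta=0.05)
print(selected, total)
```

Uniform sampling of this kind spends the same budget on every arm, including arms that are clearly good or clearly bad after a few pulls. The abstract's claim of improved sample efficiency concerns strategies that avoid this waste.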

Shivaram Kalyanakrishnan, Peter Stone


ICML 2010 | Machine Learning | Random Variables | Real-valued Random Variables | Stochastic Optimization Algorithms


Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	ICML
Authors	Shivaram Kalyanakrishnan, Peter Stone

