Sciweavers

CORR
2006
Springer
13 years 11 months ago
Nearly optimal exploration-exploitation decision thresholds
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...
Christos Dimitrakakis
NIPS
2004
14 years 25 days ago
Nearly Tight Bounds for the Continuum-Armed Bandit Problem
In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. Wh...
Robert D. Kleinberg
SDM
2007
SIAM
14 years 26 days ago
Bandits for Taxonomies: A Model-based Approach
We consider a novel problem of learning an optimal matching, in an online fashion, between two feature spaces that are organized as taxonomies. We formulate this as a multi-armed ...
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabar...
COLT
2008
Springer
14 years 1 month ago
Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural general...
Jacob Abernethy, Elad Hazan, Alexander Rakhlin
COLT
2003
Springer
14 years 4 months ago
Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
We consider the Multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play th...
Shie Mannor, John N. Tsitsiklis
ECML
2005
Springer
14 years 5 months ago
Multi-armed Bandit Algorithms and Empirical Evaluation
The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and opt...
Joannès Vermorel, Mehryar Mohri