We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of...
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables to ret...
Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that ...
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activ...