Sciweavers

263 search results - page 9 / 53
» Regret Bounds for Prediction Problems
ICASSP
2011
IEEE
13 years 1 month ago
Logarithmic weak regret of non-Bayesian restless multi-armed bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
Haoyang Liu, Keqin Liu, Qing Zhao
COLT
2007
Springer
14 years 4 months ago
Learning Permutations with Exponential Weights
We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes...
David P. Helmbold, Manfred K. Warmuth
CORR
2011
Springer
13 years 4 months ago
Online Learning of Rested and Restless Bandits
In this paper we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K...
Cem Tekin, Mingyan Liu
LION
2010
Springer
14 years 1 months ago
Algorithm Selection as a Bandit Problem with Unbounded Losses
Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...
Matteo Gagliolo, Jürgen Schmidhuber
ICML
2009
ACM
14 years 10 months ago
A simpler unified analysis of budget perceptrons
The kernel Perceptron is an appealing online learning algorithm that has a drawback: whenever it makes an error it must increase its support set, which slows training and testing ...
Ilya Sutskever