Sciweavers

» Regret Bounds for Prediction Problems
CORR
2007
Springer
Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables to ret...
Pierre-Arnaud Coquelin, Rémi Munos
JMLR
2008
Online Learning of Complex Prediction Problems Using Simultaneous Projections
We describe and analyze an algorithmic framework for online classification where each online trial consists of multiple prediction tasks that are tied together. We tackle the prob...
Yonatan Amit, Shai Shalev-Shwartz, Yoram Singer
JMLR
2010
Efficient Reductions for Imitation Learning
Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the trainin...
Stéphane Ross, Drew Bagnell
TSP
2010
Distributed learning in multi-armed bandit with multiple players
We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...
Keqin Liu, Qing Zhao
JMLR
2010
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning...
Lin Xiao