Sciweavers

22 search results - page 2 / 5
» High-Probability Regret Bounds for Bandit Online Linear Opti...
Sort
View
COLT
2010
Springer
13 years 5 months ago
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback
Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially gen...
Alekh Agarwal, Ofer Dekel, Lin Xiao
CORR
2004
Springer
103views Education» more  CORR 2004»
13 years 7 months ago
Online convex optimization in the bandit setting: gradient descent without a gradient
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c1, c2, . . . , and in each period, we choose a feasible po...
Abraham Flaxman, Adam Tauman Kalai, H. Brendan McM...
CORR
2011
Springer
210views Education» more  CORR 2011»
13 years 2 months ago
Online Learning of Rested and Restless Bandits
In this paper we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K...
Cem Tekin, Mingyan Liu
ALT
2008
Springer
14 years 4 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
NIPS
2008
13 years 9 months ago
On the Generalization Ability of Online Strongly Convex Programming Algorithms
This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound...
Sham M. Kakade, Ambuj Tewari