Sciweavers

83 search results - page 9 / 17
» Online Learning: Beyond Regret
CORR
2010
Springer
13 years 7 months ago
Online Algorithms for the Multi-Armed Bandit Problem with Markovian Rewards
We consider the classical multi-armed bandit problem with Markovian rewards. When played, an arm changes its state in a Markovian fashion; when not played, it remains frozen. Th...
Cem Tekin, Mingyan Liu
COLT
2006
Springer
13 years 11 months ago
Online Learning with Constraints
In this paper, we study a sequential decision-making problem. The objective is to maximize the total reward while satisfying constraints, which are defined at every time step. The...
Shie Mannor, John N. Tsitsiklis
AAAI
2008
13 years 10 months ago
Online Learning with Expert Advice and Finite-Horizon Constraints
In this paper, we study a sequential decision-making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novel...
Branislav Kveton, Jia Yuan Yu, Georgios Theocharou...
ICML
2009
IEEE
14 years 8 months ago
Online feature elicitation in interactive optimization
Most models of utility elicitation in decision support and interactive optimization assume a predefined set of "catalog" features over which user preferences are express...
Craig Boutilier, Kevin Regan, Paolo Viappiani
CORR
2010
Springer
13 years 2 months ago
Switching between Hidden Markov Models using Fixed Share
In prediction with expert advice, the goal is to design online prediction algorithms that achieve small regret (additional loss on the whole data) compared to a reference scheme. I...
Wouter M. Koolen, Tim van Erven