Sciweavers

263 search results - page 14 / 53
» Regret Bounds for Prediction Problems
COLT
2006
Springer
14 years 1 month ago
Online Learning with Constraints
In this paper, we study a sequential decision making problem. The objective is to maximize the total reward while satisfying constraints, which are defined at every time step. The...
Shie Mannor, John N. Tsitsiklis
NIPS
2004
13 years 11 months ago
Experts in a Markov Decision Process
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Simi...
Eyal Even-Dar, Sham M. Kakade, Yishay Mansour
UAI
2004
13 years 11 months ago
Heuristic Search Value Iteration for POMDPs
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret w...
Trey Smith, Reid G. Simmons
CORR
2006
Springer
13 years 9 months ago
How to Beat the Adaptive Multi-Armed Bandit
The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of K arms of a slot machine, withou...
Varsha Dani, Thomas P. Hayes
ICML
2001
IEEE
14 years 10 months ago
General Loss Bounds for Universal Sequence Prediction
The Bayesian framework is ideally suited for induction problems. The probability of observing x_t at...
Marcus Hutter