We give the first rigorous upper bounds on the error of temporal difference (td) algorithms for policy evaluation as a function of the amount of experience. These upper bounds pr...
We combine the results of [13] and [8] and derive a continuous variant of a large class of drifting games. Our analysis furthers the understanding of the relationship between boos...
Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefine...
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalizatio...
We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process ( ¢¡¤£¦¥§ ), and focus on gradient ascent approache...
We investigate the computational complexity of the task of detecting dense regions of an unknown distribution from un-labeled samples of this distribution. We introduce a formal l...