Sciweavers

CORR
2011
Springer
Online Learning of Rested and Restless Bandits
In this paper we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K...
Cem Tekin, Mingyan Liu
FGR
2006
IEEE
Learning to Identify Facial Expression During Detection Using Markov Decision Process
While there has been a great deal of research in face detection and recognition, there has been very limited work on identifying the expression on a face. Many current face detect...
Ramana Isukapalli, Ahmed M. Elgammal, Russell Grei...
NIPS
1996
Multidimensional Triangulation and Interpolation for Reinforcement Learning
Dynamic Programming, Q-learning and other discrete Markov Decision Process solvers can be applied to continuous d-dimensional state-spaces by quantizing the state space into an arr...
Scott Davies
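The abstract names the baseline approach it improves on: quantizing a continuous d-dimensional state space into a grid (array) of cells so that a discrete solver such as Q-learning can be used. The Python sketch below illustrates only that baseline. It is not the paper's triangulation and interpolation method. The names `make_grid_quantizer` and `q_update`, the uniform per-dimension binning, and the default learning parameters are illustrative assumptions.

```python
from math import prod

def make_grid_quantizer(low, high, bins):
    """Hypothetical helper: map a continuous state to one discrete cell index.

    Assumes a uniform grid with bins[i] cells along dimension i, spanning
    [low[i], high[i]]. States outside the box are clipped to the border cells.
    """
    def quantize(state):
        index = 0
        for s, lo, hi, n in zip(state, low, high, bins):
            # Fraction of the way across this dimension, clipped to [0, 1].
            frac = min(max((s - lo) / (hi - lo), 0.0), 1.0)
            # Cell along this dimension; the upper edge falls in the last cell.
            cell = min(int(frac * n), n - 1)
            # Row-major flattening of the d-dimensional cell coordinate.
            index = index * n + cell
        return index
    return quantize

def q_update(Q, s_idx, a, r, s_next_idx, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning step on the quantized (discrete) states."""
    best_next = max(Q[s_next_idx])
    Q[s_idx][a] += alpha * (r + gamma * best_next - Q[s_idx][a])

# Example: a 2-D unit square split into a 4 x 4 grid, with 2 actions.
bins = [4, 4]
quantize = make_grid_quantizer([0.0, 0.0], [1.0, 1.0], bins)
Q = [[0.0, 0.0] for _ in range(prod(bins))]  # one row per grid cell

s, s_next = quantize([0.1, 0.9]), quantize([1.0, 1.0])
q_update(Q, s, 0, 1.0, s_next)
```

With this layout the table has one row per cell, so its size grows as the product of the per-dimension bin counts. That exponential growth in d is the cost of the naive grid that motivates finer-grained alternatives.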
SDM
2007
SIAM
Bandits for Taxonomies: A Model-based Approach
We consider a novel problem of learning an optimal matching, in an online fashion, between two feature spaces that are organized as taxonomies. We formulate this as a multi-armed ...
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabar...
UAI
2008
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...
Richard S. Sutton, Csaba Szepesvári, Alborz...