Sciweavers

» Reinforcement Using Supervised Learning for Policy Generaliz...
IROS
2008
IEEE
14 years 3 months ago
Learning nonparametric policies by imitation
A long cherished goal in artificial intelligence has been the ability to endow a robot with the capacity to learn and generalize skills from watching a human teacher. Such an ...
David B. Grimes, Rajesh P. N. Rao
JMLR
2006
13 years 9 months ago
Policy Gradient in Continuous Time
Policy search is a method for approximately solving an optimal control problem by performing a parametric optimization search in a given class of parameterized policies. In order ...
Rémi Munos
NECO
2011
13 years 4 months ago
Least Squares Estimation Without Priors or Supervision
Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values), or a prior probability model for...
Martin Raphan, Eero P. Simoncelli
ICML
2008
IEEE
14 years 10 months ago
Exploration scavenging
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evalua...
John Langford, Alexander L. Strehl, Jennifer Wortm...
KDD
2006
ACM
14 years 9 months ago
Supervised probabilistic principal component analysis
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When label...
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Krieg...