Sciweavers

ICML
2008
IEEE
15 years 14 days ago
Strategy evaluation in extensive games with importance sampling
Typically agent evaluation is done through Monte Carlo estimation. However, stochastic agent decisions and stochastic outcomes can make this approach inefficient, requiring many s...
Michael H. Bowling, Michael Johanson, Neil Burch, ...
Automatic discovery and transfer of MAXQ hierarchies
We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful...
Neville Mehta, Soumya Ray, Prasad Tadepalli, Thoma...
Tailoring density estimation via reproducing kernel moment matching
Moment matching is a popular means of parametric density estimation. We extend this technique to nonparametric estimation of mixture models. Our approach works by embedding distri...
Alex J. Smola, Arthur Gretton, Bernhard Schöl...
Beam sampling for the infinite hidden Markov model
The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite hidden Markov...
Jurgen Van Gael, Yunus Saatci, Yee Whye Teh, Zoubi...
Learning all optimal policies with multiple criteria
We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it can learn optimal policies for all linear pref...
Leon Barrett, Srini Narayanan
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Lihong Li