UAI
2001
14 years 1 months ago
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...
Lex Weaver, Nigel Tao
UAI
2001
14 years 1 months ago
Maximum Likelihood Bounded Tree-Width Markov Networks
We study the problem of projecting a distribution onto (or finding a maximum likelihood distribution among) Markov networks of bounded tree-width. By casting it as the combinatori...
Nathan Srebro
UAI
2001
14 years 1 months ago
Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP, can estimate the returns for finit...
Christian R. Shelton
UAI
2001
14 years 1 months ago
Vector-space Analysis of Belief-state Approximation for POMDPs
We propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring tha...
Pascal Poupart, Craig Boutilier
UAI
2001
14 years 1 months ago
Toward General Analysis of Recursive Probability Models
There is increasing interest within the research community in the design and use of recursive probability models. There remains concern about computational complexity costs and th...
Daniel Pless, George F. Luger
UAI
2001
14 years 1 months ago
Expectation Propagation for approximate Bayesian inference
This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation," unifies two previous techniques: assumed-de...
Thomas P. Minka
UAI
2001
14 years 1 months ago
Lattice Particle Filters
Dirk Ormoneit, Christiane Lemieux, David J. Fleet