Sciweavers

98 search results - page 16 / 20
» Using Rewards for Belief State Updates in Partially Observab...
ICML
1999
IEEE
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
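
For orientation, here is a minimal sketch of the least-squares TD idea named in this title: accumulate the matrix A and vector b from observed transitions and solve A w = b for value-function weights. All names, the feature setup, and the ridge term are illustrative assumptions, not details taken from the paper.

import numpy as np

def lstd(transitions, n_features, gamma=0.9, reg=1e-6):
    """Least-squares TD: solve A w = b for linear value-function weights.

    transitions: iterable of (phi_s, reward, phi_next) feature triples,
    with phi_next the zero vector for terminal transitions.
    """
    A = reg * np.eye(n_features)          # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)          # weights w such that V(s) ~ w . phi(s)

# toy usage: a 3-state chain with one-hot features and a reward at the end
phi = np.eye(3)
episode = [(phi[0], 0.0, phi[1]), (phi[1], 0.0, phi[2]), (phi[2], 1.0, np.zeros(3))]
print(lstd(episode, n_features=3))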
NIPS
2001
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
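
To make the variance-reduction theme concrete, the sketch below shows a plain REINFORCE-style gradient estimate on a toy two-armed bandit, with and without a baseline subtracted from the reward. This is a generic illustration of a baseline as a control variate, not the paper's specific estimators; the bandit, the reward values, and all function names are assumptions.

import numpy as np

rng = np.random.default_rng(0)

theta = np.zeros(2)                       # one softmax parameter per arm
arm_rewards = np.array([1.0, 1.2])        # hypothetical mean rewards

def policy(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def grad_log_pi(theta, a):
    g = -policy(theta)
    g[a] += 1.0                           # d/d theta of log softmax
    return g

def grad_estimate(theta, baseline=0.0, n=1000):
    # REINFORCE estimator (r - b) * grad log pi(a): subtracting a baseline b
    # leaves the expectation unchanged but can shrink the variance.
    samples = []
    for _ in range(n):
        a = rng.choice(2, p=policy(theta))
        r = rng.normal(arm_rewards[a], 0.1)
        samples.append((r - baseline) * grad_log_pi(theta, a))
    samples = np.array(samples)
    return samples.mean(axis=0), samples.var(axis=0)

print("no baseline:  ", *grad_estimate(theta, baseline=0.0))
print("mean baseline:", *grad_estimate(theta, baseline=arm_rewards.mean()))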
ECML
2007
Springer
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
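
The phrase "limited-memory stochastic policy" can be illustrated with a toy policy whose action distribution depends only on a sliding window of the last k observations rather than on the full history or a belief state. This sketch is not the PGAC algorithm; the class, its table-based parameterisation, and the example observations are all assumptions for illustration.

import random
from collections import deque, defaultdict

class LimitedMemoryStochasticPolicy:
    """Toy limited-memory stochastic policy for a POMDP: action probabilities
    are conditioned only on the last k observations."""

    def __init__(self, actions, k=3):
        self.actions = actions
        self.memory = deque(maxlen=k)     # sliding observation window
        # one probability vector per memory contents, initialised uniform
        self.table = defaultdict(lambda: [1.0 / len(actions)] * len(actions))

    def observe(self, obs):
        self.memory.append(obs)

    def act(self):
        probs = self.table[tuple(self.memory)]
        return random.choices(self.actions, weights=probs)[0]

policy = LimitedMemoryStochasticPolicy(actions=["left", "right"], k=2)
for obs in ["wall", "open", "open"]:
    policy.observe(obs)
    print(obs, "->", policy.act())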
ACL
2008
Mixture Model POMDPs for Efficient Handling of Uncertainty in Dialogue Management
In spoken dialogue systems, Partially Observable Markov Decision Processes (POMDPs) provide a formal framework for making dialogue management decisions under uncertainty, but effi...
James Henderson, Oliver Lemon
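
The POMDP machinery referred to here rests on belief tracking: after taking action a and seeing observation o, the new belief is b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) b(s). The sketch below is the standard discrete update, not the paper's mixture-model scheme; the tiny two-state "user goal" example and all array values are assumptions.

import numpy as np

def belief_update(b, a, o, T, O):
    """Standard discrete POMDP belief update."""
    predicted = b @ T[a]                  # sum_s T(s'|s,a) b(s)
    updated = O[a][:, o] * predicted      # weight by observation likelihood
    return updated / updated.sum()

# Two hidden user goals {0, 1}, one clarification action, noisy observation.
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])     # the goal does not change
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])     # P(o | s', a)
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=0, T=T, O=O)
print(b)                                      # belief shifts toward goal 0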
AIPS
2000
On-line Scheduling via Sampling
We consider the problem of scheduling an unknown sequence of tasks for a single server as the tasks arrive, with the goal of maximizing the total weighted value of the tasks serv...
Hyeong Soo Chang, Robert Givan, Edwin K. P. Chong
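
A generic sampling-based flavour of this idea: when deciding which queued task to serve next, evaluate each candidate by averaging the total weighted value over sampled future arrival sequences, completing each sample with a simple greedy base policy. This is an illustrative sketch under assumed task and arrival models, not the algorithm from the paper; every name and parameter below is an assumption.

import random

random.seed(0)

def sample_arrivals(horizon, p=0.5):
    """Sample a hypothetical future: (arrival_time, weight, deadline) tasks."""
    return [(t, random.uniform(0, 1), t + random.randint(1, 3))
            for t in range(horizon) if random.random() < p]

def rollout(queue, arrivals, t0, horizon):
    """Greedy base policy: at each step serve the heaviest unexpired task."""
    queue, total = list(queue), 0.0
    for t in range(t0, horizon):
        queue += [(w, d) for (at, w, d) in arrivals if at == t]
        feasible = [task for task in queue if task[1] >= t]
        if feasible:
            best = max(feasible, key=lambda task: task[0])
            total += best[0]
            queue.remove(best)
    return total

def choose_task(queue, t, horizon=10, n_samples=50):
    """Pick the queued task whose immediate service scores best on average
    over sampled futures."""
    best_task, best_value = None, float("-inf")
    for task in queue:
        rest = [q for q in queue if q is not task]
        value = sum(task[0] + rollout(rest, sample_arrivals(horizon), t + 1, horizon)
                    for _ in range(n_samples)) / n_samples
        if value > best_value:
            best_task, best_value = task, value
    return best_task

queue = [(0.9, 2), (0.4, 8)]              # (weight, deadline) of waiting tasks
print("serve:", choose_task(queue, t=0))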