Search Sciweavers | Sciweavers

69 search results - page 2 / 14

» PAC-Bayesian Policy Evaluation for Reinforcement Learning


JAIR
2002

99 views

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System

15 years 5 months ago

Download www.eecs.umich.edu

Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a di...

Satinder P. Singh, Diane J. Litman, Michael J. Kea...


GECCO
2009
Springer

162 views, Optimization

Uncertainty handling CMA-ES for reinforcement learning

15 years 3 months ago

Download www.neuroinformatik.ruhr-uni-bochum.de

The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an ada...

Verena Heidrich-Meisner, Christian Igel


ATAL
2009
Springer

198 views, Intelligent Agents

SarsaLandmark: an algorithm for learning in POMDPs with landmarks

16 years 8 days ago

Download www.aamas-conference.org

Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in pa...

Michael R. James, Satinder P. Singh


SIGDIAL
2010

186 views, Natural Language Processing

Adaptive Referring Expression Generation in Spoken Dialogue Systems: Evaluation with Real Users

15 years 3 months ago

Download www.sigdial.org

We present new results from a real-user evaluation of a data-driven approach to learning user-adaptive referring expression generation (REG) policies for spoken dialogue systems. ...

Srinivasan Janarthanam, Oliver Lemon


ICML
2001
IEEE

185 views, Machine Learning

Off-Policy Temporal Difference Learning with Function Approximation

16 years 6 months ago

Download www.cs.ualberta.ca

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...

Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
