
CSL
2012
Springer

Reinforcement learning for parameter estimation in statistical spoken dialogue systems

Reinforcement learning techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy, which selects the system’s responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model, which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, for systems that use a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optim...
Filip Jurcícek, Blaise Thomson, Steve Young
Added 21 Apr 2012
Updated 21 Apr 2012
Type Journal
Year 2012
Where CSL