We present a data-driven approach to learn user-adaptive referring expression generation (REG) policies for spoken dialogue systems. Referring expressions can be difficult to unde...
One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (...
Lihong Li, Michael L. Littman, Christopher R. Mans...
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
In this paper, we propose a dynamic lightpath establishment method for service differentiation in all-optical WDM networks with the capability of full-range wavelength conversi...
We present a design for policy-based performance management of SMS Systems. The design takes as input the operator’s performance goals, which are expressed as policies that can b...