As spoken language interfaces for real-world systems become a practical possibility, it has become apparent that such interfaces will need to draw on a variety of cues from diverse sources to achieve a robustness and naturalness approaching that of human performance [1]. However, our knowledge of how these cues behave in the aggregate remains sketchy. We lack a strong theoretical basis for predicting which cues will prove useful in practice and for specifying how these cues should be combined to reinforce or rule out candidate interpretations of the communicative signal. In the research program summarized here, we propose to develop and test an initial theory of cue integration for spoken language interfaces. By establishing a principled basis for integrating knowledge sources in such interfaces, we believe we can develop systems that perform better from a computer-human interaction standpoint.
Karen Ward, David G. Novick