This paper focuses on the analysis and prediction of so-called aware sites, defined as turns where a user of a spoken dialogue system first becomes aware that the system has made a speech recognition error. We describe statistical comparisons of features of these aware sites in a train timetable spoken dialogue corpus, which reveal significant prosodic differences between aware sites and both turns that `correct' speech recognition errors and `normal' turns that are neither aware sites nor corrections. We then present machine learning results showing how prosodic features, in combination with other automatically available features, can predict whether a user turn is a normal turn, a correction, and/or an aware site.
Diane J. Litman, Julia Hirschberg, Marc Swerts