Modeling individual users is a promising way to improve the performance of spoken dialogue systems that are deployed for the general public and used repeatedly. We define a per-user "implicitly-supervised" ASR accuracy based on the user's responses to the system's explicit confirmations. We combine this estimated ASR accuracy with the user's barge-in rate, which reflects how accustomed the user is to the system, to predict interpretation errors in barge-in utterances. Experimental results showed that the estimated ASR accuracy improved prediction performance. Because both the estimated ASR accuracy and the barge-in rate can be obtained at runtime, they improve prediction performance without requiring manual labeling.
Kazunori Komatani, Alexander I. Rudnicky
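The abstract describes two per-user features that can be computed at runtime. The following is a minimal Python sketch of how they might be tracked and combined. It makes several assumptions that are not stated in the abstract. First, a "yes" to an explicit confirmation counts the confirmed ASR result as correct, and a "no" counts it as incorrect. Second, the two features are combined with a logistic model. Third, a higher barge-in rate, meaning a more accustomed user, lowers the predicted error probability. The class `UserHistory`, the function `p_interpretation_error`, and its weights are hypothetical illustrations, not the authors' method or values. The paper's actual feature definitions and classifier may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class UserHistory:
    """Hypothetical per-user runtime counters (not the paper's data structure)."""
    confirmations_affirmed: int = 0  # user said "yes" to an explicit confirmation
    confirmations_denied: int = 0    # user said "no" to an explicit confirmation
    utterances: int = 0              # all user utterances observed so far
    barge_ins: int = 0               # utterances that began during a system prompt

    def record_confirmation_response(self, affirmed: bool) -> None:
        # Assumption: the reply to a confirmation serves as an implicit label
        # on whether the confirmed ASR result was correct.
        if affirmed:
            self.confirmations_affirmed += 1
        else:
            self.confirmations_denied += 1

    def record_utterance(self, is_barge_in: bool) -> None:
        self.utterances += 1
        if is_barge_in:
            self.barge_ins += 1

    def estimated_asr_accuracy(self, prior: float = 0.5) -> float:
        # Fraction of confirmations the user accepted; the prior is used
        # before any confirmation has been answered.
        total = self.confirmations_affirmed + self.confirmations_denied
        return self.confirmations_affirmed / total if total else prior

    def barge_in_rate(self) -> float:
        return self.barge_ins / self.utterances if self.utterances else 0.0


def p_interpretation_error(user: UserHistory,
                           w_acc: float = -4.0,
                           w_barge: float = -2.0,
                           bias: float = 3.0) -> float:
    """Probability that a barge-in utterance from `user` is misinterpreted.

    Logistic combination of the two features. The weights are placeholders;
    in practice they would be fit on labeled training data.
    """
    z = bias + w_acc * user.estimated_asr_accuracy() + w_barge * user.barge_in_rate()
    return 1.0 / (1.0 + math.exp(-z))


# Example: 3 of 4 confirmations accepted, 2 of 4 utterances were barge-ins.
user = UserHistory()
for answer in (True, True, True, False):
    user.record_confirmation_response(answer)
for barge in (True, False, True, False):
    user.record_utterance(barge)
print(user.estimated_asr_accuracy(), user.barge_in_rate(), p_interpretation_error(user))
```

Because the counters are updated only from events the system already observes, namely confirmation replies and barge-in timing, both features stay available without manual transcription. This is the property the abstract emphasizes.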