Systems that attempt to understand natural human input make mistakes, even humans. However, humans avoid misunderstandings by confirming doubtful input. Multimodal systems--those that combine simultaneous input from more than one modality, for example speech and gesture--have historically been designed so that they either request confwmationof speech, their primary modality, or not at all. Instead, we experimented with delaying confirmation until after the speech and gesture were combined into a complete multimodal command. In controlled experiments, subjects achieved more commands per minute at a lower error rate when the system delayed confirmation, than compared to when subjects confirmed only speech. In addition, this style of late confirmation meets the user's expectation that confirmedcommands shouldbe executable.
David McGee, Philip R. Cohen, Sharon L. Oviatt