Evidence from recent psycholinguistic experiments suggests that humans resolve reference incrementally in the presence of constraining visual context. In this paper, we present and evaluate a computational model of human reference resolution that builds a semantic interpretation of an utterance directly, without a separate syntactic analysis phase of the kind that typically involves constructing parse trees. The model is implemented on a robot using real audio and video inputs, and thus operates in real time, and it is distributed over several computers that run in parallel. Results from experiments with the model confirm that the algorithm can build semantic interpretations, and in particular resolve reference, incrementally, as humans have been shown to do.
Matthias Scheutz, Kathleen M. Eberhard, Virgil And