
ICMI
2003
Springer

A visually grounded natural language interface for reference to spatial scenes

Many user interfaces, from graphic design programs to navigation aids in cars, share a virtual space with the user. Such applications are often ideal candidates for speech interfaces that allow the user to refer to objects in the shared space. We present an analysis of how people describe objects in spatial scenes using natural language. Based on this study, we describe a system that uses synthetic vision to “see” such scenes from the person’s point of view, and that understands complex natural language descriptions referring to objects in the scenes. This system is based on a rich notion of semantic compositionality embedded in a grounded language understanding framework. We describe its semantic elements, their compositional behaviour, and their grounding through the synthetic vision system. To conclude, we evaluate the performance of the system on unconstrained input.

Categories and Subject Descriptors I.2.7 [Artificial Intelligence]: Natural Language Processing—Language p...
Peter Gorniak, Deb Roy
Added 07 Jul 2010
Updated 07 Jul 2010
Type Conference
Year 2003
Where ICMI
Authors Peter Gorniak, Deb Roy