The paper presents an approach to using structural descriptions, obtained through a human-robot tutoring dialogue, as labels for the visual object models a robot learns. The paper shows how structural descriptions make it possible to relate models for different aspects of one and the same object, and how relating the descriptions of visual models to discourse referents enables incremental updating of model descriptions through dialogue, whether robot- or human-initiated. The approach has been implemented in an integrated architecture for human-assisted robot visual learning.

Categories and Subject Descriptors
I.2.7 [AI]: Natural language interfaces; I.2.10 [AI]: Vision and Scene Understanding

General Terms
Algorithms

Keywords
Cognitive vision and learning; natural language dialogue
Geert-Jan M. Kruijff, John D. Kelleher, Gregor Berginc
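Purely as an illustration of the bookkeeping the abstract describes (the paper itself gives no implementation, and all class, field, and variable names below are hypothetical), here is a minimal Python sketch in which structural descriptions label visual models, discourse referents resolve to those models, and an utterance about a referent incrementally updates the description of the model it denotes:

```python
from dataclasses import dataclass, field


@dataclass
class StructuralDescription:
    """Label for a visual model: an object category plus attribute features.
    (Hypothetical structure; the paper does not fix a concrete format.)"""
    category: str                                  # e.g. "cup"
    features: dict = field(default_factory=dict)   # e.g. {"color": "red"}

    def merge(self, other: "StructuralDescription") -> None:
        # Incremental update: newly asserted features extend or
        # override what the robot already believes about this model.
        self.features.update(other.features)


@dataclass
class VisualModel:
    model_id: str                      # handle for a learned appearance model
    description: StructuralDescription


# Discourse referents resolved during the tutoring dialogue are mapped
# onto visual models, so that dialogue about a referent can reach the
# description of the model it denotes.
referent_to_model: dict[str, VisualModel] = {}

cup = VisualModel("vm-1", StructuralDescription("cup"))
referent_to_model["r1"] = cup          # "This is a cup." binds r1 to vm-1

# Human-initiated update: "The cup is red."
referent_to_model["r1"].description.merge(
    StructuralDescription("cup", {"color": "red"}))

print(cup.description)   # StructuralDescription(category='cup', features={'color': 'red'})
```

In this sketch the referent-to-model map is what lets a later utterance, robot- or human-initiated, flow back into the stored model description rather than creating a disconnected label.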