Learning words from sights and sounds: a computational model

This paper presents an implemented computational model of word acquisition that learns directly from raw multimodal sensory input. Set in an information-theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross-modal structure. The model has been implemented in a system using novel speech processing, computer vision, and machine learning algorithms. In evaluations, the model successfully performed speech segmentation, word discovery, and visual categorization from spontaneous infant-directed speech paired with video images of single objects. These results demonstrate the possibility of using state-of-the-art techniques from sensory pattern recognition and machine learning to implement cognitive models that can process raw sensor data without the need for human transcription or labeling.
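
To make the phrase "consistent cross-modal structure" concrete, the sketch below estimates the mutual information between two discrete observations, such as whether a candidate speech segment was heard and whether a particular visual category was in view. It is only an illustration of the general information-theoretic idea, not the paper's implementation: the function name `mutual_information`, the boolean encoding, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate mutual information (in bits) between two discrete
    variables from a list of (a, b) co-occurrence observations."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy data: each pair is
# (candidate sound segment heard?, candidate visual category seen?).
consistent = [(True, True), (False, False)] * 4      # always co-occur
unrelated = [(True, True), (True, False),
             (False, True), (False, False)] * 2      # no association

print(mutual_information(consistent))
print(mutual_information(unrelated))
```

A sound-and-shape pairing that co-occurs reliably scores higher than one that does not, which is one simple way a learner could rank candidate word-to-category hypotheses without any transcription or labels.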

Deb Roy, Alex Pentland

COGSCI 2002 | Implemented Computational Model | Raw Multimodal Sensory | Spontaneous Infant-directed Speech |

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2002
Where	COGSCI
Authors	Deb Roy, Alex Pentland
