We investigate the problem of automatically labelling
faces of characters in TV or movie material with their
names, using only weak supervision from automaticallyaligned
subtitl...
—As it is true for human perception that we gather information from different sources in natural and multi-modality forms, learning from multi-modalities has become an effective ...
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digit...
This paper proposes a method to automatically extract highlight scenes from sports (baseball) live video in real time and to allow users to retrieve them. For this purpose, sophis...
Movies segmentation into semantically correlated units is a quite tedious task due to ”semantic gap”. Low-level features do not provide useful information about the semantical...