

Movie/Script: Alignment and Parsing of Video and Text Transcription

Abstract. Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales "in the wild". Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highly varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels, and shots are reordered into a sequence of long continuous tracks or threads, which allow for more accurate tracking of people, actions and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a unified generative model and a novel hierarchical dynamic programming algorithm that can handle alignment and jump-limited reorderings in linear tim...
Added 15 Oct 2009
Updated 15 Oct 2009
Type Conference
Year 2008
Where ECCV
Authors Timothee Cour, Chris Jordan, Eleni Miltsakaki, Ben Taskar