In the domain of candidly-captured student presentation videos, we examine and evaluate approaches for multimodal analysis and indexing of audio and video. We apply visual segmentation techniques on unedited video to determine likely changes of topics. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker index. Videos are augmented with time-aligned filtered keywords and phrases from highly inaccurate speech transcripts. An experimental user interface (UI) combines streaming videos, visual, and textual indices for browsing and searching. We evaluate the UI and methods in a large engineering design course. We report on observations and statistics collected over 4 semesters and 598 student participants. Results suggest that our video indexing and retrieval approach is effective, and that our continuous improvements are reflected in an increase in accuracy and completion rates of user stu...
Alexander Haubold, John R. Kender