We present a novel video summarization and skimming technique based on face detection in broadcast video programs. We take faces as our primary target because they are the focus of most consumer video programs. We detect face tracks in the video and define face-scene fragments based on the start and end of each face track. We define a fast-forward skimming method that uses frames selected from these fragments, thus covering all the faces and their interactions in the video program. We also define novel constraints for a smooth and visually representative summary, and use them to construct longer but smoother summaries.
Kadir A. Peker, Isao Otsuka, Ajay Divakaran
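
As an illustration of the fragment idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each face track is given as a hypothetical `(start_frame, end_frame)` pair, cuts the timeline at every track start and end, and keeps the intervals in which at least one face is visible. The frame selection rule (evenly spaced frames per fragment) and the names `face_scene_fragments` and `skim_frames` are assumptions made for the example; the abstract does not specify how frames are chosen or how the smoothness constraints are applied.

```python
from typing import List, Tuple

Track = Tuple[int, int]                      # (start_frame, end_frame), assumed format
Fragment = Tuple[int, int, Tuple[int, ...]]  # (start, end, ids of visible tracks)


def face_scene_fragments(tracks: List[Track]) -> List[Fragment]:
    """Split the timeline at every track start and end.

    Each returned fragment is an interval over which the set of
    visible face tracks is constant and non-empty.
    """
    boundaries = sorted({f for start, end in tracks for f in (start, end)})
    fragments = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        # A track is active in [lo, hi) if it spans the whole interval.
        active = tuple(i for i, (s, e) in enumerate(tracks) if s <= lo and hi <= e)
        if active:  # skip intervals with no face on screen
            fragments.append((lo, hi, active))
    return fragments


def skim_frames(fragments: List[Fragment], per_fragment: int = 1) -> List[int]:
    """Pick evenly spaced frames from each fragment (assumed selection rule)."""
    frames = []
    for lo, hi, _ in fragments:
        n = min(per_fragment, hi - lo)
        step = (hi - lo) / n
        frames.extend(lo + int((k + 0.5) * step) for k in range(n))
    return frames


# Three hypothetical face tracks; the first two overlap in frames 40..100.
tracks = [(0, 100), (40, 160), (200, 260)]
fragments = face_scene_fragments(tracks)
print(fragments)
# [(0, 40, (0,)), (40, 100, (0, 1)), (100, 160, (1,)), (200, 260, (2,))]
print(skim_frames(fragments))
# [20, 70, 130, 230]
```

Because every fragment contributes at least one frame, each face and each combination of co-occurring faces appears in the skim, which is the coverage property the abstract describes.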