The aim of this work is to devise an effective method for static summarization of home video sequences. Based on the premise that the user watching a summary is interested in people related (how many, who, emotional state) or activity related aspects, we formulate a novel approach to video summarization that works to specifically expose relevant video frames that make the content spotting tasks possible. Unlike existing approaches, which work on lowlevel features which often produce the summary not appealing to the viewer due to the semantic gap between low-level features and high-level concepts, our approach is driven by various utility functions (identity count, identity recognition, emotion recognition, activity recognition, sense of space) that use the results of face detection, face clustering, shot clustering and within-cluster frame alignment. The summarization problem is then treated as the problem of extracting the set of keyframes that have the maximum combined utility.