We recently proposed a method for finding cluster structure in home videos, based on statistical models of the visual and temporal features of video segments and on sequential binary Bayesian classification. In this paper, we present an analysis and improved results on two key issues, feature selection and performance evaluation, using a ten-hour database (30 video clips, 1,075,000 frames). Visual features are selected from multiple candidate features and similarity measures so as to minimize the empirical probability of misclassification. Temporal features are chosen to reflect the patterns that exist in both shot and cluster duration and adjacency. Finally, we describe a detailed performance evaluation procedure that includes cluster detection, individual shot-cluster labeling, and prior selection.
Daniel Gatica-Perez, Alexander C. Loui, Ming-Ting