Abstract. This paper presents an efficient learning scheme for automatic annotation of video shot size. Unlike existing methods, which are applied to sports videos and rely on domain knowledge, we aim at a general approach that handles more video genres by using a more general set of low- and mid-level features. A Support Vector Machine (SVM) is adopted for the classification task, and an efficient co-training scheme exploits the information embedded in unlabeled data based on two complementary feature sets. Moreover, subjectivity-consistent costs for different misclassifications are introduced, and the final decisions are made by a cost minimization criterion. Experimental results indicate the effectiveness and efficiency of the proposed scheme for shot size annotation.
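To illustrate the cost minimization criterion mentioned above, the following is a minimal sketch of a standard minimum-expected-cost decision rule. It is not the paper's implementation: the label names, the cost values, and the assumption that the classifier provides per-class probability estimates are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of a minimum-expected-cost decision (illustrative only).
# Assumptions (not from the paper): three shot-size labels, a hand-picked
# cost matrix, and per-class probability estimates from the classifier.

def min_cost_decision(posteriors, cost):
    """Return (best_index, expected_costs).

    posteriors[j] -- estimated probability that the true class is j
    cost[i][j]    -- cost of deciding class i when the true class is j
    """
    n = len(posteriors)
    expected = [
        sum(cost[i][j] * posteriors[j] for j in range(n))
        for i in range(n)
    ]
    best = min(range(n), key=lambda i: expected[i])
    return best, expected


# Hypothetical labels, ordered from tightest to widest framing.
LABELS = ["close-up", "medium", "long"]

# Hypothetical costs: confusing adjacent sizes is penalized less than
# confusing the two extremes, mimicking a subjectivity-consistent cost.
COST = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

# Hypothetical probability estimates for one shot.
posteriors = [0.40, 0.25, 0.35]

decision, expected = min_cost_decision(posteriors, COST)
most_probable = max(range(len(posteriors)), key=lambda j: posteriors[j])

print("most probable label:", LABELS[most_probable])
print("minimum-cost label: ", LABELS[decision])
```

In this example the most probable label and the minimum-cost label differ, because deciding the middle class limits the penalty whichever extreme turns out to be correct. This shows how unequal misclassification costs can change the final decision relative to a plain maximum-probability rule.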