For large-scale automatic semantic video characterization, it is necessary to learn and model a large number of semantic concepts. A major obstacle, however, is the scarcity of labeled training samples. Multi-view semi-supervised learning algorithms such as co-training may help by incorporating a large amount of unlabeled data. However, their assumption that each view is sufficient for learning the target concept is usually violated in semantic concept detection. In this paper, we propose a novel multi-view semi-supervised learning algorithm called semi-supervised cross feature learning (SCFL). The proposed algorithm has two advantages over co-training. First, SCFL comes with a theoretical guarantee that its performance is not significantly degraded even when the view-sufficiency assumption fails. Second, SCFL can handle additional views of unlabeled data even when these views are absent from the labeled training data. As demonstrated in the TRECVID'03 semantic concept extraction task, ...
Rong Yan, Milind R. Naphade
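To ground the comparison the abstract draws, below is a minimal sketch of standard two-view co-training (in the style of Blum and Mitchell), whose view-sufficiency assumption SCFL relaxes. This is not the paper's SCFL algorithm; the classifier choice (logistic regression), the confidence-based selection, and the parameter names are assumptions made purely for illustration.

```python
# Illustrative two-view co-training loop -- NOT the SCFL algorithm from the paper.
# Assumes binary labels and two pre-split feature views; classifier and
# per-round growth size are arbitrary choices for this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression


def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_round=5):
    """Co-training with two feature views.

    X1_l, X2_l : labeled examples in view 1 / view 2
    y_l        : binary labels for the labeled examples
    X1_u, X2_u : unlabeled examples in the two views
    """
    X1_l, X2_l, y_l = X1_l.copy(), X2_l.copy(), y_l.copy()
    unlabeled = np.arange(len(X1_u))

    for _ in range(rounds):
        # Train one classifier per view on the current labeled pool.
        c1 = LogisticRegression(max_iter=1000).fit(X1_l, y_l)
        c2 = LogisticRegression(max_iter=1000).fit(X2_l, y_l)
        if len(unlabeled) == 0:
            break

        picked = []
        for clf, X_u in ((c1, X1_u), (c2, X2_u)):
            # Each view labels its most confident unlabeled examples ...
            proba = clf.predict_proba(X_u[unlabeled])
            conf = proba.max(axis=1)
            top = unlabeled[np.argsort(-conf)[:per_round]]
            picked.append((top, clf.predict(X_u[top])))

        # ... and those pseudo-labeled examples are added to the labeled pool
        # of BOTH views, so each view teaches the other. (A full implementation
        # would also resolve duplicate or conflicting picks.)
        for idx, pseudo in picked:
            X1_l = np.vstack([X1_l, X1_u[idx]])
            X2_l = np.vstack([X2_l, X2_u[idx]])
            y_l = np.concatenate([y_l, pseudo])
            unlabeled = np.setdiff1d(unlabeled, idx)

    return c1, c2
```

This mutual-teaching step is where the view-sufficiency assumption matters: if one view alone cannot learn the concept, the pseudo-labels it contributes can mislead the other view, which is the failure mode SCFL is designed to guard against.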