Labeling video data is an essential prerequisite for many vision applications that depend on training data, such as visual information retrieval, object recognition, and human activity modeling. However, manually creating labels is not only time-consuming but also subject to human errors, and eventually, becomes impossible for a very large amount of data (e.g. 24/7 surveillance video). To minimize the human effort in labeling, we propose a unified multi-class active learning approach for automatically labeling video data. The contributions of this paper include extending active learning from binary classes to multiple classes and evaluating several practical sample selection strategies. The experimental results show that the proposed approach works effectively even with a significantly reduced amount of labeled data. The best sample selection strategy can achieve more than a 50% error reduction over random sample selection.
Rong Yan, Jie Yang, Alexander G. Hauptmann