Recent content-based video retrieval systems combine the output of concept detectors (also known as high-level features) with text obtained through automatic speech recognition. This paper addresses the problem of searching with the noisy concept detector output alone. Unlike the occurrence of a term in a text document, the occurrence of an audiovisual concept is only indirectly observable. We develop a probabilistic ranking framework for unobservable binary events to search in videos, called PR-FUBE. The framework explicitly models the probability of relevance of a video shot through the presence and absence of concepts. From our framework, we derive a ranking formula and show its relationship to previously proposed formulas. We evaluate our framework against two other retrieval approaches using the TRECVID 2005 and 2007 datasets. Retrieval performs especially well when large numbers of concepts are used. We attribute the observed robustness against the noise introduced by less r...
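To make the idea of ranking on unobservable binary events concrete, the sketch below scores shots by the *expected* value of a binary-independence-style score. The concept indicator is never observed, so each detector's probability P(C_i = 1 | output) stands in for it. This is an illustrative assumption, not the PR-FUBE ranking formula derived in the paper. The helper names (`concept_weight`, `expected_score`, `rank_shots`) and the probabilities P(C_i | relevant) and P(C_i | non-relevant) are hypothetical.

```python
import math
from typing import Dict, List


def concept_weight(p_rel: float, p_nonrel: float) -> float:
    """Hypothetical log-odds weight of concept presence (binary independence style).

    p_rel    = assumed P(concept present | shot relevant)
    p_nonrel = assumed P(concept present | shot non-relevant)
    The weight folds presence and absence odds into one term; the remaining
    absence-only term is identical for every shot and is dropped for ranking.
    """
    return math.log(p_rel * (1.0 - p_nonrel) / (p_nonrel * (1.0 - p_rel)))


def expected_score(detector_probs: List[float], weights: List[float]) -> float:
    # The score is linear in the binary indicators, so by linearity of
    # expectation each unobservable indicator is replaced by its probability
    # P(C_i = 1 | detector output).
    return sum(p * w for p, w in zip(detector_probs, weights))


def rank_shots(shots: Dict[str, List[float]], weights: List[float]) -> List[str]:
    # Sort shot ids by descending expected score.
    return sorted(shots, key=lambda s: expected_score(shots[s], weights), reverse=True)


# Usage with made-up numbers: concept 0 is indicative of relevance,
# concept 1 is slightly counter-indicative.
weights = [concept_weight(0.8, 0.1), concept_weight(0.3, 0.5)]
shots = {"shot_a": [0.9, 0.1], "shot_b": [0.2, 0.9], "shot_c": [0.6, 0.5]}
print(rank_shots(shots, weights))  # ['shot_a', 'shot_c', 'shot_b']
```

Because the score is a weighted sum, each additional concept adds one more term to the sum. How the paper's own formula treats noisy detector output may differ from this simplification.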
Robin Aly, Djoerd Hiemstra, Arjen P. de Vries, Fra