This paper contributes to the automatic indexing of concert video. In contrast to traditional methods, which rely primarily on audio information for summarization applications, we explore how a visual-only concept detection approach could be employed. We investigate how our recent method for news video indexing – which takes into account the role of content and style – generalizes to the concert domain. We analyze video on three levels of visual abstraction, namely content, style, and their fusion. Experiments with 12 concept detectors on 45 hours of visually challenging concert video show that the best approach, which is learned automatically, depends on the concept. Moreover, these results suggest that the visual modality provides ample opportunity for more effective indexing and retrieval of concert video when used in addition to the auditory modality.
Cees G. M. Snoek, Marcel Worring, Arnold W. M. Sme