We present a novel genre-independent SVM framework for detecting scene changes in broadcast video. The framework handles content from a diverse range of genres by allowing sets of features, extracted from both the audio and video streams, to be combined and compared automatically without explicit thresholds. For ground truth, we use hand-labeled scene boundaries from a wide variety of broadcast genres to generate positive and negative samples for the SVM. Our experiments include high- and low-level audio features, such as semantic histograms and distances between Gaussian models, as well as video features such as shot cut positions. We evaluate the importance of these measures in a structured framework and compare performance using ROC curves. We achieve a detection rate above 70% at a 10% false positive rate on a corpus of more than 7.5 hours of video collected from news, talk shows, sitcoms, dramas, music videos, and how-to shows.

IEEE International Conference on Mu...
Naveen Goela, Kevin W. Wilson, Feng Niu, Ajay Diva
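Two quantities in the abstract can be made concrete: a distance between Gaussian models of the audio on either side of a candidate boundary, and the detection rate at a fixed false positive rate read off an ROC curve. The sketch below is illustrative only. It assumes diagonal-covariance Gaussians and uses the symmetric Kullback-Leibler divergence as the distance (the abstract does not say which distance the authors use). The function names `fit_diag_gaussian`, `symmetric_kl`, and `detection_rate_at_fpr` are hypothetical.

```python
from typing import List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_diag_gaussian(frames: Sequence[Sequence[float]]) -> Gaussian:
    """Fit a diagonal Gaussian (mean and variance per dimension) to a window of feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so constant dimensions do not cause division by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6) for d in range(dim)]
    return mean, var


def symmetric_kl(g1: Gaussian, g2: Gaussian) -> float:
    """KL(g1||g2) + KL(g2||g1) for diagonal Gaussians (assumed distance measure)."""
    (m1, v1), (m2, v2) = g1, g2
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        diff2 = (a - b) ** 2
        # The log-variance terms of the two directed divergences cancel in the sum.
        total += 0.5 * ((va + diff2) / vb + (vb + diff2) / va - 2.0)
    return total


def detection_rate_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Highest true positive rate over thresholds whose false positive rate is <= max_fpr.

    labels: 1 for a true scene boundary, 0 otherwise; a sample is flagged when score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every sample tied at this score so that tied samples share one threshold.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


if __name__ == "__main__":
    left = fit_diag_gaussian([[0.0], [1.0], [2.0]])
    right = fit_diag_gaussian([[10.0], [11.0], [12.0]])
    print(symmetric_kl(left, right))  # large distance suggests a possible scene change
    print(detection_rate_at_fpr([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0], 0.10))
```

In a setup like the one described, a distance such as this would be one entry in the feature vector passed to the SVM. The SVM's decision values on held-out samples would then be swept with a helper like `detection_rate_at_fpr` to report a single ROC operating point, such as the 10% false positive rate quoted above.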