ICASSP
2008
IEEE

Caption-aided speech detection in videos

This paper presents a novel audio-visual fusion method for speech detection, an important front end for content-based video processing. The approach aims to extract homogeneous speech segments from the audio stream of real-world movie and TV videos with the help of video captions. Captions are created mainly to help viewers follow the dialog rather than to locate speech regions accurately, so their timestamps are only approximate. The proposed caption-aided approach therefore combines caption information with audio information: the inaccurate caption positions are refined using audio features (pitch and MFCCs) and BIC-based acoustic change detection. Comparison experiments against several traditional speech detection approaches show that the proposed approach greatly improves speech detection performance.
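The abstract does not spell out how the BIC-based refinement is formulated, so the following is only a minimal sketch of the general idea, assuming the widely used ΔBIC criterion with one full-covariance Gaussian per segment. The function names (`delta_bic`, `refine_boundary`), the search window, the margin, and the penalty weight `lam` are all hypothetical choices for illustration, not the authors' implementation. The input is assumed to be a matrix of per-frame audio features (for example MFCCs), and `rough_idx` stands for the frame index implied by a caption timestamp.

```python
import numpy as np


def delta_bic(features, t, lam=1.0):
    """Delta-BIC for splitting `features` (N frames x d dims) at frame t.

    Assumption: each segment is modeled by a single full-covariance Gaussian.
    A positive value favors an acoustic change at t.
    """
    n, d = features.shape

    def logdet_cov(x):
        # Maximum-likelihood covariance with a small ridge for stability.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    left, right = features[:t], features[t:]
    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (
        n * logdet_cov(features)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    return gain - penalty


def refine_boundary(features, rough_idx, search=50, margin=10, lam=1.0):
    """Move a rough (caption-derived) boundary to the best nearby change point.

    Hypothetical helper: scans a window of +/- `search` frames around
    `rough_idx` and returns the frame with the largest positive delta-BIC,
    or `rough_idx` unchanged if no candidate scores above zero.
    """
    lo = max(0, rough_idx - search)
    hi = min(len(features), rough_idx + search)
    window = features[lo:hi]
    best_t, best_score = None, 0.0
    # Keep at least `margin` frames on each side so covariances are estimable.
    for t in range(margin, len(window) - margin):
        score = delta_bic(window, t, lam)
        if score > best_score:
            best_t, best_score = t, score
    return rough_idx if best_t is None else lo + best_t
```

In this sketch a caption boundary is treated as a coarse prior: the search is restricted to its neighborhood, and the boundary is moved only when the BIC evidence for a change point is positive. Pitch-based cues, which the abstract also mentions, are not modeled here.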
Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICASSP