Caption-aided speech detection in videos


Download www.ee.tsinghua.edu.cn

This paper presents a novel audio-visual fusion method for speech detection, an important front end for content-based video processing. The approach extracts homogeneous speech segments from the audio stream of real-world movie and TV videos with the help of the video captions. Captions, however, are created mainly to help viewers follow the dialog, not to mark speech regions precisely. We therefore propose a caption-aided speech detection approach that uses both caption information and audio information: the inaccurate caption positions are refined by using audio features (pitch and MFCCs) and BIC-based acoustic change detection. Experiments comparing the proposed approach against several traditional speech detection approaches show that it substantially improves speech detection performance.
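The abstract names BIC-based acoustic change detection as the tool for refining caption positions but gives no details. The sketch below is an illustration under stated assumptions, not the authors' method: it uses the standard delta-BIC criterion with full-covariance Gaussians (positive values favor a change point) and moves a caption-derived boundary to the strongest change found within a search window around it. The function names (`delta_bic`, `refine_boundary`), the search radius, the minimum segment length, the penalty weight `lam`, and the small covariance ridge are all hypothetical choices, and the frame features are assumed to be something like MFCCs plus pitch.

```python
import numpy as np


def _logdet_cov(x: np.ndarray, ridge: float = 1e-6) -> float:
    """Log-determinant of the maximum-likelihood covariance of frames x (n, d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    # Hypothetical small ridge keeps the matrix invertible for short segments.
    _, logdet = np.linalg.slogdet(cov + ridge * np.eye(cov.shape[0]))
    return float(logdet)


def delta_bic(x: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting window x (n, d) into x[:i] and x[i:].

    Positive values favor two Gaussians (an acoustic change at frame i)
    over a single Gaussian for the whole window.
    """
    n, d = x.shape
    # Log-likelihood gain of two full-covariance Gaussians over one.
    gain = 0.5 * (n * _logdet_cov(x)
                  - i * _logdet_cov(x[:i])
                  - (n - i) * _logdet_cov(x[i:]))
    # BIC penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * n_params * np.log(n)
    return gain - lam * penalty


def refine_boundary(features, caption_frame, search=50, min_seg=20, lam=1.0):
    """Move a caption-derived boundary to the strongest nearby acoustic change.

    features: (T, d) frame-level features (assumed, e.g. MFCCs plus pitch).
    caption_frame: boundary frame implied by the caption timing.
    Returns caption_frame unchanged if no split has delta-BIC > 0.
    """
    T = len(features)
    lo, hi = max(0, caption_frame - search), min(T, caption_frame + search)
    window = features[lo:hi]
    best_i, best_score = None, -np.inf
    # Only consider splits that leave at least min_seg frames on each side.
    for i in range(min_seg, len(window) - min_seg + 1):
        score = delta_bic(window, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    if best_i is None or best_score <= 0:
        return caption_frame
    return lo + best_i


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "audio": the acoustic change happens at frame 100,
    # while the caption suggests frame 115.
    feats = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                       rng.normal(3.0, 1.0, (100, 2))])
    print(refine_boundary(feats, caption_frame=115, search=50, min_seg=10))
```

On this synthetic example the refined boundary should move from the caption's frame 115 toward the true change near frame 100. Falling back to the caption position when no split has a positive delta-BIC is one simple design choice; a real system would also need to decide which segments are speech, for example using pitch, which this sketch does not attempt.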

Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang


ICASSP 2008 | Signal Processing | Speech Detection | Speech Detection Approach | Speech Detection Approaches


Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICASSP
Authors	Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang
