In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multimodal analysis and indexing of audio and video. We apply visual segment...
The automatic transcription of broadcast news and meetings involves the segmentation, identification and tracking of speaker turns during each session, which is known as speaker di...
This paper describes an approach to the detection of stress in spoken New Zealand English. After identifying the vowel segments of the speech signal, the approach extracts two dif...
Huayang Xie, Peter Andreae, Mengjie Zhang, Paul Wa...
The effect of additive noise in a speaker recognition system is well known to be a crucial problem in real-life applications. In a speaker recognition system, if the test utteranc...
This work addresses the soundtrack indexing of multimedia documents. We present and merge two audio classification tools that we have developed. The first one, a speech music clas...