In most approaches to speech recognition, the speech signals are segmented using constant-time segmentation, for example into 25 ms blocks. Constant segmentation risks losing info...
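As an illustration of the constant-time segmentation this abstract refers to, here is a minimal sketch that cuts a sampled signal into consecutive 25 ms blocks. The function name `frame_signal`, the 16 kHz sample rate, and the choice of non-overlapping frames with the trailing partial frame dropped are assumptions made for the example; none of them come from the paper.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0):
    """Split samples into consecutive, non-overlapping fixed-length frames.

    Hypothetical helper for illustration; the 25 ms default mirrors the
    block length mentioned in the abstract.
    """
    # Number of samples per frame, e.g. 16000 Hz * 0.025 s = 400 samples.
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")

    # Assumption: a trailing partial frame is discarded rather than padded.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of silence at an assumed 16 kHz sample rate.
signal = [0.0] * 16000
frames = frame_signal(signal, 16000)
```

Every frame has the same duration regardless of what the signal contains, so frame boundaries are not aligned with acoustic events.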
This paper presents a novel audio-visual fusion method for speech detection, which is an important front-end for content-based video processing. This approach aims to extract homo...
Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang
Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a s...
This paper proposes a technique for improving tone correctness in Thai speech synthesis based on an average voice model trained with a nonprofessional speech corpus. The proposed te...
We propose a new type of audio feature (HFCC-ENS) as well as an unsupervised method for detecting short sequences of spoken words (key-phrases) within long speech recordings. Our ...