This paper presents a novel audio-visual fusion method for speech detection, which is an important front-end for content-based video processing. This approach aims to extract homo...
Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang
Annotation of large multilingual corpora remains a challenge to the data-driven approach to speech research, especially for under-resourced languages. This paper presents crosslan...
Extraction of bilingual audio and text data is crucial for designing Speech to Speech (S2S) systems. In this work, we propose an automatic method to segment multilingual audio str...
Andreas Tsiartas, Prasanta Kumar Ghosh, Panayiotis...
We define the task of incremental or 0-lag utterance segmentation, that is, the task of segmenting an ongoing speech recognition stream into utterance units, and present first resu...
In conversational speech, irregularities in the speech such as overlaps and disruptions make it difficult to decide what is a sentence. Thus, despite very precise guidelines on how...